Bug #36859

closed

A re-sync should always recover from a previous sync's failed publication

Added by Quirin Pamp 11 months ago. Updated 7 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Repositories
Target version:
Difficulty:
Triaged: Yes
Found in Releases:

Description

When syncing a repo, it is possible for the Pulp sync task to succeed, but the Pulp publication task to fail.
We have seen this happen in the wild for deb/APT repositories where the OOM-Killer simply killed the publication task, but it could happen to any content type for any number of reasons.

When a Pulp publication task fails (for whatever reason), Katello will end up in the following broken state:

The repository in question will show the latest content from the successful sync; however, it will still be serving the published repository from the last sync before that.
If this happens on an initial sync, Katello will be serving an empty repo, while happily reporting the presence of thousands of packages in the UI.
This erroneous state will also propagate to any CVs, CCVs, and LCENVs that the affected repo is a part of.

My expectation would be that simply re-syncing the affected repo should heal the broken state (so long as the sync task turns green this time).

However, this is only the case if the re-sync finds some new content in the upstream repo. If not, Katello will conclude there is no reason for a re-publish, leaving the wrong publication in place.
Once fully understood, the broken state can be healed by forcing a re-publish using: `hammer repository republish --id <your_repository_id> --force true`
However, it requires expert Pulp and Katello knowledge to analyze what is going on and put this together with the Hammer command.

It should be pretty straightforward to add an additional check for cases where no new content came in via sync: verify whether the current distribution is actually serving a publication built from the expected repo version, and re-publish if that check fails.
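The proposed check could look roughly like the sketch below. This is only an illustration of the decision logic in Python, not Katello code: the function name `needs_republish` is hypothetical, and the plain dicts (with assumed `publication` and `repository_version` keys) merely stand in for whatever distribution and publication records Katello would actually look up in Pulp.

```python
from typing import Optional


def needs_republish(
    latest_version_href: str,
    distribution: dict,
    publications: dict,
    new_content_synced: bool,
) -> bool:
    """Decide whether a publish step should run after a successful sync.

    Hypothetical helper: `distribution` and `publications` are plain dicts
    standing in for Pulp records; their shape is an assumption.
    """
    # New content always warrants a fresh publication (existing behaviour).
    if new_content_synced:
        return True

    # The distribution serves no publication at all, so publish.
    publication_href: Optional[str] = distribution.get("publication")
    if publication_href is None:
        return True

    # The distribution points at a publication we cannot find, so publish.
    publication = publications.get(publication_href)
    if publication is None:
        return True

    # Re-publish only if the served publication was built from a
    # different repo version than the one the sync left us with.
    return publication.get("repository_version") != latest_version_href
```

With a check like this, a re-sync that brings in no new content would still trigger a publish whenever the served publication lags behind the latest repo version, which is exactly the state left behind by a failed publication task.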

Steps to reproduce (using a deb/APT repo as an example):

1. Create a new repository of type deb (any repo containing some content will do)
2. Once the repo is saved, we need to simulate a failed pulp_deb publication. The most reliable way to do this is to patch the relevant Pulp plugin: open `/usr/lib/python3.9/site-packages/pulp_deb/app/tasks/publishing.py` and add the line `raise RuntimeError("TESTFAIL")` after the docstring of the `publish()` function.
3. Once you have patched pulp_deb, restart your Pulp workers for the code change to take effect; `systemctl restart pulp*` should do the trick.
4. Now sync your Katello test repo, and watch the publication fail in the dynflow task. Congratulations, you have a broken state.
5. Revert the patch from `/usr/lib/python3.9/site-packages/pulp_deb/app/tasks/publishing.py` and restart your pulp workers again.
6. You can verify your publication is wrong by opening the "Published At" link on the repo page and, in the APT repo case, verifying that it does not contain a `pool/` folder, even though the repo page (perhaps after a refresh) lists "deb Packages" > 0.
7. Now re-sync the repo. Assuming there is no new content on the remote, the sync task will be green, but you will still not have a `pool/` folder in the "Published At" location, because Katello did not attempt to create a new publication.
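The manual verification from steps 6 and 7 boils down to a simple predicate, sketched below in Python. The function name `publication_looks_stale` and its inputs are hypothetical: `published_entries` is assumed to be the list of top-level names seen at the "Published At" location, and `deb_package_count` the "deb Packages" number shown on the repo page. It only applies to the APT case described here.

```python
from typing import Iterable


def publication_looks_stale(
    published_entries: Iterable[str], deb_package_count: int
) -> bool:
    """Return True if Katello reports packages but no pool/ folder is served.

    Hypothetical helper for the APT case: both arguments are values the
    tester reads off the UI or the "Published At" listing by hand.
    """
    # Accept both "pool" and "pool/" as the folder name in the listing.
    has_pool = any(entry.rstrip("/") == "pool" for entry in published_entries)
    return deb_package_count > 0 and not has_pool
```

If this still returns `True` after the re-sync in step 7, the bug has been reproduced.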
