Feature #30824
Closed
smartly sync capsules with history tracking
Description
Solution is such that:
We can store some data:
smart_proxy_id
repository_id
started
finished
As part of any capsule sync for a given repo (SYNC_HISTORY):
At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
A new entry would be created with the current start time.
At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
This functions as a 'two-stage' ACK of having synced successfully.
As part of CV promotion: if changes are detected for a given repo during CV promotion, delete any entries for those repos in the SYNC_HISTORY table.
As part of repository update: if a repository is updated with unprotected changed or download_policy changed (for inherited proxies), delete all history items for any instance of that repository.
As part of content upload into a repo: any history events for that repo are deleted.
If upstream_name for docker repos is changed, delete all history items for that particular repository.
At repo sync time in Library: if there are changes (or package upload/remove), we need to delete all history items for that particular repository.
At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.
A full capsule sync will delete all history events for that capsule
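The rules above can be sketched roughly as follows. This is a minimal in-memory illustration in Python, not the Katello implementation: the class and method names (`SyncHistory`, `start_sync`, `finish_sync`, `invalidate_repository`, `invalidate_smart_proxy`, `needs_sync`) are hypothetical, and a dict stands in for the SYNC_HISTORY table. `finish_sync` checks only that an entry still exists, as the description states.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]  # (smart_proxy_id, repository_id)


@dataclass
class SyncHistoryEntry:
    smart_proxy_id: int
    repository_id: int
    started: datetime
    finished: Optional[datetime] = None


class SyncHistory:
    """Hypothetical in-memory stand-in for the SYNC_HISTORY table."""

    def __init__(self) -> None:
        self._rows: Dict[Key, SyncHistoryEntry] = {}

    def start_sync(self, smart_proxy_id: int, repository_id: int) -> None:
        # First ACK: drop any old entry and record a new start time.
        self._rows[(smart_proxy_id, repository_id)] = SyncHistoryEntry(
            smart_proxy_id, repository_id, started=datetime.now(timezone.utc)
        )

    def finish_sync(self, smart_proxy_id: int, repository_id: int) -> bool:
        # Second ACK: fill in finished only if the entry still exists,
        # i.e. it was not deleted by another task in the meantime.
        entry = self._rows.get((smart_proxy_id, repository_id))
        if entry is None:
            return False
        entry.finished = datetime.now(timezone.utc)
        return True

    def invalidate_repository(self, repository_id: int) -> None:
        # CV promotion with changes, repository update, content upload,
        # or a Library sync with changes: forget this repo everywhere.
        self._rows = {k: v for k, v in self._rows.items() if k[1] != repository_id}

    def invalidate_smart_proxy(self, smart_proxy_id: int) -> None:
        # Full capsule sync: forget everything for this capsule.
        self._rows = {k: v for k, v in self._rows.items() if k[0] != smart_proxy_id}

    def needs_sync(self, smart_proxy_id: int, repository_id: int) -> bool:
        # Skip only when a finished entry exists for this repo and proxy.
        entry = self._rows.get((smart_proxy_id, repository_id))
        return entry is None or entry.finished is None
```

With this shape, an entry that was started but never finished does not suppress a later sync, and any invalidation between start and finish leaves nothing to mark as finished.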
Updated by James Jeffers over 4 years ago
- Target version set to Katello 3.18.0
- Triaged changed from No to Yes
Updated by Justin Sherrill over 4 years ago
- Tracker changed from Bug to Feature
- Subject changed from investigate and improve optimizations around capsule sync with pulp3 and yum to smartly sync capsules with history tracking
- Description updated (diff)
Updated by Pavel Moravec over 4 years ago
Justin Sherrill wrote:
Solution is such that:
We can store some data:
smart_proxy_id
repository_id
started
finished
As part of any capsule sync for a given repo (SYNC_HISTORY):
At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
A new entry would be created with the current start time.
At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
This functions as a 'two-stage' ACK of having synced successfully.
"if the data still exists" - I assume with the same start time (if one invokes multiple syncs of the repo to the Capsule, the start time can point to another sync instance).
"At the end of the sync" - I assume if the sync succeeds only. If it fails, we should remove the start timestamp?
At repo sync time in Library: if there are changes (or package upload/remove), we need to delete all history items for that particular repository.
"if there are changes" - what is the ultimate way to know this? (that pulp re-published the repo?)
At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.
So a "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole Capsule sync vs. Capsule + repo sync) and concurrency (CV promote triggers a few repo syncs to a Capsule; what if one invokes this Capsule sync while the repo syncs are still in progress? Or vice versa?)
Updated by Justin Sherrill over 4 years ago
Pavel Moravec wrote:
Justin Sherrill wrote:
Solution is such that:
We can store some data:
smart_proxy_id
repository_id
started
finished
As part of any capsule sync for a given repo (SYNC_HISTORY):
At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
A new entry would be created with the current start time.
At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
This functions as a 'two-stage' ACK of having synced successfully.
"if the data still exists" - I assume with the same start time (if one invokes multiple syncs of the repo to the Capsule, the start time can point to another sync instance).
It's a 'two ack' process. So a history event includes a start time, and then a successful finish time. Any history event without a successful finish time would essentially be ignored when deciding whether to sync a particular repo to a smart proxy.
"At the end of the sync" - I assume if the sync succeeds only. If it fails, we should remove the start timestamp?
Yes, if it fails, we would delete the history event that was created in that task, in order to clean it up. If it dies completely (i.e. process is killed), the finish time will never be filled in and it won't be considered when deciding if a repo needs to be synced or not.
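A minimal sketch of that failure handling, in Python for illustration only. `run_tracked_sync`, the dict-based `history`, and the `do_sync` callback are hypothetical names, not Katello code. The sketch assumes that "still exists" means this task's own entry is still present, which the thread does not confirm.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple

Key = Tuple[int, int]  # (smart_proxy_id, repository_id)


def run_tracked_sync(
    history: Dict[Key, dict], key: Key, do_sync: Callable[[], None]
) -> bool:
    """Run do_sync and record it in history; return True if it was ACKed."""
    entry = {"started": datetime.now(timezone.utc), "finished": None}
    history[key] = entry  # replaces any earlier entry for this repo and proxy
    try:
        do_sync()
    except Exception:
        # Failed sync: delete only the event created by this task.
        if history.get(key) is entry:
            del history[key]
        raise
    # Successful sync: fill in finished if our entry was not deleted or
    # replaced meanwhile (assumption: identity check, see the note above).
    if history.get(key) is entry:
        entry["finished"] = datetime.now(timezone.utc)
        return True
    return False
```

If the process is killed outright, neither branch runs, so the entry keeps a blank finished time and is ignored when deciding whether the repo needs a sync.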
At repo sync time in Library: if there are changes (or package upload/remove), we need to delete all history items for that particular repository.
"if there are changes" - what is the ultimate way to know this? (that pulp re-published the repo?)
We do this today by looking at the 'added_count/removed_count/updated_count' for the sync in the task. In Pulp 3 we can do it in a similar way (by looking for a created repo version).
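As a rough illustration of the two checks, in Python: the function names and the shape of the inputs (a mapping of counters, and an optional reference to a created repo version) are assumptions made for this sketch, not the actual task or Pulp data structures.

```python
from typing import Mapping, Optional

# Counter names as quoted in the comment above.
CHANGE_COUNTERS = ("added_count", "removed_count", "updated_count")


def counters_show_changes(sync_result: Mapping[str, int]) -> bool:
    """True if any of the add/remove/update counters is non-zero."""
    return any(sync_result.get(name, 0) > 0 for name in CHANGE_COUNTERS)


def created_version_shows_changes(created_repo_version: Optional[str]) -> bool:
    """True if the sync produced a new repo version (hypothetical input)."""
    return created_repo_version is not None
```

Either check returning True would be the trigger to delete the history items for that repository.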
At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.
So a "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole Capsule sync vs. Capsule + repo sync) and concurrency (CV promote triggers a few repo syncs to a Capsule; what if one invokes this Capsule sync while the repo syncs are still in progress? Or vice versa?)
It actually wouldn't prevent them: syncs currently in progress would have a blank finish time on their history events, so new syncs would still be started.
Updated by The Foreman Bot about 4 years ago
- Status changed from New to Ready For Testing
- Assignee set to Samir Jha
- Pull request https://github.com/Katello/katello/pull/8975 added
Updated by Brad Buckingham about 4 years ago
- Category changed from Performance to Repositories
Updated by Brad Buckingham about 4 years ago
- Category changed from Repositories to Performance
Updated by The Foreman Bot about 4 years ago
- Fixed in Releases Katello 4.0.0 added
Updated by Samir Jha about 4 years ago
- Status changed from Ready For Testing to Closed
Applied in changeset katello|2e1af3567183a9b74d218cddc7c9bfdee313d32d.