Feature #30824

smartly sync capsules with history tracking

Added by Justin Sherrill about 2 months ago. Updated 26 minutes ago.

Status: Ready For Testing
Priority: Normal
Assignee: Samir Jha
Category: Performance
Target version: Katello 3.18.0
Difficulty:
Triaged: Yes
Bugzilla link: 1890683
Fixed in Releases:
Found in Releases:

Description

Solution is such that:

We can store the following data in a SYNC_HISTORY table:
smart_proxy_id
repository_id
started
finished

As part of any capsule sync for a given repo:
  • At the start of the sync for a given repo and smart proxy, any existing entry for that pair would be deleted, and a new entry would be created with the current start time.
  • At the end of the sync, the finished time would be filled in, provided the entry still exists (i.e., it hasn't been deleted by another task).
This functions as a 'two-stage' ACK of having synced successfully.
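The two-stage ACK described above can be sketched in Python as follows. This is a minimal in-memory illustration, not Katello's implementation: the SyncHistoryStore class, its start_sync/finish_sync methods, and the idea of identifying "this sync's" entry by a returned row id are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from itertools import count
from typing import Dict, Optional


@dataclass
class SyncHistory:
    """One row of a hypothetical SYNC_HISTORY table."""
    id: int
    smart_proxy_id: int
    repository_id: int
    started: datetime
    finished: Optional[datetime] = None


class SyncHistoryStore:
    """In-memory stand-in for the SYNC_HISTORY table (illustration only)."""

    def __init__(self) -> None:
        self.rows: Dict[int, SyncHistory] = {}
        self._next_id = count(1)

    def start_sync(self, smart_proxy_id: int, repository_id: int) -> int:
        # Stage 1a: delete any existing entry for this smart proxy / repo pair.
        stale = [r.id for r in self.rows.values()
                 if r.smart_proxy_id == smart_proxy_id
                 and r.repository_id == repository_id]
        for row_id in stale:
            del self.rows[row_id]
        # Stage 1b: create a fresh entry stamped with the current start time.
        row_id = next(self._next_id)
        self.rows[row_id] = SyncHistory(row_id, smart_proxy_id, repository_id,
                                        started=datetime.now(timezone.utc))
        return row_id

    def finish_sync(self, row_id: int) -> bool:
        # Stage 2: fill in the finish time only if this sync's entry still
        # exists, i.e. no other task deleted it while the sync was running.
        row = self.rows.get(row_id)
        if row is None:
            return False
        row.finished = datetime.now(timezone.utc)
        return True
```

A caller would hold on to the id returned by start_sync for the duration of the sync. If finish_sync returns False, the entry was invalidated mid-sync and the repo is simply treated as not yet synced to that proxy. Keying stage 2 on a row id rather than on the start time is one way to make sure a sync only acknowledges its own entry.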

As part of CV promotion: if changes are detected for a given repo, delete any entries for that repo in the SYNC_HISTORY table.

As part of repository update: if a repository is updated and its unprotected setting or its download_policy changes (the latter matters for proxies that inherit the download policy), delete all history items for every instance of that repository.

As part of content upload into a repo: delete any history events for that repo.

If upstream_name is changed for a docker repo: delete all history items for that particular repository.

At repo sync time in Library: if there are changes (or a package upload/removal), delete all history items for that particular repository.
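All of the invalidation rules above reduce to "delete the history entries for a set of repositories". A minimal sketch, assuming history entries are plain records and that the caller already knows which repository ids are instances of the affected repository; the function names and the attribute-name set are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class HistoryEntry:
    smart_proxy_id: int
    repository_id: int
    finished: Optional[str] = None  # None while the sync has not been ACKed


# Hypothetical names for the repository settings whose change invalidates history.
INVALIDATING_ATTRS = {"unprotected", "download_policy", "docker_upstream_name"}


def clear_history(history: List[HistoryEntry],
                  repository_ids: Iterable[int]) -> List[HistoryEntry]:
    """Return the history without any entry (on any smart proxy) for these repos."""
    doomed: Set[int] = set(repository_ids)
    return [h for h in history if h.repository_id not in doomed]


def on_repository_update(history: List[HistoryEntry],
                         instance_ids: Iterable[int],
                         changed_attrs: Set[str]) -> List[HistoryEntry]:
    # Only settings that affect what a capsule serves invalidate its history;
    # every instance of the repository is cleared.
    if changed_attrs & INVALIDATING_ATTRS:
        return clear_history(history, instance_ids)
    return history


def on_content_change(history: List[HistoryEntry],
                      repository_id: int,
                      content_changed: bool) -> List[HistoryEntry]:
    # Covers CV promotion with changes, a Library sync that changed content,
    # and package upload/removal (callers pass True for uploads unconditionally).
    return clear_history(history, [repository_id]) if content_changed else history
```

Returning a new list instead of mutating keeps each trigger a pure function; in a real database these would be DELETE statements filtered by repository id.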

At capsule sync time, if a history event with a finished time exists in the table for a given repo and smart proxy, do not schedule the sync for that repo.
A full capsule sync will delete all history events for that capsule.
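The scheduling rule and the full-sync reset can be sketched as a single filtering step. A minimal illustration, assuming history entries are simple records; repos_to_sync and its parameters are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class HistoryEntry:
    smart_proxy_id: int
    repository_id: int
    finished: Optional[str] = None  # None until the sync is ACKed as finished


def repos_to_sync(history: List[HistoryEntry],
                  smart_proxy_id: int,
                  candidate_repo_ids: List[int],
                  full_sync: bool = False) -> Tuple[List[HistoryEntry], List[int]]:
    """Return (updated history, repos that still need syncing to this proxy)."""
    if full_sync:
        # A full capsule sync discards every history event for this capsule.
        history = [h for h in history if h.smart_proxy_id != smart_proxy_id]
    # Only entries with a finish time count; started-but-unfinished ones are ignored.
    already_synced = {h.repository_id for h in history
                      if h.smart_proxy_id == smart_proxy_id and h.finished is not None}
    return history, [r for r in candidate_repo_ids if r not in already_synced]
```

For example, with a finished entry for repo 10 and an unfinished one for repo 11 on proxy 1, a normal sync of repos 10, 11 and 12 to proxy 1 schedules only 11 and 12, while a full sync schedules all three.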

History

#1 Updated by James Jeffers about 1 month ago

  • Triaged changed from No to Yes
  • Target version set to Katello 3.18.0

#2 Updated by Justin Sherrill about 1 month ago

  • Description updated (diff)
  • Subject changed from investigate and improve optimizations around capsule sync with pulp3 and yum to smartly sync capsules with history tracking
  • Tracker changed from Bug to Feature

#3 Updated by Justin Sherrill about 1 month ago

  • Description updated (diff)

#4 Updated by Justin Sherrill about 1 month ago

  • Description updated (diff)

#5 Updated by Pavel Moravec about 1 month ago

Justin Sherrill wrote:

> Solution is such that:
>
> We can store some data:
> smart_proxy_id
> repository_id
> started
> finished
>
> As part of any capsule sync for a given repo: SYNC_HISTORY
> At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
> A new entry would be created with the current start time.
> At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
> This functions as a 'two-stage' ACK of having synced successfully.

"if the data still exists" - I assume this means an entry with the same start time? (If one invokes multiple syncs of the repo to the Caps, the start time could point to another sync instance.)

"At the end of the sync" - I assume only if the sync succeeds. If it fails, we should remove the start timestamp?

> at repo sync time in Library if there are changes (or package upload/remove), we need to delete all history items for that particular repository

"if there are changes" - what is the ultimate way to know this? (that pulp re-published the repo?)

> At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.

So the "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole caps sync vs. caps+repo sync) and concurrency (CV promote triggers a few repo syncs to a Capsule - what if one invokes this Caps sync while the repo syncs are still in progress? Or vice versa?)

#6 Updated by Justin Sherrill about 1 month ago

Pavel Moravec wrote:

> Justin Sherrill wrote:
>
>> Solution is such that:
>>
>> We can store some data:
>> smart_proxy_id
>> repository_id
>> started
>> finished
>>
>> As part of any capsule sync for a given repo: SYNC_HISTORY
>> At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
>> A new entry would be created with the current start time.
>> At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
>> This functions as a 'two-stage' ACK of having synced successfully.
>
> "if the data still exists" - I assume this means an entry with the same start time? (If one invokes multiple syncs of the repo to the Caps, the start time could point to another sync instance.)

It's a 'two ack' process. A history event includes a start time and then, once the sync succeeds, a finish time. Any history event without a finish time would essentially be ignored when deciding whether to sync a particular repo to a smart proxy.

> "At the end of the sync" - I assume only if the sync succeeds. If it fails, we should remove the start timestamp?

Yes, if it fails, we would delete the history event that was created in that task, in order to clean it up. If it dies completely (i.e. process is killed), the finish time will never be filled in and it won't be considered when deciding if a repo needs to be synced or not.
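The failure handling described above amounts to wrapping each repo sync so that a failed sync removes the entry it created, while a killed process just leaves an unfinished entry behind. A minimal sketch; the dict-based history, the function name run_repo_sync, the caller-supplied entry id, and the do_sync callable are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory history:
# entry id -> (smart_proxy_id, repository_id, started, finished)
History = Dict[int, Tuple[int, int, float, Optional[float]]]


def run_repo_sync(history: History, entry_id: int, smart_proxy_id: int,
                  repository_id: int, do_sync: Callable[[], None]) -> None:
    """Run one repo-to-capsule sync with two-stage history bookkeeping."""
    # Stage 1: replace any earlier entry for this proxy/repo pair with an
    # unfinished one; an unfinished entry never counts as "synced".
    for old_id in [eid for eid, (proxy, repo, _, _) in history.items()
                   if (proxy, repo) == (smart_proxy_id, repository_id)]:
        del history[old_id]
    history[entry_id] = (smart_proxy_id, repository_id, time.time(), None)
    try:
        do_sync()
    except Exception:
        # Failure: clean up the entry this task created, then re-raise.
        history.pop(entry_id, None)
        raise
    # Success: fill in the finish time if no other task deleted the entry.
    # (If the process is killed inside do_sync, neither branch runs and the
    # entry simply stays unfinished.)
    if entry_id in history:
        proxy, repo, started, _ = history[entry_id]
        history[entry_id] = (proxy, repo, started, time.time())
```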

>> at repo sync time in Library if there are changes (or package upload/remove), we need to delete all history items for that particular repository
>
> "if there are changes" - what is the ultimate way to know this? (that pulp re-published the repo?)

We do this today by looking at the 'added_count/removed_count/updated_count' values of the sync task. In Pulp 3 we can do it in a similar way (by checking whether a new repository version was created).
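The change check might look like the following. The three count keys come from the comment above; the shape of the task output mapping, and the idea of comparing the repository's latest version href before and after a Pulp 3 sync, are assumptions for illustration, as are the function names.

```python
from typing import Mapping, Optional

COUNT_KEYS = ("added_count", "removed_count", "updated_count")


def counts_indicate_change(task_output: Mapping[str, int]) -> bool:
    """True if the sync task reports any added, removed or updated units."""
    return any(task_output.get(key, 0) > 0 for key in COUNT_KEYS)


def new_version_created(latest_version_before: Optional[str],
                        latest_version_after: Optional[str]) -> bool:
    """True if the sync produced a new repository version (assumed Pulp 3 signal)."""
    return (latest_version_after is not None
            and latest_version_after != latest_version_before)
```

Either check returning True would be the cue to delete all history items for the repository.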

>> At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.
>
> So the "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole caps sync vs. caps+repo sync) and concurrency (CV promote triggers a few repo syncs to a Capsule - what if one invokes this Caps sync while the repo syncs are still in progress? Or vice versa?)

It actually wouldn't prevent them: syncs that are currently in progress would have a blank finish time on their history events, so new syncs would still be started.
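To see why overlapping syncs are not blocked, the following sketch models two overlapping syncs of the same repo to the same capsule against a dict-based history. It assumes, hypothetically, that each sync is identified by its own entry id; the start, finish and is_synced helpers are illustrative names only.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int]  # (smart_proxy_id, repository_id)
# Hypothetical history: entry id -> (key, finished timestamp or None)
History = Dict[int, Tuple[Key, Optional[str]]]


def start(history: History, entry_id: int, key: Key) -> None:
    # Drop any previous entry for this proxy/repo pair, then add an unfinished one.
    for old in [e for e, (k, _) in history.items() if k == key]:
        del history[old]
    history[entry_id] = (key, None)


def finish(history: History, entry_id: int, when: str) -> bool:
    # Only ACK if this sync's own entry is still present.
    if entry_id not in history:
        return False
    key, _ = history[entry_id]
    history[entry_id] = (key, when)
    return True


def is_synced(history: History, key: Key) -> bool:
    # An in-progress (unfinished) entry does not count, so it never blocks a new sync.
    return any(k == key and done is not None for k, done in history.values())
```

Replaying "sync A starts, sync B starts, A finishes, B finishes": is_synced stays False while A runs, so B is scheduled; A's finish returns False because B replaced its entry; B's finish then marks the pair as synced.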

#7 Updated by The Foreman Bot 26 days ago

  • Assignee set to Samir Jha
  • Status changed from New to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/8975 added

#8 Updated by Brad Buckingham 5 days ago

  • Category changed from Performance to Repositories

#9 Updated by Brad Buckingham 5 days ago

  • Bugzilla link set to 1890683

#10 Updated by Brad Buckingham 5 days ago

  • Category changed from Repositories to Performance
