Feature #30824

closed

smartly sync capsules with history tracking

Added by Justin Sherrill over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Target version:
Difficulty:
Triaged: Yes
Fixed in Releases:
Found in Releases:

Description

Solution is such that:

We can store the following data in a SYNC_HISTORY table:
smart_proxy_id
repository_id
started
finished

As part of any capsule sync for a given repo and smart proxy:
At the start of the sync, any existing SYNC_HISTORY entry for that repo and smart proxy would be deleted, and a new entry would be created with the current start time.
At the end of the sync, the finished time would be filled in, if the entry still exists (i.e., it hasn't been deleted by another task).
This functions as a 'two-stage' ACK of having synced successfully.
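The two-stage ACK can be sketched as a small in-memory store. This is a minimal illustration only: `SyncHistory`, `start`, `finish` and `invalidate_repo` are hypothetical names standing in for the SYNC_HISTORY table, and "the entry still exists" is assumed to mean "the exact entry this task created is still stored".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple


@dataclass
class SyncHistoryEntry:
    """One row of the (hypothetical) SYNC_HISTORY store."""
    smart_proxy_id: int
    repository_id: int
    started: datetime
    finished: Optional[datetime] = None


class SyncHistory:
    """In-memory stand-in for the SYNC_HISTORY table, keyed by (proxy, repo)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, int], SyncHistoryEntry] = {}

    def start(self, proxy_id: int, repo_id: int) -> SyncHistoryEntry:
        # Stage 1: replace any previous entry for this pair with a fresh start time.
        entry = SyncHistoryEntry(proxy_id, repo_id, datetime.now(timezone.utc))
        self._entries[(proxy_id, repo_id)] = entry
        return entry

    def finish(self, entry: SyncHistoryEntry) -> bool:
        # Stage 2: fill in the finish time only if this task's entry is still
        # stored, i.e. no other task deleted or replaced it during the sync.
        key = (entry.smart_proxy_id, entry.repository_id)
        if self._entries.get(key) is not entry:
            return False
        entry.finished = datetime.now(timezone.utc)
        return True

    def invalidate_repo(self, repo_id: int) -> None:
        # Called when a repo's content changes: drop its entries for every proxy.
        for key in [k for k in self._entries if k[1] == repo_id]:
            del self._entries[key]
```

Deleting the row on invalidation is what makes the second ACK fail: a sync that was running while the repo changed never records a finish time, so it is not trusted later.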

As part of CV promotion: if changes are detected for any repos, delete any entries for those repos in the SYNC_HISTORY table.

As part of repository update: if a repository is updated and its unprotected setting or its download_policy (relevant to proxies that inherit the download policy) has changed, delete all history items for any instance of that repository.

As part of content upload into a repo, any history events for that repo are deleted.

If upstream_name is changed for a docker repo, delete all history items for that particular repository.

At repo sync time in Library, if there are changes (or a package upload/removal), we need to delete all history items for that particular repository.
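The repository-update and docker upstream_name rules amount to comparing a repository's attributes before and after an update. A hedged sketch: `RepoAttrs` and `update_invalidates_history` are hypothetical, and only the attribute names (`unprotected`, `download_policy`, `upstream_name`) come from the rules above. It does not model which proxies inherit the download policy; per the rule, a positive result would delete history for every instance of the repository.

```python
from dataclasses import dataclass
from typing import Optional

# Attributes whose change invalidates sync history for any repository type.
WATCHED_ATTRS = ("unprotected", "download_policy")


@dataclass(frozen=True)
class RepoAttrs:
    """Hypothetical snapshot of the repository settings relevant here."""
    content_type: str
    unprotected: bool
    download_policy: str
    upstream_name: Optional[str] = None


def update_invalidates_history(old: RepoAttrs, new: RepoAttrs) -> bool:
    """Return True if this update should delete the repository's sync history."""
    if any(getattr(old, name) != getattr(new, name) for name in WATCHED_ATTRS):
        return True
    # For docker repositories a changed upstream_name also invalidates history.
    return new.content_type == "docker" and old.upstream_name != new.upstream_name
```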

At capsule sync time, if a history event exists in the table for a given repo and smart proxy that has 'finished', do not schedule the sync.
A full capsule sync will delete all history events for that capsule.
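The scheduling rule can be sketched as a filter over history rows. Assumptions: `HistoryRow` and `plan_capsule_sync` are hypothetical names, and rows are plain in-memory objects rather than database records. Only rows with a finish time count as an ACK, so a row whose sync is still running (or whose process died) does not suppress a new sync.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


@dataclass
class HistoryRow:
    smart_proxy_id: int
    repository_id: int
    started: datetime
    finished: Optional[datetime]


def plan_capsule_sync(
    proxy_id: int,
    candidate_repo_ids: Iterable[int],
    history: List[HistoryRow],
    full_sync: bool = False,
) -> Tuple[List[int], List[HistoryRow]]:
    """Return (repository ids to sync, history rows that remain afterwards)."""
    repo_ids = list(candidate_repo_ids)
    if full_sync:
        # A full capsule sync deletes every history event for this capsule
        # and syncs all candidate repos.
        kept = [row for row in history if row.smart_proxy_id != proxy_id]
        return repo_ids, kept
    # Repos with a finished event for this proxy are already in sync: skip them.
    acked = {
        row.repository_id
        for row in history
        if row.smart_proxy_id == proxy_id and row.finished is not None
    }
    return [r for r in repo_ids if r not in acked], list(history)
```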

#1

Updated by James Jeffers over 3 years ago

  • Target version set to Katello 3.18.0
  • Triaged changed from No to Yes
#2

Updated by Justin Sherrill over 3 years ago

  • Tracker changed from Bug to Feature
  • Subject changed from investigate and improve optimizations around capsule sync with pulp3 and yum to smartly sync capsules with history tracking
  • Description updated (diff)
#3

Updated by Justin Sherrill over 3 years ago

  • Description updated (diff)
#4

Updated by Justin Sherrill over 3 years ago

  • Description updated (diff)
#5

Updated by Pavel Moravec over 3 years ago

Justin Sherrill wrote:

> Solution is such that:
>
> We can store some data:
> smart_proxy_id
> repository_id
> started
> finished
>
> As part of any capsule sync for a given repo: SYNC_HISTORY
> At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
> A new entry would be created with the current start time.
> At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
> This functions as a 'two-stage' ACK of having synced successfully.

"if the data still exists": I assume with the same start time? (If one invokes multiple syncs of the repo to the Capsule, the start time can point to another sync instance.)

"At the end of the sync": I assume only if the sync succeeds. If it fails, should we remove the start timestamp?

> At repo sync time in Library if there are changes (or package upload/remove), we need to delete all history items for that particular repository

"if there are changes": what is the definitive way to know this? (That Pulp re-published the repo?)

> At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.

So the "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole Capsule sync vs. Capsule+repo sync) and concurrency (CV promotion triggers a few repo syncs to a Capsule: what if one invokes this Capsule sync while the repo syncs are still in progress? Or vice versa?)

#6

Updated by Justin Sherrill over 3 years ago

Pavel Moravec wrote:

> Justin Sherrill wrote:
>
>> Solution is such that:
>>
>> We can store some data:
>> smart_proxy_id
>> repository_id
>> started
>> finished
>>
>> As part of any capsule sync for a given repo: SYNC_HISTORY
>> At the start of the sync for a given repo and smart proxy, any entry in the data would be deleted.
>> A new entry would be created with the current start time.
>> At the end of the sync, the finished time would be filled in, if the data still exists (hasn't been deleted by another task).
>> This functions as a 'two-stage' ACK of having synced successfully.
>
> "if the data still exists": I assume with the same start time? (If one invokes multiple syncs of the repo to the Capsule, the start time can point to another sync instance.)

It's a 'two ack' process. A history event includes a start time and then a successful finish time. Any history event without a successful finish time would essentially be ignored when deciding whether to sync a particular repo to a smart proxy.

> "At the end of the sync": I assume only if the sync succeeds. If it fails, should we remove the start timestamp?

Yes, if it fails, we would delete the history event that was created in that task, in order to clean it up. If it dies completely (i.e. process is killed), the finish time will never be filled in and it won't be considered when deciding if a repo needs to be synced or not.
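The failure handling can be sketched as a wrapper around the actual sync step. A minimal sketch under stated assumptions: `run_tracked_sync` is a hypothetical helper, the history is a plain dict keyed by `(smart_proxy_id, repository_id)`, and an entry is treated as belonging to this task when its start time matches the one this call recorded. If the process is killed outright, neither cleanup nor the second ACK runs, and the finish time simply stays `None`.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Optional, Tuple

Key = Tuple[int, int]  # (smart_proxy_id, repository_id)
# Maps each key to (started, finished); finished stays None until success.
History = Dict[Key, Tuple[datetime, Optional[datetime]]]


def _still_ours(history: History, key: Key, started: datetime) -> bool:
    # The entry belongs to this task if it still carries our start time.
    entry = history.get(key)
    return entry is not None and entry[0] == started


def run_tracked_sync(
    history: History, proxy_id: int, repo_id: int, sync: Callable[[], None]
) -> None:
    key = (proxy_id, repo_id)
    started = datetime.now(timezone.utc)
    history[key] = (started, None)  # first ACK: the sync has started
    try:
        sync()
    except Exception:
        # Failed sync: clean up the entry this task created, then re-raise.
        if _still_ours(history, key, started):
            del history[key]
        raise
    if _still_ours(history, key, started):
        history[key] = (started, datetime.now(timezone.utc))  # second ACK
```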

>> At repo sync time in Library if there are changes (or package upload/remove), we need to delete all history items for that particular repository
>
> "if there are changes": what is the definitive way to know this? (That Pulp re-published the repo?)

We do this today by looking at the 'added_count/removed_count/updated_count' for the sync in the task. In Pulp 3 we can do it in a similar way (by looking for a newly created repository version).
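The change check can be sketched as two small predicates. Assumptions: the Pulp 2 sync output is modeled as a dict carrying the `added_count`, `removed_count` and `updated_count` values named here, and for Pulp 3 the sketch inspects the task's `created_resources` list for a repository-version href, recognized by a `/versions/` path segment (a heuristic). Both function names are hypothetical.

```python
from typing import Any, Mapping

CHANGE_COUNTS = ("added_count", "removed_count", "updated_count")


def pulp2_sync_changed(sync_output: Mapping[str, Any]) -> bool:
    # Any non-zero added/removed/updated count means the repository content changed.
    return any(int(sync_output.get(name) or 0) > 0 for name in CHANGE_COUNTS)


def pulp3_sync_changed(task: Mapping[str, Any]) -> bool:
    # A sync that created a new repository version changed the content;
    # repository version hrefs contain a "/versions/<n>/" segment.
    return any("/versions/" in href for href in task.get("created_resources") or [])
```

A `True` result from either predicate is the point at which all history items for the repository would be deleted.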

>> At capsule sync time, if a history event exists in the table for a given repo and smart proxy that 'finished', do not schedule the sync.
>
> So the "prevent concurrent Capsule syncs" feature will in fact be implemented? Great! I am just curious about its granularity (whole Capsule sync vs. Capsule+repo sync) and concurrency (CV promotion triggers a few repo syncs to a Capsule: what if one invokes this Capsule sync while the repo syncs are still in progress? Or vice versa?)

It actually wouldn't prevent them: in-progress syncs would have a blank finish time on their history events, so new syncs would still be started.

#7

Updated by The Foreman Bot over 3 years ago

  • Status changed from New to Ready For Testing
  • Assignee set to Samir Jha
  • Pull request https://github.com/Katello/katello/pull/8975 added
#8

Updated by Brad Buckingham over 3 years ago

  • Category changed from Performance to Repositories
#9

Updated by Brad Buckingham over 3 years ago

  • Bugzilla link set to 1890683
#10

Updated by Brad Buckingham over 3 years ago

  • Category changed from Repositories to Performance
#11

Updated by The Foreman Bot over 3 years ago

  • Fixed in Releases Katello 4.0.0 added
#12

Updated by Samir Jha over 3 years ago

  • Status changed from Ready For Testing to Closed