Bug #36926

closed

[Improvement] RefreshRepos step in smart proxy sync to refresh just repos to sync

Added by Pavel Moravec about 1 year ago. Updated 11 months ago.

Status: Closed
Priority: Normal
Assignee:
Category: Performance
Target version:
Difficulty: easy
Triaged: Yes
Fixed in Releases:
Found in Releases:

Description

Description of problem:
Actions::Pulp3::Orchestration::Repository::RefreshRepos can be invoked for all of a Capsule's repositories during Capsule sync, even though only a few repos need it. This needlessly delays the Capsule sync (by up to a few minutes).

Scenario (seen at huge scale in the linked customer case):
- "Sync Capsules after content view promotion" is disabled
- many hundreds to thousands of repos are already synced on the Capsule
- just a small CV is promoted as the only new content to be synced to the Capsule

Invoking Capsule sync then triggers RefreshRepos "unscoped" (not restricted to a CV, LE, or repo), so all of the hundreds to thousands of Remote objects on the Capsule are checked and updated. That alone can take many minutes.

Idea for a possible fix: the current ordering of dynflow steps is:

3: Actions::Pulp3::ContentGuard::Refresh (success) [ 0.14s / 0.14s ]
5: Actions::Pulp3::Orchestration::Repository::RefreshRepos (success) [ 7.05s / 7.05s ]
7: Actions::Katello::CapsuleContent::SyncCapsule (success) [ 0.16s / 0.16s ]
9: Actions::Pulp3::CapsuleContent::Sync (success) [ 2.26s / 0.59s ]
11: Actions::Pulp3::CapsuleContent::GenerateMetadata (success) [ 0.16s / 0.16s ]
13: Actions::Pulp3::CapsuleContent::RefreshDistribution (success) [ 1.64s / 0.68s ]

(The triple 9, 11, 13 is repeated for every repo that needs to be synced.)

That is defined in app/lib/actions/katello/capsule_content/sync.rb:

    sequence do
      if smart_proxy.has_feature?(SmartProxy::PULP3_FEATURE)
        plan_action(Actions::Pulp3::ContentGuard::Refresh, smart_proxy)
        plan_action(Actions::Pulp3::Orchestration::Repository::RefreshRepos, smart_proxy, refresh_options)
      end
      plan_action(SyncCapsule, smart_proxy, refresh_options)

Can't we move RefreshRepos into SyncCapsule (keeping the PULP3_FEATURE check), like this:

    def plan(smart_proxy, options = {})
      plan_self(:smart_proxy_id => smart_proxy.id)
      action_subject(smart_proxy)
      environment = options[:environment]
      content_view = options[:content_view]
      repository = options[:repository]
      skip_metadata_check = options.fetch(:skip_metadata_check, false)
      sequence do
        repos = repos_to_sync(smart_proxy, environment, content_view, repository, skip_metadata_check)
        return nil if repos.empty?
        # HERE: call RefreshRepos, restricted to the repos computed above
        repos.in_groups_of(Setting[:foreman_proxy_content_batch_size], false) do |repo_batch|
    ..

We would just have to extend RefreshRepos to accept options[:repository_list] in addition to options[:repository] ..?
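A minimal sketch of the scoping idea, outside of Katello: the Repo struct and the repos_to_refresh helper are hypothetical stand-ins, and :repository_list is only the option proposed above, not an existing one. It just illustrates that the number of Remotes touched would follow the list passed in rather than everything on the Capsule.

```ruby
# Hypothetical stand-in for a Capsule repository; not a Katello class.
Repo = Struct.new(:id, :name)

# Hypothetical helper: picks which repos would have their Remotes refreshed.
# :repository_list is the option proposed in this issue (an assumption),
# :repository mirrors the existing single-repo scoping.
def repos_to_refresh(all_repos, options = {})
  if options[:repository_list]
    wanted = options[:repository_list].map(&:id)
    all_repos.select { |repo| wanted.include?(repo.id) }
  elsif options[:repository]
    all_repos.select { |repo| repo.id == options[:repository].id }
  else
    # Unscoped: every repo on the Capsule is refreshed.
    all_repos
  end
end

all_repos = (1..1000).map { |i| Repo.new(i, "repo-#{i}") }
to_sync   = [all_repos[41]] # the one repo from the freshly promoted CV

scoped   = repos_to_refresh(all_repos, :repository_list => to_sync)
unscoped = repos_to_refresh(all_repos)

puts "scoped: #{scoped.size}, unscoped: #{unscoped.size}"
# prints "scoped: 1, unscoped: 1000"
```

With the list passed in, the refresh work depends only on the repos about to be synced, which matches the expected results below.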

Version-Release number of selected component (if applicable):
Satellite 6.12 or newer, including 6.14

How reproducible:
100%

Steps to Reproduce:
1. Have "Sync Capsules after content view promotion" disabled
2. Have many CVs in many LEs with many repos, all synced to a Capsule
3. Have a CV with even one repo, also synced to the Capsule; now publish and promote a new version (so that the Remote is updated, not created)
4. Sync the Capsule
5. Check the Actions::Pulp3::Orchestration::Repository::RefreshRepos dynflow step and count its pulp_tasks. Alternatively, monitor /var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log on the Capsule for pairs of requests like:

1.2.3.4 - - [15/Nov/2023:22:38:39 +0100] "PATCH /pulp/api/v3/remotes/rpm/rpm/141295ff-9731-417d-a6e5-0359f527fa51/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.6/ruby"
1.2.3.4 - - [15/Nov/2023:22:38:41 +0100] "GET /pulp/api/v3/remotes/rpm/rpm/?name=1-cv_onerepo-DEV-02c5c6a5-228d-456b-81da-c7c4ffa9f62e HTTP/1.1" 200 6432 "-" "OpenAPI-Generator/3.19.6/ruby"

Each pair corresponds to one pulp_task from the dynflow step.
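Counting those pairs by hand gets tedious with thousands of repos. A small sketch that counts the PATCH half of each pair could look like this; the helper name count_remote_patches is made up for illustration, and the regex assumes the access-log format shown in the two sample lines above.

```ruby
# Matches PATCH requests against any Pulp remote, as seen in the sample log lines.
REMOTE_PATCH = %r{"PATCH /pulp/api/v3/remotes/\S+ HTTP/}

# Hypothetical helper: counts log lines that are PATCH requests to a Remote.
# One such line is expected per PATCH/GET pair, i.e. per pulp_task.
def count_remote_patches(lines)
  lines.count { |line| line.match?(REMOTE_PATCH) }
end

sample = [
  '1.2.3.4 - - [15/Nov/2023:22:38:39 +0100] "PATCH /pulp/api/v3/remotes/rpm/rpm/141295ff-9731-417d-a6e5-0359f527fa51/ HTTP/1.1" 202 67 "-" "OpenAPI-Generator/3.19.6/ruby"',
  '1.2.3.4 - - [15/Nov/2023:22:38:41 +0100] "GET /pulp/api/v3/remotes/rpm/rpm/?name=1-cv_onerepo-DEV-02c5c6a5-228d-456b-81da-c7c4ffa9f62e HTTP/1.1" 200 6432 "-" "OpenAPI-Generator/3.19.6/ruby"'
]

puts count_remote_patches(sample)
# prints 1
```

On the Capsule itself, the same helper could be fed File.foreach("/var/log/httpd/rhsm-pulpcore-https-443_access_ssl.log"), ideally restricted to the time window of the sync so that unrelated requests are not counted.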

Actual results:
5. You see as many pulp tasks (or pairs of requests) as there are repos on the Capsule. The overall duration of the RefreshRepos step grows linearly with the number of repos on the Capsule.

Expected results:
5. Just one pair of requests / one pulp task is triggered. RefreshRepos runs in constant time regardless of the number of repos already present on the Capsule.

Additional info:
