Bug #28881

[RFE] Speed up the restore process

Added by Mike McCune 9 months ago. Updated 9 months ago.

Status: Closed
Priority: High
Assignee:
Category: -
Target version: -
Difficulty:
Triaged: No
Bugzilla link:
Fixed in Releases:
Found in Releases:

Description

TL;DR: During a Pulp backup, save only the RPM files; once Satellite is restored, launch a job that creates tasks to regenerate the metadata of each CV, so that the work runs in parallel.

Description of problem:
Currently a user can choose between backing up the complete Satellite environment and skipping the backup of Pulp content.

If the user chooses to back up Pulp, pulp_data.tar will contain all the RPMs and all the symbolic links that have been created for Content Views.

In a large environment, with multiple CVs and multiple versions of each CV, pulp_data.tar can contain millions of symbolic links that have to be recreated during the restore. Reading a tar file is a linear, single-threaded process, and writing millions of symbolic links can take days because it uses only one CPU core.
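To get a feel for how much of a given archive is symbolic links, the entries can be tallied before a restore. The following is a minimal sketch using Python's standard tarfile module; the function name count_member_types is made up for illustration and is not part of Satellite or foreman_maintain.

```python
import tarfile
from collections import Counter


def count_member_types(archive_path):
    """Tally symlinks, regular files, and other entries in a tar archive."""
    counts = Counter()
    # Stream mode ("r|*") reads the archive front to back in one pass,
    # the same sequential walk an extraction has to make.
    with tarfile.open(archive_path, mode="r|*") as archive:
        for member in archive:
            if member.issym():
                counts["symlinks"] += 1
            elif member.isfile():
                counts["files"] += 1
            else:
                counts["other"] += 1
    return counts
```

Calling count_member_types on a pulp_data.tar would show how the symlink count compares to the number of RPM files, which gives a rough idea of how much of the restore time goes into link creation.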

If the user chooses not to back up Pulp, pulp_data.tar will be empty. The restore is much faster, but Satellite then needs to sync all the content again. Any custom RPMs that were uploaded through the UI or hammer must be uploaded again. Once all content is available, metadata has to be regenerated for each version of each CV.

One way to improve this would be to back up Pulp, but only the RPM data and not the symbolic links. Restoring pulp_data.tar would then take far less time than when the tar file contains symbolic links. Once Satellite is running again, an automated process should launch metadata regeneration for each CV, prioritizing the promoted ones. This would create one task per regeneration and allow multiple CPUs to be used, parallelizing the work and giving a much faster restore than when the symbolic links have to be created from the tar file.
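The proposed post-restore job could be sketched roughly as follows. This is only an illustration of the scheduling idea (promoted versions first, several regenerations in flight at once): the dictionary layout with "id" and "promoted" keys and the regenerate callable are assumptions, and a real implementation would create Satellite tasks rather than Python threads.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def order_by_priority(versions):
    """Put promoted CV versions first, keeping the original order otherwise."""
    # sorted() is stable and False sorts before True, so promoted versions lead.
    return sorted(versions, key=lambda v: not v["promoted"])


def regenerate_all(versions, regenerate, max_workers=4):
    """Run `regenerate` for every CV version, several at a time.

    `regenerate` is a hypothetical callable standing in for whatever
    triggers metadata regeneration for one content view version.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Work is submitted in priority order, so promoted versions start first.
        futures = {
            pool.submit(regenerate, version): version["id"]
            for version in order_by_priority(versions)
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

With max_workers greater than one, the regenerations no longer wait on each other, which is the parallelism that a single tar extraction cannot provide.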

How reproducible:
Always

Steps to Reproduce:
1. Sync multiple RHEL releases
2. Create different CVs
3. Backup with full Pulp content
4. Restore with Pulp content

Actual results:
The restore starts creating millions of symbolic links from the tar file, using a single CPU with no parallelism. This process can take days.

Expected results:
Satellite should be up and running sooner, even if in a degraded mode until the recovery has fully completed.

Associated revisions

Revision 124f93c0 (diff)
Added by Mike McCune 9 months ago

Fixes #28881 - switch to using --absolute-names for tar extract

Speeds up extraction of large Pulp tar files during restore by
an order of magnitude
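For readers unfamiliar with the option, the sketch below shows what an extraction using GNU tar's --absolute-names might look like when driven from Python. It is not the code from the linked pull request: the function names, the archive path, and the "/" target directory are assumptions made for illustration. With --absolute-names, tar does not strip leading slashes from member names, so it should only be used on archives from a trusted source.

```python
import subprocess


def build_extract_command(archive_path, target_dir="/"):
    """Return a GNU tar command line that extracts with --absolute-names."""
    return [
        "tar",
        "--extract",
        "--absolute-names",  # the option named in the commit message
        "--file", archive_path,
        "--directory", target_dir,  # assumed restore root, not taken from the PR
    ]


def extract(archive_path, target_dir="/"):
    """Run the extraction and raise CalledProcessError if tar fails."""
    subprocess.run(build_extract_command(archive_path, target_dir), check=True)
```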

History

#1 Updated by Mike McCune 9 months ago

  • Bugzilla link set to 1726595

#2 Updated by The Foreman Bot 9 months ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/theforeman/foreman_maintain/pull/312 added

#3 Updated by Anonymous 9 months ago

  • Status changed from Ready For Testing to Closed
