Bug #17512

Synchronizing a repository with a large number of RPMs causes high memory usage while indexing

Added by Justin Sherrill over 4 years ago. Updated almost 3 years ago.

Target version:
Bugzilla link:
Fixed in Releases:
Found in Releases:


Cloned from

Description of problem:
Synchronizing repositories with a large number of RPMs causes the memory of the dynflow_executor process to grow, probably because too much data is loaded into a single variable while the indexing is performed.

Version-Release number of selected component (if applicable):
Satellite 6.2

How reproducible:

Steps to Reproduce:
1. Enable several large repositories (such as Red_Hat_Enterprise_Linux_Desktop-Red_Hat_Enterprise_Linux_5_Desktop_RPMs_x86_64_5Client, Red_Hat_Enterprise_Linux_Desktop-Red_Hat_Enterprise_Linux_5_Desktop_RPMs_x86_64_5_11, Red_Hat_Enterprise_Linux_Desktop-Red_Hat_Enterprise_Linux_5_Desktop_RPMs_i386_5_11, Red_Hat_Enterprise_Linux_Desktop-Red_Hat_Enterprise_Linux_5_Desktop_RPMs_i386_5Client).
2. Synchronize the repositories. One repository should be enough, but the more that run at once, the more visible the growth should be.
3. Watch the RSS (resident set size) of dynflow_executor grow.
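Step 3 can be done with `top` or `ps`. As an alternative, here is a minimal Python sketch that samples the resident set size from `/proc/<pid>/status`. It assumes a Linux host with procfs and that you look up the dynflow_executor PID yourself (for example with `pgrep -f dynflow_executor`). The function names are hypothetical helpers for illustration, not part of Foreman or Katello.

```python
import re
import time
from typing import Optional

# The VmRSS line in /proc/<pid>/status looks like "VmRSS:\t  204800 kB".
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_kb(status_text: str) -> Optional[int]:
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kb(pid: int) -> Optional[int]:
    """Read the current resident set size of a process (Linux procfs only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())


def watch(pid: int, interval: float = 5.0, samples: int = 12) -> None:
    """Print a timestamped RSS sample every `interval` seconds while a sync runs."""
    for _ in range(samples):
        rss = read_rss_kb(pid)
        print(f"{time.strftime('%H:%M:%S')} pid={pid} VmRSS={rss} kB")
        time.sleep(interval)
```

Calling `watch(pid)` while the repositories synchronize should show VmRSS climbing steadily if the problem is present.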

Actual results:
The memory of the executor grows very fast.

Expected results:
Much lower memory growth

Associated revisions

Revision e5cd23c7
Added by Justin Sherrill over 4 years ago

Fixes #17512 - index content in a paged manner

This does two major things:

1) Centralizes the fetching and indexing of content in two places
(pulp_database_unit.rb and pulp_content_unit.rb). This should make it
easier to make content units pluggable in the future, since all of the
code is built around the unit DB class or the service class.
2) Changes the indexing so that only one page of units (500 by default)
is kept in memory at a time. Previously we fetched all units in pages,
but then took the entire combined list and indexed that. With this
change we index each page while we have it, then discard it before
fetching and processing the next page.
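The actual change is Ruby code inside Katello and is not reproduced here. The difference described in point 2 can be sketched in Python as follows, assuming a hypothetical `fetch_page(offset, limit)` callable that stands in for the paged Pulp query and a hypothetical `index_units(batch)` callable that stands in for writing units to the database. Only the page size of 500 comes from the commit message; everything else is illustrative.

```python
from typing import Callable, Iterator, List

PAGE_SIZE = 500  # default page size mentioned in the commit message

FetchPage = Callable[[int, int], List[dict]]  # hypothetical: (offset, limit) -> units
IndexUnits = Callable[[List[dict]], None]     # hypothetical: stores a batch of units


def fetch_pages(fetch_page: FetchPage, page_size: int = PAGE_SIZE) -> Iterator[List[dict]]:
    """Yield pages lazily until the backend returns an empty or short page."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        if not page:
            return
        yield page
        if len(page) < page_size:
            return
        offset += page_size


def index_all_at_once(fetch_page: FetchPage, index_units: IndexUnits,
                      page_size: int = PAGE_SIZE) -> int:
    """Old behaviour: pages are fetched, then concatenated, so peak memory
    grows with the size of the whole repository."""
    units: List[dict] = []
    for page in fetch_pages(fetch_page, page_size):
        units.extend(page)
    index_units(units)
    return len(units)


def index_paged(fetch_page: FetchPage, index_units: IndexUnits,
                page_size: int = PAGE_SIZE) -> int:
    """New behaviour: each page is indexed as soon as it arrives and then
    dropped, so only about one page of units is held at a time."""
    total = 0
    for page in fetch_pages(fetch_page, page_size):
        index_units(page)
        total += len(page)
    return total


def make_backend(n: int) -> FetchPage:
    """Fake in-memory backend with n units, for demonstration only."""
    data = [{"id": i} for i in range(n)]
    return lambda offset, limit: data[offset:offset + limit]


if __name__ == "__main__":
    fetch = make_backend(1234)
    old_batches: List[int] = []
    new_batches: List[int] = []
    index_all_at_once(fetch, lambda batch: old_batches.append(len(batch)))
    index_paged(fetch, lambda batch: new_batches.append(len(batch)))
    print(old_batches)  # [1234]
    print(new_batches)  # [500, 500, 234]
```

Because `fetch_pages` is a generator, the paged variant never materializes more than one page, which is what keeps memory bounded regardless of how many RPMs the repository contains.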


#1 Updated by The Foreman Bot over 4 years ago

  • Status changed from New to Ready For Testing
  • Pull request added

#2 Updated by Justin Sherrill over 4 years ago

  • Category changed from 111 to Repositories

#3 Updated by Justin Sherrill over 4 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

#4 Updated by Brad Buckingham over 4 years ago

  • Target version set to 147

#5 Updated by Justin Sherrill over 4 years ago

  • Legacy Backlogs Release (now unused) set to 188
