Bug #35142 (open)
'foreman-maintain content migration-stats' command gets stuck and consumes all memory
Description
Cloned from Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2093829
Description of problem:
The "foreman-maintain content migration-stats" command runs for an extremely long time (several hours to days) and consumes all system memory when the number of unmigratable content units is large, such as 10K+.
- foreman-maintain content migration-stats
Running Retrieve Pulp 2 to Pulp 3 migration statistics ================================================================================
Retrieve Pulp 2 to Pulp 3 migration statistics: <=========== Stuck in here for long time
- Memory consumption increases quickly.
-----------------------------------------------------------
foreman 14301 16.0 1.9 816904 384304 ? Ssl 17:23 1:32 /opt/rh/rh-ruby25/root/usr/bin/ruby /opt/rh/rh-ruby25/root/usr/bin/rake katello:pulp3_migration_stats
foreman 14301 15.1 4.5 1337164 901500 ? Rsl 17:23 1:36 /opt/rh/rh-ruby25/root/usr/bin/ruby /opt/rh/rh-ruby25/root/usr/bin/rake katello:pulp3_migration_stats
foreman 14301 11.7 14.3 3311908 2864024 ? Ssl 17:23 1:50 /opt/rh/rh-ruby25/root/usr/bin/ruby /opt/rh/rh-ruby25/root/usr/bin/rake katello:pulp3_migration_stats
foreman 14301 12.4 33.5 7130700 6684896 ? Rsl 17:23 2:18 /opt/rh/rh-ruby25/root/usr/bin/ruby /opt/rh/rh-ruby25/root/usr/bin/rake katello:pulp3_migration_stats
-----------------------------------------------------------
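The samples above can be collected with a small loop instead of by hand. This is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes VmRSS; the function name sample_rss and its arguments are illustrative and are not part of foreman-maintain.

```shell
# Print "<time> <RSS in KiB>" for a process at a fixed interval.
# Usage: sample_rss PID [INTERVAL_SECONDS] [SAMPLES]
# Hypothetical helper; assumes Linux /proc is available.
sample_rss() {
  pid=$1
  interval=${2:-60}
  samples=${3:-10}
  i=0
  while [ "$i" -lt "$samples" ]; do
    # VmRSS is the resident set size reported by the kernel, in KiB.
    rss=$(awk '/^VmRSS:/ {print $2}' "/proc/$pid/status" 2>/dev/null)
    # Stop once the process has exited (no VmRSS available).
    [ -n "$rss" ] || break
    printf '%s %s\n' "$(date +%H:%M:%S)" "$rss"
    i=$((i + 1))
    if [ "$i" -lt "$samples" ]; then
      sleep "$interval"
    fi
  done
}

# Example, using the PID from the ps output above:
# sample_rss 14301 60 30
```

A steadily growing second column over successive samples matches the behaviour reported here.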
Steps to Reproduce:
1. Prepare a Satellite 6.9.9 with about 20k or more RPMs, many content views, and many content view versions.
2. To simulate unmigratable rpms we can run the following command to flag all rpms as missing from migration.
foreman-rake console
Katello::Rpm.update_all(migrated_pulp3_href: nil, missing_from_migration: true)
exit
3. Run "foreman-maintain content migration-stats" command
Actual results:
The command gets stuck and memory consumption increases over time until the OOM killer is triggered.
Expected results:
The command runs successfully and consumes a reasonable amount of system memory.
In addition to the memory issue, the output files also contain many duplicate rows, which is why the script can take several hours to days to run.
In my case, it wrote the exact same row 500 times:
---------------------------------------------------------
- grep "opa-fm-10.0.0.0-444.el7.x86_64.rpm,1,Red Hat Enterprise Linux 7 Server RPMs x86_64 7Server,Default Organization View,1.0" Rpm | wc -l
500
---------------------------------------------------------
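To check every row rather than a single one, the duplicates in an output file can be summarised in one pass. This is a rough sketch using standard sort and uniq; the function name count_duplicate_rows is illustrative, and the file name Rpm is taken from the grep above.

```shell
# Print "<count> <row>" for every row that appears more than once,
# most frequent first. Hypothetical helper, not part of foreman-maintain.
count_duplicate_rows() {
  # uniq -c prefixes each row with its count; -d keeps only repeated rows.
  sort -- "$1" | uniq -cd | sort -rn
}

# Example:
# count_duplicate_rows Rpm | head
```

If the report were free of duplicates, this would print nothing.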