Bug #31540
closed
Command exceeded timeout while Installer executes foreman-rake db:migrate
Description
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1904963
Description of problem:
During an upgrade of 6.7.5 to 6.8.2 (snap 2) the db:migrate step in the installer fails, as it doesn't finish in 5 minutes (which is the default timeout for Puppet exec).
[DEBUG 2020-12-04T13:14:09 main] Exec[foreman-rake-db:migrate](provider=posix): Executing '/usr/sbin/foreman-rake db:migrate'
[DEBUG 2020-12-04T13:14:09 main] Executing with uid=foreman: '/usr/sbin/foreman-rake db:migrate'
[ERROR 2020-12-04T13:19:09 main] Command exceeded timeout
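As a quick sanity check on the 5-minute figure, the elapsed time between the "Executing" and "Command exceeded timeout" lines can be computed from the log timestamps. This is only a sketch: `log_time` and `elapsed_seconds` are hypothetical helper names, and the regular expression assumes the `[LEVEL YYYY-MM-DDTHH:MM:SS main]` prefix seen in the excerpt above.

```ruby
# Two log lines copied from the excerpt above.
LOG_LINES = [
  "[DEBUG 2020-12-04T13:14:09 main] Executing with uid=foreman: '/usr/sbin/foreman-rake db:migrate'",
  "[ERROR 2020-12-04T13:19:09 main] Command exceeded timeout"
].freeze

# Hypothetical helper: extract the timestamp from an installer log line.
# The log carries no time zone, so the value is treated as UTC; only the
# difference between two timestamps matters here.
def log_time(line)
  m = line.match(/\A\[\w+ (\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2}) /)
  raise ArgumentError, "no timestamp in: #{line}" unless m

  Time.utc(*m.captures.map(&:to_i))
end

# Hypothetical helper: whole seconds between two log lines.
def elapsed_seconds(start_line, end_line)
  (log_time(end_line) - log_time(start_line)).to_i
end

puts elapsed_seconds(LOG_LINES[0], LOG_LINES[1]) # 300, i.e. 5 minutes
```

The 300 seconds match the default timeout mentioned above, which suggests the exec was cut off by the timeout rather than failing on its own.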
One interesting detail is that Puppet does NOT kill the process, so db:migrate keeps running in the background, and attempts to re-run the installer result in other, confusing errors (ActiveRecord::ConcurrentMigrationError, Foreman::PermissionMissingException -- the latter seems to happen on the "unless" command).
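For comparison, here is a minimal sketch of a timeout wrapper that terminates and reaps the child when the deadline passes, so nothing is left running in the background. This is not Puppet's implementation, only a generic illustration in plain Ruby. `run_with_timeout` is a hypothetical name, and the slow command is a stand-in, not foreman-rake.

```ruby
require 'rbconfig'
require 'timeout'

# Run a command and, on timeout, terminate the child instead of leaving it
# running in the background. Returns [outcome, pid].
def run_with_timeout(cmd, seconds)
  pid = Process.spawn(*cmd)
  begin
    Timeout.timeout(seconds) do
      _, status = Process.wait2(pid)
      return [status.success? ? :ok : :failed, pid]
    end
  rescue Timeout::Error
    Process.kill('TERM', pid) # stop the child so it cannot keep running
    Process.wait(pid)         # reap it to avoid leaving a zombie
    [:timed_out, pid]
  end
end

# Stand-in for a long-running command (hypothetical; not foreman-rake).
slow_cmd = [RbConfig.ruby, '-e', 'sleep 30']
outcome, _pid = run_with_timeout(slow_cmd, 0.5)
puts outcome
```

Whether killing a half-finished migration is actually desirable is a separate question. The sketch only shows the alternative to leaving the process behind.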
The migration run took roughly 6 hours on my setup, and I could finish the upgrade just fine after that.
Looking at the PostgreSQL activity during the migration, I saw the following query over and over again:
postgres=# select backend_start,xact_start,query_start,query from pg_stat_activity where state='active';
2020-12-04 13:15:07.457939-05 | 2020-12-04 13:15:08.027364-05 | 2020-12-04 14:07:56.01994-05 | SELECT COUNT(*) FROM "katello_rpms" WHERE "katello_rpms"."id" IN (SELECT "katello_repository_rpms"."rpm_id" FROM "katello_repository_rpms" WHERE "katello_repository_rpms"."repository_id" IN (SELECT "katello_repositories"."id" FROM "katello_repositories" WHERE "katello_repositories"."content_view_version_id" = $1 AND (environment_id is NULL)))
As the migrations ran without a terminal attached, I don't know which migration triggered that, but looking at the Katello migrations between 6.7.5 and 6.8.2, I think it could be one of these:
db/migrate/20191204214919_add_content_view_version_counts.rb (most probably!)
db/migrate/20200129172534_add_epoch_version_release_arch_to_katello_installed_packages.rb
db/migrate/20200501155054_installed_package_unique_nvrea.rb
I think there are actually two bugs here:
1. The installer should not assume db:migrate can finish in 5 minutes; this is not realistic.
2. The migration should not require 6 hours to finish.
Version-Release number of selected component (if applicable):
Satellite 6.8.2 Snap 2
How reproducible:
100%
Steps to Reproduce:
1. take a 6.7.5 dogfood server (big db, rather slow-ish storage)
2. try to upgrade to 6.8.2
Actual results:
Fails
Expected results:
Succeeds ;-)
Additional info:
This is probably related to, but not exactly the same as, #1888983