Bug #17905
katello:upgrade_check aborts on systems without an UUID
Description
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1409795
Satellite 6.1.11 (not 6.1.10 as in the BZ version field; there seems to be no 6.1.11 option there)
Description of problem:
While running "foreman-rake katello:upgrade_check" against a large customer database, rake aborted while scanning the systems:
- Invoke katello:preupgrade_content_host_check (first_time)
- Invoke environment (first_time)
- Execute environment
API controllers newer than Apipie cache! Run apipie:cache rake task to regenerate cache.
- Execute katello:preupgrade_content_host_check
Calculating Host changes on upgrade. This may take a few minutes.
rake aborted!
Expect initializer to return hash if a group of attributes is defined by lazy_accessor
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/app/lib/katello/lazy_accessor.rb:177:in `run_initializer'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/app/lib/katello/lazy_accessor.rb:154:in `lazy_attribute_get'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/app/lib/katello/lazy_accessor.rb:74:in `block (2 levels) in lazy_accessor'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/lib/katello/tasks/preupgrade_content_host_check.rake:48:in `block in get_systems_with_facts'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/lib/katello/tasks/preupgrade_content_host_check.rake:46:in `each'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/lib/katello/tasks/preupgrade_content_host_check.rake:46:in `get_systems_with_facts'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/lib/katello/tasks/preupgrade_content_host_check.rake:18:in `ensure_one_system_per_hostname'
/opt/rh/ruby193/root/usr/share/gems/gems/katello-2.2.0.93/lib/katello/tasks/preupgrade_content_host_check.rake:103:in `block (2 levels) in <top (required)>'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:205:in `call'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:205:in `block in execute'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:200:in `each'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:200:in `execute'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:158:in `block in invoke_with_call_chain'
/opt/rh/ruby193/root/usr/share/ruby/monitor.rb:211:in `mon_synchronize'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:151:in `invoke_with_call_chain'
/opt/rh/ruby193/root/usr/share/ruby/rake/task.rb:144:in `invoke'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:116:in `invoke_task'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:94:in `block (2 levels) in top_level'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:94:in `each'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:94:in `block in top_level'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:133:in `standard_exception_handling'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:88:in `top_level'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:66:in `block in run'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:133:in `standard_exception_handling'
/opt/rh/ruby193/root/usr/share/ruby/rake/application.rb:63:in `run'
/opt/rh/ruby193/root/usr/bin/rake:32:in `<main>'
Tasks: TOP => katello:preupgrade_content_host_check
The code in question seems to be:
systems.each do |system|
  begin
    facts = system.facts
    unless facts
      systems_to_remove.push(system)
    end
  rescue RestClient::Exception
    systems_to_remove.push(system)
  end
end
Line 48 of preupgrade_content_host_check.rake is "facts = system.facts".
After adding a tactical "puts system.inspect" just before the system.facts line, we could identify the bad system:
#<Katello::System id: 6891, uuid: nil, name: "hostname", description: "Initial Registration Params", location: "None", environment_id: 4, created_at: "2016-11-15 12:58:01", updated_at: "2016-11-15 12:58:01", type: "Katello::System", content_view_id: 15, host_id: nil>
rake aborted!
Looking into PostgreSQL revealed that we actually had two systems with that symptom:
foreman=# select * from katello_systems where uuid is null;
id | uuid | name | description | location | environment_id | created_at | updated_at | type | content_view_id | host_id
------+------+-----------------------+-----------------------------+----------+----------------+----------------------------+----------------------------+-----------------+-----------------+---------
6891 | | hostname | Initial Registration Params | None | 4 | 2016-11-15 12:58:01.246012 | 2016-11-15 12:58:01.246012 | Katello::System | 15 |
6262 | | hostname2 | Initial Registration Params | None | 4 | 2016-09-26 09:06:38.945969 | 2016-09-26 09:06:38.945969 | Katello::System | 16 |
(2 rows)
PostgreSQL also showed two other systems with the same hostnames, but with proper UUIDs.
It seems the initial registration of those hosts went badly and they were re-registered.
After deleting the two broken systems from the DB, upgrade_check ran fine.
I think the upgrade check (here preupgrade_content_host_check.rake) needs more error handling: I would expect it to catch these bad systems and report them, not choke on them.
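The diagnosis above (a UUID-less row plus a later row with the same hostname and a proper UUID) can be expressed as a small check. This is a minimal sketch, assuming rows shaped like the katello_systems columns shown in the psql output; SystemRow and classify_uuidless_systems are hypothetical names for illustration, not part of Katello.

```ruby
# Hypothetical stand-in for a katello_systems row; only the columns used here.
SystemRow = Struct.new(:id, :uuid, :name)

# Splits the UUID-less rows into those whose hostname also appears on a row
# with a UUID (apparently re-registered) and those with no such counterpart.
def classify_uuidless_systems(rows)
  names_with_uuid = rows.reject { |r| r.uuid.nil? }.map(&:name).uniq
  uuidless = rows.select { |r| r.uuid.nil? }
  reregistered, orphaned = uuidless.partition { |r| names_with_uuid.include?(r.name) }
  { reregistered: reregistered, orphaned: orphaned }
end
```

Rows in the "reregistered" bucket match the situation described here and are likely safe cleanup candidates; "orphaned" rows would deserve a closer look before deleting anything.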
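One possible shape for that extra error handling is sketched below. It is a hedged illustration, not the actual Katello fix: FakeSystem and scan_systems are hypothetical, and the assumption that the lazy_accessor failure surfaces as a plain RuntimeError is not confirmed by the backtrace. The sketch skips systems with no UUID before touching facts, and widens the rescue beyond RestClient::Exception so one bad record is reported instead of aborting the task.

```ruby
# Hypothetical stand-in for Katello::System. Its facts method raises a
# RuntimeError when uuid is nil, mimicking (by assumption) the lazy_accessor
# failure from the backtrace; a real system would fetch facts remotely.
FakeSystem = Struct.new(:id, :name, :uuid) do
  def facts
    if uuid.nil?
      raise "Expect initializer to return hash if a group of attributes is defined by lazy_accessor"
    end
    { "network.hostname" => name }
  end
end

# Returns [systems_to_remove, faulty], where faulty maps system id => reason.
def scan_systems(systems)
  systems_to_remove = []
  faulty = {}

  systems.each do |system|
    # No UUID means nothing to look up, so report it without calling facts.
    if system.uuid.nil?
      faulty[system.id] = "no UUID"
      systems_to_remove << system
      next
    end

    begin
      systems_to_remove << system unless system.facts
    rescue StandardError => e
      # The original loop rescues only RestClient::Exception; this sketch
      # widens it so any per-system failure is recorded, not fatal.
      faulty[system.id] = e.message
      systems_to_remove << system
    end
  end

  [systems_to_remove, faulty]
end
```

Example use: `to_remove, faulty = scan_systems(systems)` followed by `faulty.each { |id, reason| puts "System #{id} is faulty: #{reason}" }`. Rescuing StandardError is a deliberate trade-off: it keeps the check running, but it can also hide unrelated bugs, which is why the explicit UUID check comes first.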
Version-Release number of selected component (if applicable):
Satellite 6.1.11
How reproducible:
Always, but it is unclear how the initial problematic host was created.
Steps to Reproduce:
1. Create a Katello::System without a UUID.
2. Run foreman-rake katello:upgrade_check.
Actual results:
rake aborted
Expected results:
The system is reported as faulty.
NOTE:
This seems to be a consequence of a record whose orchestration task did not finish successfully.
Updated by Justin Sherrill over 8 years ago
- Subject changed from katello:upgrade_check aborts on systems without an UUID to katello:upgrade_check aborts on systems without an UUID
- Release set to 114