Bug #21110

virt-who can't talk to Foreman anymore

Added by Philipp Mueller over 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Category: -
Target version:
Difficulty:
Triaged:
Bugzilla link:
Pull request:
Fixed in Releases:
Found in Releases:

Description

Hi there,

I have a problem with virt-who and Foreman / Katello.

Versions:

katello 3.4.5
virt-who-0.19-6.el7_4.noarch
candlepin-2.0.40-1.el7.noarch

Here's the output of virt-who -o -d:

https://thepasteb.in/p/k5hYzyvyBGAfE

and the corresponding candlepin.log and production.log

https://thepasteb.in/p/y8h65P3qkOKcO
https://thepasteb.in/p/3lh7zgvQ8KLu1

In the Dynflow console I can see the following task, which has been running for 6 days:

3: Actions::Candlepin::AsyncHypervisors (waiting for Candlepin to finish the task) [ 522541.42s / 1701.01s ]
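As a sanity check on the "6 days" figure, the first number in the brackets can be converted from seconds to days. This is only a sketch, and it assumes the first value is the elapsed wall-clock time in seconds, which is how the Dynflow console appears to label it:

```python
# Convert the Dynflow step timing from the report into days.
# Assumption: the first number in the brackets is elapsed wall-clock seconds.
SECONDS_PER_DAY = 24 * 60 * 60

elapsed_seconds = 522541.42
elapsed_days = elapsed_seconds / SECONDS_PER_DAY

print(f"{elapsed_days:.2f} days")
```

The result is a little over six days, which agrees with the description.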

I also have many other "Hypervisor Tasks" in Dynflow that seem to be stuck.
In Foreman -> Tasks I have none.

Thank you for the help.


Related issues

Has duplicate Katello - Bug #21191: Error update VM Information with virt-who (Duplicate, 2017-10-04)

History

#1 Updated by Justin Sherrill over 2 years ago

Can you navigate to Monitor > Tasks, find the async hypervisors task that you are seeing the issue with, then click on the 'raw' tab?

Finally, can you copy and paste the full 'raw input' and full 'raw output' into the ticket?

Thanks,
Justin

#2 Updated by Michael Stead over 2 years ago

Could you also show the output from the following query against the candlepin database:

select * from cp_job where id like 'hypervisor_update%';

#3 Updated by Philipp Mueller over 2 years ago

The query produces 12 MB of output. Basically, it contains lots of lines like this:

hypervisor_update_e48d618e-50b3-4c7c-b000-bce2ec402f55 | 2017-09-26 16:42:37.207+02 | 2017-09-26 16:42:37.207+02 | | async group | foreman_admin | | | 6 | rs | 0 | org.candlepin.pinsetter.tasks.HypervisorUpdateJob | rs |

#4 Updated by Philipp Mueller over 2 years ago

Justin Sherrill wrote:

can you navigate to Monitor > tasks, find the async hypervisors task that you are seeing the issue with, then click on the 'raw' tab.

Finally can you copy and paste the full 'raw input ' and full 'raw output' into the ticket?

thanks,
Justin

Id: 2d47c542-b87e-4b86-8dfd-f602ac5922e8
Label: Actions::Katello::Host::Hypervisors
Duration: less than a minute
Raw input: {"services_checked"=>["candlepin", "candlepin_auth"],
"hypervisors"=>Step(3).output[:hypervisors]}
Raw output: {}
External Id: 791a438e-4ea2-42c7-92fe-bf560d104c91

#5 Updated by Michael Stead over 2 years ago

Thanks for the update.

Could you please run the following query on the candlepin DB to get a feel for how many of these jobs there are and the state they are in:

select distinct(state), count(id) as total from cp_job where id like 'hypervisor_update%' group by state;

#6 Updated by Philipp Mueller over 2 years ago

117 rows

#7 Updated by Michael Stead over 2 years ago

Please provide the full output.

#8 Updated by Justin Sherrill over 2 years ago

  • Status changed from New to Need more information
  • Assignee set to Justin Sherrill
  • Target version set to 217

#9 Updated by Philipp Mueller over 2 years ago

Sorry, here is the full output.

candlepin=# select distinct(state), count(id) as total from cp_job where id like 'hypervisor_update%' group by state;
state | total
-------+-------
6 | 3
0 | 2
(2 rows)

Thanks for your help.
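For readers following along, the numeric states can be mapped to names. The sketch below is not taken from Candlepin itself: states 0 (CREATED), 2 (running) and 4 (cancelled) are consistent with later comments in this thread, while the remaining names are an assumption based on the order of the JobState enum in Candlepin 2.x.

```python
# Numeric job states as they appear in cp_job.state.
# 0, 2 and 4 match what is said elsewhere in this thread; the other
# names are an ASSUMPTION about Candlepin 2.x's JobState enum order.
JOB_STATES = {
    0: "CREATED",
    1: "PENDING",
    2: "RUNNING",
    3: "FINISHED",
    4: "CANCELED",
    5: "FAILED",
    6: "WAITING",
}

# Rows copied from the psql output above: (state, total).
rows = [(6, 3), (0, 2)]

summary = {JOB_STATES[state]: total for state, total in rows}
print(summary)
```

Read this way, none of the five hypervisor_update jobs has started running.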

#10 Updated by Michael Stead over 2 years ago

  • Assignee deleted (Justin Sherrill)
  • Target version deleted (217)

1. Please run the following query and paste the full output:

select * from cp_job where state in (0, 6) order by state;

2. We need to determine what candlepin job status records katello is looking for. Please look in the candlepin.log and grep for the latest instances of:

uri=/candlepin/jobs/hypervisor_update

3. I would like to see the full candlepin log, but I'm not sure that it can be uploaded here. If you could find a way for me to download it, it would be very helpful.

Thanks.

#11 Updated by Philipp Mueller over 2 years ago

Here is the candlepin.log as a tar.gz:

https://pm92.de/nc/index.php/s/Xr6neXwgGwxI3VD

and here is the query output:
https://pm92.de/nc/index.php/s/9UBJAyMGOeTR1cZ

Thank you.

#12 Updated by Michael Stead over 2 years ago

Some notes:

Based on the info provided, it appears that candlepin's hypervisor update async job is not migrating out of the CREATED state, therefore katello never stops checking the Job status.

At this point, I'm not sure why the checkin jobs are remaining stuck in this state. Based on info from Philipp, the stuck jobs are both targeting the same Org.

Looking into this further, both for the issue at hand and a work-around.

#13 Updated by Philipp Mueller over 2 years ago

Hi,

You're absolutely right, the job never exits the created state. When I set the state to created manually in the psql DB, it will stay in waiting state:

{"id":"hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe","state":"CREATED","startTime":null,"finishTime":null,"result":null,"principalName":"foreman_admin","targetType":"owner","targetId":"rs","ownerId":"rs","correlationId":null,"resultData":null,"statusPath":"/jobs/hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe","done":false,"group":"async group","created":"2017-10-02T10:56:41+0000","updated":"2017-10-02T10:56:41+0000"}

...

{"id":"hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe","state":"WAITING","startTime":null,"finishTime":null,"result":null,"principalName":"foreman_admin","targetType":"owner","targetId":"rs","ownerId":"rs","correlationId":null,"resultData":null,"statusPath":"/jobs/hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe","done":false,"group":"async group","created":"2017-10-02T10:56:41+0000","updated":"2017-10-02T10:56:41+0000"}
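A quick way to compare two job-status responses like these is to parse them and list the fields that differ. This is a minimal sketch using trimmed copies of the responses above (only a few fields are kept for brevity):

```python
import json

# Trimmed copies of the two job-status responses quoted above.
created = json.loads(
    '{"id":"hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe",'
    '"state":"CREATED","startTime":null,"finishTime":null,"done":false}'
)
waiting = json.loads(
    '{"id":"hypervisor_update_c28c8d1c-9777-48e5-bdac-9b0bf4aa67fe",'
    '"state":"WAITING","startTime":null,"finishTime":null,"done":false}'
)

# Fields whose values differ between the two responses.
changed = sorted(key for key in created if created[key] != waiting[key])
print(changed)

# Neither response reports completion.
still_pending = not created["done"] and not waiting["done"]
print(still_pending)
```

Only `state` changes; `done` stays false and `startTime` stays null in both, so the job has never actually started.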

#14 Updated by Michael Stead over 2 years ago

You're absolutely right, the job never exits the created state. When I set the state to created manually in the psql DB, it will stay in waiting state:

The jobs that are currently in the candlepin DB... you set them to the CREATED state manually... from WAITING? If that is the case, candlepin will never pick them up.

Out of curiosity, did you at any point restart tomcat after virt-who started the hypervisor checkin request?

Something you could try (though I haven't tested it) would be to set all the cp Jobs to the Cancelled state (4) and see if that stops the katello tasks. Once this is done, and the candlepin log is silent again, we can try another virt-who checkin while capturing the candlepin logs.

1) First check to make sure that no hypervisor update jobs are currently in the running state:

select count(*) from cp_job where id like 'hypervisor_update%' and state=2;

2) If there are no running jobs, let's cancel all the other hypervisor checkin jobs:

-- Cancel all of the hypervisor_update jobs.
update cp_job set state=4 where id like 'hypervisor_update%';

-- Make sure that all the hypervisor_update jobs have a state of 4 (cancelled)
select id, state from cp_job where id like 'hypervisor_update%';

3) Make sure that the katello tasks have now all stopped due to the cancelled candlepin jobs.

4) Make sure that the candlepin log is relatively silent.

5) Enable candlepin DEBUG logging and restart tomcat (all katello services if possible).

6) Once up and running again, try a manual virt-who checkin again while capturing the candlepin logs from start to finish.

Let me know if you have questions with anything here. I work AST but I'll try and log on earlier than normal in the morning to help out.
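Steps 1 and 2 above can be sketched as a single guarded operation. The example below is an illustration only: it uses an in-memory SQLite table as a stand-in for the PostgreSQL cp_job table, the job ids are made up, and the helper name `cancel_hypervisor_jobs` is hypothetical.

```python
import sqlite3

# Stand-in for the candlepin database: an in-memory SQLite table with
# only the two cp_job columns used in the steps above (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cp_job (id TEXT PRIMARY KEY, state INTEGER)")
conn.executemany(
    "INSERT INTO cp_job VALUES (?, ?)",
    [
        ("hypervisor_update_aaa", 0),
        ("hypervisor_update_bbb", 6),
        ("some_other_job", 0),
    ],
)

RUNNING, CANCELED = 2, 4


def cancel_hypervisor_jobs(db):
    """Cancel hypervisor_update jobs only if none is currently running."""
    (running,) = db.execute(
        "SELECT count(*) FROM cp_job "
        "WHERE id LIKE 'hypervisor_update%' AND state = ?",
        (RUNNING,),
    ).fetchone()
    if running:
        return 0  # leave everything untouched while a job is running
    with db:  # commit on success, roll back on error
        cur = db.execute(
            "UPDATE cp_job SET state = ? WHERE id LIKE 'hypervisor_update%'",
            (CANCELED,),
        )
    return cur.rowcount


cancelled = cancel_hypervisor_jobs(conn)
states = dict(conn.execute("SELECT id, state FROM cp_job"))
print(cancelled, states)
```

Doing the check and the update together avoids cancelling a job that started between the two manual queries; jobs that are not hypervisor updates are left alone.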

#15 Updated by Philipp Mueller over 2 years ago

I did all the steps; virt-who still says there are unfinished hypervisor_update jobs:

2017-10-06 12:02:39,618 [virtwho.destination_-4231981242631048948 DEBUG] MainProcess(14173):Thread-3 @subscriptionmanager.py:check_report_state:233 - Checking status of job hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e
2017-10-06 12:02:39,673 [rhsm.connection DEBUG] MainProcess(14173):Thread-3 @connection.py:_request:602 - Response: status=200
2017-10-06 12:02:39,673 [virtwho.destination_-4231981242631048948 DEBUG] MainProcess(14173):Thread-3 @subscriptionmanager.py:check_report_state:247 - Job hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e not finished
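When the debug output is long, the job ids that virt-who reports as unfinished can be pulled out with a regular expression. This is a rough sketch; the pattern is inferred from the log lines quoted above, not from virt-who's source, and the line prefixes are shortened here.

```python
import re

# Two of the virt-who debug lines quoted above (prefixes shortened).
log_lines = [
    "... @subscriptionmanager.py:check_report_state:233 - Checking status of job "
    "hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e",
    "... @subscriptionmanager.py:check_report_state:247 - Job "
    "hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e not finished",
]

# Matches only the "Job <id> not finished" message.
NOT_FINISHED = re.compile(r"Job (hypervisor_update_[0-9a-f-]+) not finished")

unfinished = set()
for line in log_lines:
    match = NOT_FINISHED.search(line)
    if match:
        unfinished.add(match.group(1))

print(sorted(unfinished))
```

The resulting ids can then be looked up in cp_job to see which state Candlepin has recorded for them.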

#16 Updated by Michael Stead over 2 years ago

Did you capture the candlepin log while virt-who did its checkin? I'm assuming that you did only one manual start of virt-who? We need to make sure that it eventually finishes.

Please provide:
1) /var/log/candlepin/candlepin.log
2) Current output of:

select id, state from cp_job where id like 'hypervisor_update%';

NOTE: With a large virt environment, it may take a while for candlepin to finish processing the host/guest updates.

#17 Updated by Philipp Mueller over 2 years ago

I started virt-who as a service; it produces the same behaviour as before setting the jobs to cancelled.

The candlepin.log doesn't show any difference, even with debug logging:

{"id":"hypervisor_update_8daa745d-0925-4ab8-90d2-9ec7bdb78996","state":"CREATED","startTime":null,"finishTime":null,"result":null,"principalName":"foreman_admin","targetType":"owner","targetId":"rs","ownerId":"rs","correlationId":null,"resultData":null,"statusPath":"/jobs/hypervisor_update_8daa745d-0925-4ab8-90d2-9ec7bdb78996","done":false,"group":"async group","created":"2017-10-11T07:34:14+0000","updated":"2017-10-11T07:34:14+0000"}

The hypervisor job is stuck.

#18 Updated by Philipp Mueller over 2 years ago

candlepin=# select id, state from cp_job where id like 'hypervisor_update%';
id | state
--------------------------------------------------------+-------
hypervisor_update_d43a1813-a807-41f2-ba73-693a77118f84 | 4
hypervisor_update_1a052e72-2959-4cd6-b4d3-4859829361e0 | 4
hypervisor_update_78bdad15-caea-409b-87d3-b677615d23de | 4
hypervisor_update_5c0a1380-6cc0-41b5-aec5-1bb3608cb9e0 | 4
hypervisor_update_cdee6944-31eb-4e2b-a57e-22647732db38 | 0
hypervisor_update_8daa745d-0925-4ab8-90d2-9ec7bdb78996 | 0
hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e | 4

#19 Updated by Michael Stead over 2 years ago

I'm looking for the full candlepin log to see if I can spot any errors during the initial checkins.

Also, please provide the output from:

select id, result from cp_job where id like 'hypervisor_update%';

#20 Updated by Philipp Mueller over 2 years ago

Here is the candlepin.log with debug enabled:
https://pm92.de/nc/index.php/s/2z6V2A0Bi2EHZTk

Output of the query:
candlepin=# select id, result from cp_job where id like 'hypervisor_update%';
id | result
--------------------------------------------------------+--------
hypervisor_update_d43a1813-a807-41f2-ba73-693a77118f84 |
hypervisor_update_1a052e72-2959-4cd6-b4d3-4859829361e0 |
hypervisor_update_78bdad15-caea-409b-87d3-b677615d23de |
hypervisor_update_5c0a1380-6cc0-41b5-aec5-1bb3608cb9e0 |
hypervisor_update_cdee6944-31eb-4e2b-a57e-22647732db38 |
hypervisor_update_8daa745d-0925-4ab8-90d2-9ec7bdb78996 |
hypervisor_update_9ec361ad-755a-4509-8e4f-066e0f917bfd |
hypervisor_update_34d28a4e-8c94-4f86-9700-1ddbbcddcf86 |
hypervisor_update_189d8b51-eb61-4a8d-85d2-e551a408b27e |

I also have this in the Foreman task log:
Exception:
NoMethodError: undefined method `[]' for nil:NilClass
Backtrace:
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/katello/host/hypervisors.rb:23:in `block in parse_hypervisors'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/katello/host/hypervisors.rb:22:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/katello/host/hypervisors.rb:22:in `parse_hypervisors'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/candlepin/async_hypervisors.rb:12:in `poll_external_task'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action/polling.rb:98:in `poll_external_task_with_rescue'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action/polling.rb:21:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/candlepin/abstract_async_task.rb:9:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:512:in `block (3 levels) in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:30:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/propagate_candlepin_errors.rb:9:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/propagate_candlepin_errors.rb:19:in `propagate_candlepin_errors'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/propagate_candlepin_errors.rb:9:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/remote_action.rb:16:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/remote_action.rb:40:in `block in as_remote_user'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/models/katello/concerns/user_extensions.rb:21:in `cp_config'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/remote_action.rb:27:in `as_cp_user'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/remote_action.rb:39:in `as_remote_user'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/remote_action.rb:16:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action/progress.rb:30:in `with_progress_calculation'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action/progress.rb:16:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/keep_locale.rb:11:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/keep_locale.rb:22:in `with_locale'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.4.5/app/lib/actions/middleware/keep_locale.rb:11:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:26:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:17:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware.rb:30:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/stack.rb:22:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/middleware/world.rb:30:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:511:in `block (2 levels) in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:510:in `catch'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:510:in `block in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:425:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:425:in `block in with_error_handling'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:425:in `catch'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:425:in `with_error_handling'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:505:in `execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/action.rb:266:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:9:in `block (2 levels) in execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract.rb:155:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract.rb:155:in `with_meta_calculation'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:8:in `block in execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:22:in `open_action'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:7:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/director.rb:55:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/executors/parallel/worker.rb:11:in `on_message'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/context.rb:46:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/executes_context.rb:7:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-0.8.24/lib/dynflow/actor.rb:26:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/awaits.rb:15:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/buffer.rb:38:in `process_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/buffer.rb:31:in `process_envelopes?'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/buffer.rb:20:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/termination.rb:55:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/removes_child.rb:10:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/abstract.rb:25:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/behaviour/sets_results.rb:14:in `on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/core.rb:161:in `process_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/core.rb:95:in `block in on_envelope'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/core.rb:118:in `block (2 levels) in schedule_execution'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `block in synchronize'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/synchronization/mri_lockable_object.rb:38:in `synchronize'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-edge-0.2.3/lib/concurrent/actor/core.rb:115:in `block in schedule_execution'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/serialized_execution.rb:18:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/serialized_execution.rb:18:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/serialized_execution.rb:96:in `work'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/serialized_execution.rb:77:in `block in call_job'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:348:in `run_task'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:337:in `block (3 levels) in create_worker'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `loop'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:320:in `block (2 levels) in create_worker'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `catch'
/opt/theforeman/tfm/root/usr/share/gems/gems/concurrent-ruby-1.0.3/lib/concurrent/executor/ruby_thread_pool_executor.rb:319:in `block in create_worker'
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-1.8.2/lib/logging/diagnostic_context.rb:323:in `block in create_with_logging_context'

#21 Updated by Justin Sherrill over 2 years ago

  • Has duplicate Bug #21191: Error update VM Information with virt-who added

#22 Updated by Ian Ginn over 2 years ago

I'm also having this issue. The Hypervisors job is stuck at 17%. I killed all of these running jobs manually then started it again by resetting the virt-who service but the issue remains. This is the output from the Raw tab of the task.

Id: 5d37625e-01df-4dc9-8195-aed95d3c1915
Label: Actions::Katello::Host::Hypervisors
Duration: about 2 hours
Raw input: {"services_checked"=>["candlepin", "candlepin_auth"],
"hypervisors"=>Step(3).output[:hypervisors]}
Raw output: {}
External Id: 61751a58-3abc-445c-911e-7a0ddd74b04a

#23 Updated by Kart Nico over 2 years ago

Hi,

I have the same problem on my production environment (Katello 3.4.4), and also after updating to Katello 3.4.5.

On my QA environment (Katello 3.4.5, only 2 RHEL systems registered) it works.

Regards,

Nicolas

#24 Updated by Kart Nico over 2 years ago

My product versions:

CentOS Linux release 7.3.1611 (Core)
virt-who 0.17-11.el7_3
katello 3.4.5-1.el7
candlepin 2.0.40-1.el7

The Hypervisors job is also stuck at 17%.

Regards,

Nicolas

#25 Updated by Vritant Jain over 2 years ago

Kart, any chance I can get my hands on a copy of your database? It would help me investigate further and verify any fix.

#26 Updated by Vritant Jain over 2 years ago

Ian, any chance I can get my hands on a copy of your candlepin database? It would help me investigate further and verify any fix.

#27 Updated by Kart Nico over 2 years ago

Hi,

In which format do you want the export?

pg_dump candlepin | gzip -c > /var/backup/db

Is that OK for you?

Regards,

Nicolas

#28 Updated by Vritant Jain over 2 years ago

Kart,
a pg_dump is perfect.

#29 Updated by Kart Nico over 2 years ago

Hi,

I sent you the whole database in a private message. If you want, you can publish only generic data ;).

Regards,

Nicolas

#30 Updated by Vritant Jain over 2 years ago

  • Bugzilla link set to 1510082

#31 Updated by Philipp Mueller over 2 years ago

Hello Jain,

Our Foreman installation is also affected by this bug. Since the problem is becoming very urgent, I was wondering whether you think deleting all hosts / subscriptions / hypervisors would fix this problem?

KR
Philipp

#32 Updated by Vritant Jain over 2 years ago

While we work on a fix, I have some SQL to unblock any Candlepin affected by this issue. Could you:

1. Stop the katello / candlepin services.
2. Log in as the postgres user: su - postgres
3. psql candlepin -c " delete from qrtz_paused_trigger_grps;"
4. Start the katello / candlepin services.

NOTE: This will allow us to schedule future candlepin jobs, but the already created jobs will not execute successfully. Any future virt-who reports should be consumed successfully.
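Since step 3 deletes rows outright, it may be worth recording what the table contains first. The table name suggests it is where the Quartz scheduler keeps its paused trigger groups, which would presumably explain why jobs in those groups never fire. The sketch below is an illustration under assumptions: an in-memory SQLite table stands in for the real one, the single `trigger_group` column and the sample rows are made up for the example ("async group" is borrowed from the job output earlier in the thread), and the helper name `clear_paused_groups` is hypothetical.

```python
import sqlite3

# Stand-in for qrtz_paused_trigger_grps (ASSUMED single-column layout;
# the real Quartz schema may carry more columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE qrtz_paused_trigger_grps (trigger_group TEXT)")
conn.executemany(
    "INSERT INTO qrtz_paused_trigger_grps VALUES (?)",
    [("async group",), ("example group",)],  # hypothetical rows
)


def clear_paused_groups(db):
    """Return the paused groups that were present, then delete them."""
    saved = [
        row[0]
        for row in db.execute(
            "SELECT trigger_group FROM qrtz_paused_trigger_grps "
            "ORDER BY trigger_group"
        )
    ]
    with db:  # commit on success, roll back on error
        db.execute("DELETE FROM qrtz_paused_trigger_grps")
    return saved


removed = clear_paused_groups(conn)
remaining = conn.execute(
    "SELECT count(*) FROM qrtz_paused_trigger_grps"
).fetchone()[0]
print(removed, remaining)
```

Keeping the list of removed groups makes it easier to report back which groups were paused on an affected system.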

#33 Updated by Philipp Mueller over 2 years ago

Hi,

thanks for the reply. Unfortunately, the delete statement didn't fix the problem.
3 rows were deleted. After that I restarted the Katello services and ran virt-who -d -o.

I still get the same error as before:
Job hypervisor_update_8a4e519b-811d-4266-b287-3a862df8b1ea not finished

Thank you.

#34 Updated by Vritant Jain over 2 years ago

Can you join Freenode #candlepin please?

#35 Updated by Philipp Mueller over 2 years ago

FIX IS WORKING!
My mistake, I was too impatient. Thank you.

#36 Updated by Kart Nico over 2 years ago

Could you share it please :)

I will try it on my environment.

Regards,

Nicolas

#37 Updated by Ian Ginn over 2 years ago

I just want to add that Vritant's fix worked for me as well. My hosts' last check-in is today. Thanks for the fix!

#38 Updated by Eric Helms over 2 years ago

Vritant,

Given that a fix has been working for users, can you let us know if we need to do anything to address this for users going forward? A fix on our side? Is there a Candlepin fix or build that's needed?

#39 Updated by Vritant Jain over 2 years ago

Eric,
This issue occurs when jobs are scheduled while Candlepin is shut down / suspended.
I am currently investigating a fix so that we do not get into this situation in the first place.

#40 Updated by Philipp Mueller over 2 years ago

Hi,

After applying the workaround (delete from qrtz_paused_trigger_grps), virt-who worked fine for a few days, but now we are experiencing the same problem again.

#41 Updated by Andrew Kofink over 2 years ago

  • Legacy Backlogs Release (now unused) set to 329
  • Assignee set to Justin Sherrill

This requires a new version of candlepin to be released and packaged with Katello.

#42 Updated by Justin Sherrill over 2 years ago

We need to pull in Candlepin 2.1.9 or newer.
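
To check whether a given installation already meets that minimum, a version comparison along these lines may help. It is a sketch, not an official check: `version_at_least` is a hypothetical helper, it relies on GNU `sort -V`, and the rpm query in the comment assumes Candlepin was installed from an RPM named `candlepin`.

```shell
#!/bin/sh
# Succeeds when version $1 is at least version $2.
# Relies on version sort (`sort -V`, GNU coreutils).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# The version to check; defaults to the 2.0.40 reported in this ticket.
# On a real system it could come from (assumption: RPM-based install):
#   rpm -q --queryformat '%{VERSION}' candlepin
installed="${1:-2.0.40}"

if version_at_least "$installed" 2.1.9; then
    echo "candlepin $installed meets the 2.1.9 minimum"
else
    echo "candlepin $installed is older than 2.1.9"
fi
```

Pass the installed version as the first argument to check a different value.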

#43 Updated by Justin Sherrill over 2 years ago

  • Target version set to 242

#44 Updated by Justin Sherrill over 2 years ago

candlepin-2.1.12-1.el7 tagged to 3.5.1

#45 Updated by Justin Sherrill over 2 years ago

  • Status changed from Need more information to Closed
