Bug #16543

Updated by Chris Duryee over 7 years ago

If you suspend the qpid process (simulating high qpid load), the candlepin event listener process hangs in its event loop. This prevents the dynflow executor from proceeding.

 For example: 

 * On the tasks page, view the candlepin task process. It should show something like:

 <pre> 
 {"messages"=>"b190e333-7821-302a-8131-9693e66e2144", 
  "last_message"=>"b190e333-7821-302a-8131-9693e66e2144 - import.created", 
  "error"=>nil, 
  "connection"=>"Connected"} 
 </pre> 

 * Now freeze the qpidd process: kill -19 `pidof qpidd`. Note that the candlepin event listener still thinks it's connected.

 * Run "hammer ping"; it will hang due to https://pulp.plan.io/issues/2253.

 * Run "foreman-rake console" and call Katello::Ping.ping(services: [:foreman_tasks]). Note that the executor fails to respond.

 Once qpidd is resumed via kill -18, things run normally again.

 Note: this is the call that hangs: https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L42. The timeout does not appear to be honored.

 I suspect that if qpidd were unresponsive for long enough, the kernel would eventually sever the TCP connection. I have not tested freezing it for more than 20 minutes, but it could take a couple of hours for this to happen. I also suspect that the listener would not create a new connection at that point, so if qpid came back, events would start piling up on the katello_event_queue. http://john.eckersberg.com/improving-ha-failures-with-tcp-timeouts.html may provide additional clues.
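
 The symptom described here (a peer that is frozen but still looks connected) can be imitated without qpid. The following is a minimal Ruby sketch, not Katello code: a forked echo server stands in for qpidd, and the helper names echo_with_deadline and demo_frozen_peer are made up for illustration. It assumes a Unix-like system where fork and the STOP/CONT signals are available. While the child is stopped, the write still succeeds and the socket stays open, so only an explicit deadline on the read (here via IO.select) lets the caller notice that nothing is answering.

```ruby
require "socket"

# Hypothetical helper: send one line and wait at most `timeout` seconds
# for the echoed reply. Returns the reply, or nil if nothing arrived.
def echo_with_deadline(sock, line, timeout)
  sock.write(line) # succeeds even when the peer is frozen; the kernel buffers it
  return nil unless IO.select([sock], nil, nil, timeout)
  sock.gets
end

# Hypothetical demo: freeze and resume a stand-in server, as with
# kill -19 / kill -18 against qpidd.
def demo_frozen_peer
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  port = server.addr[1]

  # Child process: a trivial echo server standing in for qpidd.
  pid = fork do
    client = server.accept
    while (line = client.gets)
      client.write(line)
    end
    exit!(0)
  end
  server.close # the child keeps its own copy of the listening socket

  sock = TCPSocket.new("127.0.0.1", port)
  results = {}
  results[:before] = echo_with_deadline(sock, "one\n", 5)

  Process.kill("STOP", pid) # freeze the peer
  sleep 0.1                 # give the stop a moment to take effect
  results[:frozen] = echo_with_deadline(sock, "two\n", 1)

  Process.kill("CONT", pid) # resume the peer
  # The first reply after resuming answers the message queued during the freeze.
  results[:after] = echo_with_deadline(sock, "three\n", 5)
  results
ensure
  sock&.close
  if pid
    begin
      Process.kill("KILL", pid)
    rescue Errno::ESRCH
      # child already gone
    end
    Process.wait(pid)
  end
end
```

 Calling demo_frozen_peer returns a hash holding the reply seen before the freeze, nil for the frozen interval, and, after resuming, the reply to the message that was queued while the peer was stopped. Whether the blocking call in the qpid client can be bounded in the same way is not something this sketch shows.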