
Bug #17498

Improve ListenOnCandlepinEvents throughput

Added by Pavel Moravec over 4 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
147
Difficulty:
trivial
Triaged:
Bugzilla link:
1399877

Description

The main loop of the task has a redundant "sleep 1", see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . As a result, the task can process at most one Candlepin event per second.

This can be a limiting factor when a larger burst of events arrives, as well as when katello_event_queue has a backlog of thousands of messages and processing it takes hours (something I see in the field relatively often: the ListenOnCandlepinEvents task is stopped or paused due to some issue, and fixing it means several hours of working through its backlog, solely because of the "sleep 1" after each and every message processed).

I successfully tested removing the sleep with >20k messages in the queue and a restart of the foreman-tasks service. From the moment the task subscribed to the queue, the 21k messages were consumed within 35 seconds, the only impact being high CPU usage by dynflow_executor.

Therefore I see no reason to keep the sleep there (as far as I understand, it is present mostly for historical reasons).
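For scale, the figures above work out as follows (a quick sanity check using only the numbers from this report):

```ruby
# Back-of-envelope check of the backlog figures quoted above
# (21k messages; throughput capped at 1 message/second by the sleep).
backlog = 21_000

hours_with_sleep = backlog / 3600.0 # at most 1 msg/s => roughly 5.8 hours
observed_seconds = 35.0             # reported drain time without the sleep
msgs_per_second  = backlog / observed_seconds

puts format('with "sleep 1": ~%.1f hours to drain', hours_with_sleep)
puts format('without sleep:  %.0f s observed (~%d msg/s)', observed_seconds, msgs_per_second)
```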

Associated revisions

Revision 55677460 (diff)
Added by Pavel Moravec over 4 years ago

Fixes #17498 - Improve ListenOnCandlepinEvents throughput

Remove "sleep 1" from the loop of processing messages from
katello_event_queue.

Signed-off-by: Pavel Moravec <>

Revision 87ef553c (diff)
Added by Justin Sherrill over 4 years ago

Fixes #17498 - improve ListenOnCandlepinEvents throughput

while still respecting other tasks. Without this
the ping will never properly detect that dynflow is running

History

#1 Updated by The Foreman Bot over 4 years ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/6472 added

#2 Updated by Pavel Moravec over 4 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

#3 Updated by Mike McCune over 4 years ago

  • Bugzilla link set to 1399877

#4 Updated by Eric Helms over 4 years ago

  • Legacy Backlogs Release (now unused) set to 208

#5 Updated by Eric Helms over 4 years ago

  • Legacy Backlogs Release (now unused) changed from 208 to 188

#6 Updated by Mike McCune over 4 years ago

  • Status changed from Closed to Assigned

This actually causes foreman-tasks to die and stop processing. We need to improve this code, but removing the sleep causes more issues than it solves:

    # hammer ping
    foreman_tasks:
         Status:          FAIL

Putting the sleep back in returns it to normal status.

#7 Updated by The Foreman Bot over 4 years ago

  • Status changed from Assigned to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/6477 added

#8 Updated by Pavel Moravec over 4 years ago

  • Pull request deleted (https://github.com/Katello/katello/pull/6477)

Reasoning behind the regression: with the sleep removed, the thread runs in a tight loop, suppressing the activity of other dynflow threads, which then starve for CPU (the one answering hammer ping being one example).

Code proposal:

          loop do
            begin
              message = fetch_message
              if message[:result]
                result = message[:result]
                @session.acknowledge(:message => result, :sync => true)
                suspended_action.notify_message_received(result.message_id, result.subject, result.content)
                sleep 0.01 # brief yield so other dynflow threads get CPU ("sleep 1" was originally here)
              elsif message[:error]
                suspended_action.notify_not_connected(message[:error])
                break
              else
                sleep 1 # queue is empty; poll it at most once per second
              end
            rescue => e
              suspended_action.notify_fatal(e)
              raise e
            end
          end

I.e., call "sleep 1" only when no message was fetched from the queue, and use a short sleep after processing a real message, again to give other threads access to the CPU.

I am not in favour of having only "sleep 0.01" without the "else sleep 1" branch. That would mean Katello polling the qpid queue every 0.01 seconds, which is pointless and wastes resources.

An alternative patch would be to decrease the original "sleep 1" to "sleep 0.1" as a "compromise", but that would IMHO still poll the Katello queue too often. The requirements on the two delays contradict each other, so it does not make much sense to have just one delay.
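The two-delay pattern above can be sketched generically (a minimal, self-contained illustration; Ruby's core Queue stands in for the qpid session, and the constants and the :stop sentinel are inventions for this sketch):

```ruby
# Generic poll loop with two delays: a short yield after useful work,
# and a longer backoff when the queue is empty. Queue here is a
# stand-in for the real qpid session; :stop is a sentinel for the sketch.
BUSY_YIELD   = 0.01 # let other threads run after each processed message
IDLE_BACKOFF = 1.0  # poll an empty queue at most once per second

def drain(queue)
  processed = []
  loop do
    message = begin
      queue.pop(true) # non-blocking fetch; raises ThreadError when empty
    rescue ThreadError
      nil
    end

    if message == :stop
      break
    elsif message
      processed << message
      sleep BUSY_YIELD
    else
      sleep IDLE_BACKOFF
    end
  end
  processed
end
```

With messages already enqueued, the loop never hits the idle backoff, which mirrors the backlog-draining case from the description.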

#9 Updated by The Foreman Bot over 4 years ago

  • Pull request https://github.com/Katello/katello/pull/6477 added

#10 Updated by Justin Sherrill over 4 years ago

  • Status changed from Ready For Testing to Closed

#11 Updated by Brad Buckingham over 4 years ago

  • Target version set to 147
