Bug #17498


Improve ListenOnCandlepinEvents throughput

Added by Pavel Moravec over 7 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
147
Fixed in Releases:
Found in Releases:

Description

The main loop of the task contains a redundant "sleep 1", see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . That means the task can process at most one Candlepin event per second.

This can be a limiting factor when a bigger burst of events comes in, as well as in situations where katello_event_queue has a backlog of thousands of messages and processing it takes hours. I see this in the field relatively often: the ListenOnCandlepinEvents task gets stopped/paused due to some issue, and recovering from it means several hours of processing the backlog, solely because of the "sleep 1" after each and every message.

I have successfully tested removing the sleep by queueing >20k messages and restarting the foreman-tasks service. From the moment the task subscribed to the queue, the 21k messages were consumed within 35 seconds, the only impact being high CPU usage by dynflow_executor.

Therefore I see no reason to keep the sleep there (as far as I understood, it is present mostly for historical reasons); the back-of-the-envelope numbers below show its cost.
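
To put the cost of the unconditional sleep in numbers, here is a back-of-the-envelope sketch (plain Ruby, using the 21k backlog figure from the test above):

    # Back-of-the-envelope cost of the unconditional "sleep 1":
    backlog   = 21_000   # messages, as in the test described above
    per_event = 1.0      # seconds slept per processed event
    puts format('%d events x %.0f s sleep = %.1f hours just sleeping',
                backlog, per_event, backlog * per_event / 3600)
    # => 21000 events x 1 s sleep = 5.8 hours just sleeping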

Actions #1

Updated by The Foreman Bot over 7 years ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/6472 added
Actions #2

Updated by Pavel Moravec over 7 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100
Actions #3

Updated by Mike McCune over 7 years ago

  • Bugzilla link set to 1399877
Actions #4

Updated by Eric Helms over 7 years ago

  • Release set to 208
Actions #5

Updated by Eric Helms over 7 years ago

  • Release changed from 208 to 188
Actions #6

Updated by Mike McCune over 7 years ago

  • Status changed from Closed to Assigned

This actually causes foreman-tasks to die and stop processing. We need to improve this code, but removing the sleep causes more issues than it solves:

# hammer ping
foreman_tasks:  
     Status:          FAIL

Putting the sleep back in returns it to a normal status.

Actions #7

Updated by The Foreman Bot over 7 years ago

  • Status changed from Assigned to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/6477 added
Actions #8

Updated by Pavel Moravec over 7 years ago

  • Pull request deleted (https://github.com/Katello/katello/pull/6477)

Reasoning behind the regression: with the sleep removed, the thread ran in a tight loop, starving other Dynflow threads of CPU, such as the one serving hammer ping.
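
As a minimal standalone illustration (plain Ruby, not the Katello code) of why even a 10 ms sleep matters: with no delay the loop burns a full CPU core, leaving little room for sibling threads.

    # Measure how much CPU a polling loop burns per second of wall time.
    # delay == 0 reproduces the tight loop; a small delay leaves the CPU
    # almost entirely free for other threads.
    def cpu_burned(delay, duration = 1.0)
      start_cpu = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)
      deadline  = Process.clock_gettime(Process::CLOCK_MONOTONIC) + duration
      while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
        sleep(delay) if delay > 0
      end
      Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - start_cpu
    end

    puts format('tight loop: %.2f s CPU per 1 s wall', cpu_burned(0))
    puts format('sleep 0.01: %.2f s CPU per 1 s wall', cpu_burned(0.01))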

Code proposal:

    loop do
      begin
        message = fetch_message
        if message[:result]
          result = message[:result]
          # acknowledge the event, then hand it over to the suspended action
          @session.acknowledge(:message => result, :sync => true)
          suspended_action.notify_message_received(result.message_id, result.subject, result.content)
        elsif message[:error]
          suspended_action.notify_not_connected(message[:error])
          break
        else
          sleep 1    # nothing fetched: back off instead of hammering the qpid queue
        end
        sleep 0.01   # sleep 1 was originally here; short yield so other threads get CPU
      rescue => e
        suspended_action.notify_fatal(e)
        raise e
      end
    end

That is, "sleep 1" is called only when no message was fetched from the queue, and a short sleep follows the processing of a real message, again to give other threads access to the CPU.

I am not in favour of having only "sleep 0.01" without the "else sleep 1" branch. That would mean Katello polling the qpid queue every 0.01 seconds, which is pointless and needlessly consumes resources.

Another alternative patch would be decreasing the original "sleep 1" to "sleep 0.1" as a compromise, but IMHO that would still poll the katello queue too often. The requirements on the two delays contradict each other, so it does not make much sense to have just one delay; a sketch of the two-delay pattern follows.
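
For illustration, here is a standalone model of the two-delay pattern, using Ruby's core Queue in place of the qpid session (the names POLL_BACKOFF and YIELD_SLICE are hypothetical, not from the Katello code):

    POLL_BACKOFF = 1.0    # long wait, used only when the queue was empty
    YIELD_SLICE  = 0.01   # short yield after processing a real message

    queue = Queue.new
    25.times { |i| queue << "event-#{i}" }   # simulated backlog

    consumer = Thread.new do
      loop do
        begin
          message = queue.pop(true)   # non-blocking fetch
          # ... the real code would process +message+ here ...
          sleep YIELD_SLICE           # give other threads the CPU
        rescue ThreadError            # raised when the queue is empty
          sleep POLL_BACKOFF          # avoid hammering an empty queue
        end
      end
    end

    sleep 2
    puts "backlog left: #{queue.size}"   # 0: drained in well under a second
    consumer.kill

With this pattern a backlog drains at roughly 100 messages per second, while an idle queue is still polled only once per second.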

Actions #9

Updated by The Foreman Bot over 7 years ago

  • Pull request https://github.com/Katello/katello/pull/6477 added
Actions #10

Updated by Justin Sherrill over 7 years ago

  • Status changed from Ready For Testing to Closed
Actions #11

Updated by Brad Buckingham over 7 years ago

  • Target version set to 147