Bug #17498: Improve ListenOnCandlepinEvents throughput - Katello - Foreman

Actions

Copy link

Bug #17498

closed

Improve ListenOnCandlepinEvents throughput

Added by Pavel Moravec over 8 years ago. Updated almost 7 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Pavel Moravec

Category:

Target version:

Katello 3.3.0

Difficulty:

trivial

Triaged:

Bugzilla link:

1399877

Pull request:

https://github.com/Katello/katello/pull/6472, https://github.com/Katello/katello/pull/6477

Fixed in Releases:

Found in Releases:

Red Hat JIRA:

Description

Main loop of the task has redundant "sleep 1", see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . That means, the task can process at most 1 candlepin event per second.

This could be limiting factor when a bigger burst of events come, as well as in situations when katello_event_queue has thousands of messages of backlog and processing it takes ridiculously huge time in hours (this is something I see in field relatively often - ListenOnCandlepinEvents task is stopped/paused due to whatever issue, and fixing it means several hours of processing its backlog - only due to the "sleep 1" after each and every message processing).

I have successfully tested removal of the sleep by having >20k messages in the queue and restarting foreman-tasks service. Since the time this task subscribed to the queue, the 21k messages were consumed within 35 seconds with the only impact of dynflow_executor consuming high CPU.

Therefore I see no reason for having the sleep there (that is present there rather due to historical reasons, as far as I understood).

Actions

Copy link

Updated by The Foreman Bot over 8 years ago

Status changed from New to Ready For Testing
Pull request https://github.com/Katello/katello/pull/6472 added

Actions

Copy link

Updated by Pavel Moravec over 8 years ago

Status changed from Ready For Testing to Closed
% Done changed from 0 to 100

Applied in changeset katello|55677460eea47e2d34a04364b485daaf59296cdc.

Actions

Copy link

Updated by Mike McCune over 8 years ago

Bugzilla link set to 1399877

Actions

Copy link

Updated by Eric Helms over 8 years ago

Translation missing: en.field_release set to 208

Actions

Copy link

Updated by Eric Helms over 8 years ago

Translation missing: en.field_release changed from 208 to 188

Actions

Copy link

Updated by Mike McCune over 8 years ago

Status changed from Closed to Assigned

This actually causes foreman-tasks to die and stop processing. We need to improve this code, but removing it causes more issues than it solves

# hammer ping
foreman_tasks:  
     Status:          FAIL

putting the sleep back in and it goes back to normal status

Actions

Copy link

Updated by The Foreman Bot over 8 years ago

Status changed from Assigned to Ready For Testing
Pull request https://github.com/Katello/katello/pull/6477 added

Actions

Copy link

Updated by Pavel Moravec over 8 years ago

Pull request deleted (~~https://github.com/Katello/katello/pull/6477~~)

Reasoning of the regression: removing the sleep, the thread was in a tight loop, suppressing potential activity of other dynflow threads that starved on CPU. Like hammer ping.

Code proposal:

          loop do
            begin
              message = fetch_message
              if message[:result]
                result = message[:result]
                @session.acknowledge(:message => result, :sync => true)
                suspended_action.notify_message_received(result.message_id, result.subject, result.content)
              elsif message[:error]
                suspended_action.notify_not_connected(message[:error])
                break
              else
                sleep 1
              end
              sleep 0.01 # sleep 1 was originally here
            rescue => e
              suspended_action.notify_fatal(e)
              raise e
            end
          end

I.e. call "sleep 1" only when no message was fetched from the queue, and have short sleep after fetching a real message - again to give other threads access to CPU.

I am not in favour of having only "sleep 0.01" without the "else sleep 1" option. That would mean katello periodically fetching qpid queue every 0.01 second, what is meaningless and redundantly takes resources.

Another alternative patch would be decreasing original "sleep 1" to "sleep 0.1" as a "compromise", but that would IMHO still redundantly fetch katello queue too often. Requirements to the two delays are contradicting each other, it doesnt make much sense to ave just one delay.

Actions

Copy link