Bug #17498
closedImprove ListenOnCandlepinEvents throughput
Description
Main loop of the task has redundant "sleep 1", see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . That means, the task can process at most 1 candlepin event per second.
This could be limiting factor when a bigger burst of events come, as well as in situations when katello_event_queue has thousands of messages of backlog and processing it takes ridiculously huge time in hours (this is something I see in field relatively often - ListenOnCandlepinEvents task is stopped/paused due to whatever issue, and fixing it means several hours of processing its backlog - only due to the "sleep 1" after each and every message processing).
I have successfully tested removal of the sleep by having >20k messages in the queue and restarting foreman-tasks service. Since the time this task subscribed to the queue, the 21k messages were consumed within 35 seconds with the only impact of dynflow_executor consuming high CPU.
Therefore I see no reason for having the sleep there (that is present there rather due to historical reasons, as far as I understood).
Updated by The Foreman Bot about 8 years ago
- Status changed from New to Ready For Testing
- Pull request https://github.com/Katello/katello/pull/6472 added
Updated by Pavel Moravec about 8 years ago
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
Applied in changeset katello|55677460eea47e2d34a04364b485daaf59296cdc.
Updated by Eric Helms about 8 years ago
- Translation missing: en.field_release set to 208
Updated by Eric Helms about 8 years ago
- Translation missing: en.field_release changed from 208 to 188
Updated by Mike McCune about 8 years ago
- Status changed from Closed to Assigned
This actually causes foreman-tasks to die and stop processing. We need to improve this code, but removing it causes more issues than it solves
# hammer ping foreman_tasks: Status: FAIL
putting the sleep back in and it goes back to normal status
Updated by The Foreman Bot about 8 years ago
- Status changed from Assigned to Ready For Testing
- Pull request https://github.com/Katello/katello/pull/6477 added
Updated by Pavel Moravec about 8 years ago
- Pull request deleted (
https://github.com/Katello/katello/pull/6477)
Reasoning of the regression: removing the sleep, the thread was in a tight loop, suppressing potential activity of other dynflow threads that starved on CPU. Like hammer ping.
Code proposal:
loop do
begin
message = fetch_message
if message[:result]
result = message[:result]
@session.acknowledge(:message => result, :sync => true)
suspended_action.notify_message_received(result.message_id, result.subject, result.content)
elsif message[:error]
suspended_action.notify_not_connected(message[:error])
break
else
sleep 1
end
sleep 0.01 # sleep 1 was originally here
rescue => e
suspended_action.notify_fatal(e)
raise e
end
end
I.e. call "sleep 1" only when no message was fetched from the queue, and have short sleep after fetching a real message - again to give other threads access to CPU.
I am not in favour of having only "sleep 0.01" without the "else sleep 1" option. That would mean katello periodically fetching qpid queue every 0.01 second, what is meaningless and redundantly takes resources.
Another alternative patch would be decreasing original "sleep 1" to "sleep 0.1" as a "compromise", but that would IMHO still redundantly fetch katello queue too often. Requirements to the two delays are contradicting each other, it doesnt make much sense to ave just one delay.
Updated by The Foreman Bot about 8 years ago
- Pull request https://github.com/Katello/katello/pull/6477 added
Updated by Justin Sherrill about 8 years ago
- Status changed from Ready For Testing to Closed
Applied in changeset katello|87ef553cfe840299389abbb4123a640c2227e0bb.