Bug #17498
closedImprove ListenOnCandlepinEvents throughput
Description
Main loop of the task has redundant "sleep 1", see (current) https://github.com/Katello/katello/blob/master/app/lib/actions/candlepin/candlepin_listening_service.rb#L86 . That means, the task can process at most 1 candlepin event per second.
This could be limiting factor when a bigger burst of events come, as well as in situations when katello_event_queue has thousands of messages of backlog and processing it takes ridiculously huge time in hours (this is something I see in field relatively often - ListenOnCandlepinEvents task is stopped/paused due to whatever issue, and fixing it means several hours of processing its backlog - only due to the "sleep 1" after each and every message processing).
I have successfully tested removal of the sleep by having >20k messages in the queue and restarting foreman-tasks service. Since the time this task subscribed to the queue, the 21k messages were consumed within 35 seconds with the only impact of dynflow_executor consuming high CPU.
Therefore I see no reason for having the sleep there (that is present there rather due to historical reasons, as far as I understood).
Updated by Pavel Moravec over 8 years ago
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
Applied in changeset katello|55677460eea47e2d34a04364b485daaf59296cdc.
Updated by Mike McCune over 8 years ago
- Status changed from Closed to Assigned
This actually causes foreman-tasks to die and stop processing. We need to improve this code, but removing it causes more issues than it solves
# hammer ping foreman_tasks: Status: FAIL
putting the sleep back in and it goes back to normal status
Updated by Pavel Moravec over 8 years ago
- Pull request deleted (
https://github.com/Katello/katello/pull/6477)
Reasoning of the regression: removing the sleep, the thread was in a tight loop, suppressing potential activity of other dynflow threads that starved on CPU. Like hammer ping.
Code proposal:
loop do
begin
message = fetch_message
if message[:result]
result = message[:result]
@session.acknowledge(:message => result, :sync => true)
suspended_action.notify_message_received(result.message_id, result.subject, result.content)
elsif message[:error]
suspended_action.notify_not_connected(message[:error])
break
else
sleep 1
end
sleep 0.01 # sleep 1 was originally here
rescue => e
suspended_action.notify_fatal(e)
raise e
end
end
I.e. call "sleep 1" only when no message was fetched from the queue, and have short sleep after fetching a real message - again to give other threads access to CPU.
I am not in favour of having only "sleep 0.01" without the "else sleep 1" option. That would mean katello periodically fetching qpid queue every 0.01 second, what is meaningless and redundantly takes resources.
Another alternative patch would be decreasing original "sleep 1" to "sleep 0.1" as a "compromise", but that would IMHO still redundantly fetch katello queue too often. Requirements to the two delays are contradicting each other, it doesnt make much sense to ave just one delay.
Updated by Justin Sherrill over 8 years ago
- Status changed from Ready For Testing to Closed
Applied in changeset katello|87ef553cfe840299389abbb4123a640c2227e0bb.