Bug #22338 (closed)
Workers go missing under heavy load
Description
Cloned from BZ:
I just merged a patch to fix this issue upstream:
https://github.com/pulp/pulp/pull/3245
This patch adds a config variable called 'worker_timeout' to the tasks section of /etc/pulp/server.conf. It sets the maximum time a worker can go without checking in before it's killed. The patch also adds warnings that get raised before that point to indicate that heartbeats are taking too long.
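As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is not the code from the patch: the function name `classify_worker`, the `warn_fraction` parameter, the "warn at half the timeout" rule, and the choice of seconds as the unit are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def classify_worker(last_heartbeat, now, worker_timeout=30, warn_fraction=0.5):
    """Classify a worker by the age of its last heartbeat.

    Hypothetical helper: worker_timeout is assumed to be in seconds, and
    warn_fraction (an invented knob) decides how early a warning fires.
    """
    age = (now - last_heartbeat).total_seconds()
    if age >= worker_timeout:
        # No check-in within the timeout: the worker is treated as gone.
        return "missing"
    if age >= worker_timeout * warn_fraction:
        # Heartbeat is slow but still inside the timeout: warn only.
        return "late"
    return "ok"


# Example: a heartbeat 45 seconds old against two different timeouts.
now = datetime(2018, 1, 1, 12, 0, 0)
stale = now - timedelta(seconds=45)
status_default = classify_worker(stale, now)                    # timeout 30
status_raised = classify_worker(stale, now, worker_timeout=60)  # timeout 60
```

With the same 45-second-old heartbeat, the sketch reports the worker as missing at a timeout of 30 but only as late at 60, which is the effect a higher setting is meant to have on a loaded box.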
The one thing I think Katello/Satellite should do is raise the worker_timeout setting. Since these installations typically run multiple apps/dbs/processes on the same machine, they'll probably need a higher timeout than Pulp alone would. The default is 30 seconds. I'd recommend at least 60. If you plan to support MongoDB running on spinning disks (probably not a good idea), then I'd go with 300.
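For anyone who wants to check what a given config would resolve to, the following sketch parses a config fragment with Python's standard `configparser`. The sample text, the helper name `read_worker_timeout`, and the assumption that the file parses cleanly as INI are illustrative only; the section and option names are the ones mentioned above.

```python
import configparser

# Hypothetical fragment in the shape of the tasks section described above.
SAMPLE_CONF = """
[tasks]
worker_timeout: 60
"""


def read_worker_timeout(conf_text, default=30):
    """Return worker_timeout from the [tasks] section, or the default."""
    # Interpolation is disabled so a literal '%' elsewhere in the text
    # does not trip the parser.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback covers both a missing section and a missing option.
    return parser.getint("tasks", "worker_timeout", fallback=default)


timeout = read_worker_timeout(SAMPLE_CONF)
```

If the option is absent, the helper falls back to 30, matching the default stated above.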
Let me know if you have any questions.