Workers go missing under heavy load
Cloned from BZ:
I just merged a patch to fix this issue upstream:
This patch adds a config variable called 'worker_timeout' to the tasks section of /etc/pulp/server.conf. It sets the maximum time, in seconds, that a worker can go without checking in (sending a heartbeat) before it's killed. It also adds warnings that are raised before this limit is reached, to indicate that heartbeats are taking too long.
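To make the mechanism concrete, here is a minimal Python sketch, not Pulp's actual code, of reading the [tasks] worker_timeout value and classifying a worker by how long ago it last sent a heartbeat. It assumes the standard server.conf INI layout and the default of 30 described above. The warning threshold (WARN_FRACTION) and the function names are hypothetical, because the patch description does not say exactly when the warnings fire.

```python
import configparser

# Default described in the patch: 30 seconds when worker_timeout is unset.
DEFAULT_WORKER_TIMEOUT = 30

# Hypothetical: fraction of worker_timeout after which a slow heartbeat is
# only warned about. The real warning threshold is not specified here.
WARN_FRACTION = 0.5


def read_worker_timeout(conf_text: str) -> int:
    """Return [tasks] worker_timeout from server.conf-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    # fallback covers both a missing [tasks] section and a missing option.
    return parser.getint("tasks", "worker_timeout", fallback=DEFAULT_WORKER_TIMEOUT)


def heartbeat_status(seconds_since_heartbeat: float, worker_timeout: int) -> str:
    """Classify a worker as 'ok', 'warn', or 'missing' (sketch only)."""
    if seconds_since_heartbeat > worker_timeout:
        return "missing"  # past the limit: the worker is treated as gone
    if seconds_since_heartbeat >= worker_timeout * WARN_FRACTION:
        return "warn"  # heartbeats are taking too long, but not fatal yet
    return "ok"


if __name__ == "__main__":
    timeout = read_worker_timeout("[tasks]\nworker_timeout = 60\n")
    print(timeout)                        # 60
    print(heartbeat_status(20, timeout))  # ok
    print(heartbeat_status(45, timeout))  # warn
    print(heartbeat_status(90, timeout))  # missing
```

The point of the warning tier is the one the patch makes: slow heartbeats show up in the logs well before a worker is declared missing, which gives an operator a chance to raise the timeout.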
The one thing I think Katello/Satellite should do is raise the worker_timeout setting. Since those installations typically run multiple apps, databases, and processes, they'll probably need a higher timeout than a standalone Pulp deployment. The default is 30; I'd recommend at least 60. If you plan to support MongoDB running on spinning disks (probably not a good idea), I'd go with 300.
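As an illustration of the sizing advice, the following Python sketch encodes the recommended values (60, or 300 for MongoDB on spinning disks) and writes them into the [tasks] section of server.conf-style text. The helper names are hypothetical. In practice the setting is managed by the puppet-pulp module rather than by a script like this, and ConfigParser.write() drops comments, so do not use this sketch to rewrite a real /etc/pulp/server.conf.

```python
import configparser
import io

# Values from the recommendation above: at least 60 seconds for
# Katello/Satellite, 300 if MongoDB runs on spinning disks.
RECOMMENDED_TIMEOUT = 60
SPINNING_DISK_TIMEOUT = 300


def recommended_worker_timeout(mongo_on_spinning_disks: bool) -> int:
    """Hypothetical helper encoding the sizing advice."""
    return SPINNING_DISK_TIMEOUT if mongo_on_spinning_disks else RECOMMENDED_TIMEOUT


def set_worker_timeout(conf_text: str, seconds: int) -> str:
    """Return server.conf-style text with [tasks] worker_timeout set.

    ConfigParser.write() does not preserve comments, so this is for
    illustration only.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    if not parser.has_section("tasks"):
        parser.add_section("tasks")
    parser.set("tasks", "worker_timeout", str(seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    original = "[tasks]\nworker_timeout = 30\n"
    print(set_worker_timeout(original, recommended_worker_timeout(False)))
```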
Let me know if you have any questions.
#4 Updated by Chris Roberts over 2 years ago
- % Done changed from 0 to 100
- Status changed from Ready For Testing to Closed
Applied in changeset puppet-pulp|84c0033586861c5f165da9004c1e3e24d0165908.