Tracker #27408
Dynflow workers extraction to separate processes
Status: closed
0%
Description
Currently, the orchestration (deciding which steps run when) happens in the very same process as the actual work (the bodies of the Dynflow actions).
This approach has several limitations:
1. no way to scale the workers effectively: it's possible to run multiple executor processes, but each task gets assigned to an executor right after planning, and there is no way to share the work related to one task across multiple executors. As a result, one executor can be busy while another has nothing to do. It also means that starting a new executor while the current ones are busy does not help with processing the existing queue of work.
2. issues in the actions' code (such as inefficient memory handling) can make it necessary to restart the whole process. When the workers are part of the executor, restarting the whole process carries an additional risk of not being able to resume some work because context is lost during the restart. Also, no tasks can be processed during the restart.
3. since both orchestration and the actual work happen in the same process, the Ruby GIL limits how many CPU cores can be used for the work. After this change, we should be able to run multiple worker processes to leverage the full power of the hardware. It would also allow us to run a process dedicated to a specific queue (something we don't support at the moment).
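To make limitation 1 more concrete, here is a toy Ruby model comparing the two assignment strategies by total completion time (makespan). It is an illustration only: the helper names are hypothetical, not Dynflow code, the steps of a task are assumed to be independent units of work, and the shared queue is modelled with simple greedy scheduling (each step goes to whichever worker frees up first).

```ruby
# Toy model; helper names are hypothetical, not Dynflow APIs.
# A task is an array of step durations (arbitrary time units). For simplicity,
# the steps are treated as independent units of work.

# Early assignment: each whole task is bound to one executor right after
# planning (round-robin here), so all of its steps run on that executor.
def makespan_early_assignment(tasks, executors)
  per_executor = Array.new(executors, 0)
  tasks.each_with_index { |steps, i| per_executor[i % executors] += steps.sum }
  per_executor.max
end

# Shared queue: every step is picked up by whichever worker frees up first
# (greedy list scheduling), regardless of which task it belongs to.
def makespan_shared_queue(tasks, workers)
  busy_until = Array.new(workers, 0)
  tasks.flatten.each do |duration|
    idx = busy_until.each_index.min_by { |i| busy_until[i] }
    busy_until[idx] += duration
  end
  busy_until.max
end

tasks = [[5, 5, 5, 5], [1]] # one heavy task and one light task

puts makespan_early_assignment(tasks, 2) # 20: one executor does all the heavy work
puts makespan_shared_queue(tasks, 2)     # 11: the heavy task's steps are spread out
puts makespan_early_assignment(tasks, 3) # 20: an extra executor does not help
puts makespan_shared_queue(tasks, 3)     # 10: an extra worker does help
```

In the model, adding a third executor leaves the heavy task stuck on one process, while adding a third worker to the shared queue shortens the total time, which mirrors the scaling problem described above.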
The goal of this card is to introduce a way to run the workers outside of the main executor process to address the limitations above. We also want to do it by leveraging proven techniques from the Ruby community rather than coming up with a custom-crafted solution. Over the years, Sidekiq/Redis seems to have become the de facto standard for asynchronous data processing, and so far we have not hit any issues when trying to leverage it for our purposes.
In terms of vocabulary, the plan is to split the current executor term into:
- orchestrator - the process that decides what runs when
- workers - the dummy processes able to run the actions, but not necessarily anything else; therefore, they should stay mostly stateless
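The orchestrator/worker split above can be sketched in plain Ruby. This is a minimal in-process illustration under stated assumptions, not the actual implementation: `Thread::Queue` stands in for the Sidekiq/Redis queues, and `Job`, `Result`, `start_worker`, `Orchestrator` and the `remote_execution` queue name are all hypothetical. The orchestrator keeps all the state and decides when each step is dispatched; the workers only pop a job, run it and report the result, and one worker is dedicated to a specific queue.

```ruby
# Minimal in-process sketch of the orchestrator/worker split. Thread::Queue
# stands in for Sidekiq/Redis queues; all names here are hypothetical and not
# part of Dynflow's API. Error handling is deliberately omitted.

Job = Struct.new(:step_id, :input)
Result = Struct.new(:step_id, :output)

# A worker keeps no state between jobs: pop a job, run the action,
# report the result. It exits when it pops :stop.
def start_worker(jobs, results, &action)
  Thread.new do
    loop do
      job = jobs.pop
      break if job == :stop
      results << Result.new(job.step_id, action.call(job.input))
    end
  end
end

# The orchestrator owns the state: it decides which steps run when
# and records their results.
class Orchestrator
  def initialize(queues, results)
    @queues = queues   # queue name => Thread::Queue
    @results = results
    @outputs = {}
  end

  # phases: steps within a phase may run concurrently; the next phase is
  # dispatched only after every step of the current one has finished.
  def run(phases)
    phases.each do |steps|
      steps.each { |id, queue, input| @queues.fetch(queue) << Job.new(id, input) }
      steps.size.times do
        result = @results.pop
        @outputs[result.step_id] = result.output
      end
    end
    @outputs
  end
end

results = Thread::Queue.new
queues = { default: Thread::Queue.new, remote_execution: Thread::Queue.new }

# Two workers share the default queue; one is dedicated to remote_execution.
workers = [
  start_worker(queues[:default], results) { |x| x * 2 },
  start_worker(queues[:default], results) { |x| x * 2 },
  start_worker(queues[:remote_execution], results) { |cmd| "ran #{cmd}" }
]

outputs = Orchestrator.new(queues, results).run([
  [[1, :default, 10], [2, :default, 20]], # phase 1: two concurrent steps
  [[3, :remote_execution, "uptime"]]      # phase 2: runs after phase 1
])
# outputs maps 1 => 20, 2 => 40, 3 => "ran uptime"

# Shut down: one :stop message per worker on the queue it listens to.
queues[:default] << :stop << :stop
queues[:remote_execution] << :stop
workers.each(&:join)
```

Because the workers hold no state of their own, any of them can be restarted or added without losing orchestration context; in the real setup the queues would live in Redis and the workers would be separate Sidekiq processes rather than threads.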
Updated by Ivan Necas over 5 years ago
- Blocked by Feature #27409: Initial support for running dynflow work on separate workers using Sidekiq added
Updated by Ivan Necas over 5 years ago
- Blocked by Refactor #27410: Avoid using internal world.clock within workers added
Updated by Ivan Necas over 5 years ago
- Blocked by Refactor #27411: Abnormal states recovery with workers in separate process added
Updated by Ivan Necas over 5 years ago
- Blocked by Refactor #27412: Ensure not multiple orchestrators can run at once added
Updated by Ivan Necas over 5 years ago
- Blocked by Feature #27413: Expose Sidekiq console via foreman-tasks + authentication added
Updated by Ivan Necas over 5 years ago
- Related to Refactor #27415: installer/packaging support for configuring sidekiq in a required way added
Updated by Ivan Necas over 5 years ago
- Blocks Refactor #27421: Accommodate queues configuration for sidekiq added
Updated by Ivan Necas over 5 years ago
- Blocks Refactor #27422: Accomodate the dynflow status page to the sidekiq backend added
Updated by Ivan Necas over 5 years ago
- Blocks deleted (Refactor #27421: Accommodate queues configuration for sidekiq)
Updated by Ivan Necas over 5 years ago
- Blocks deleted (Refactor #27422: Accomodate the dynflow status page to the sidekiq backend)
Updated by Adam Ruzicka over 5 years ago
- Blocked by Refactor #27633: Abnormal states recovery with workers in separate process - orchestrator added