Tracker #27408

closed

Dynflow workers extraction to separate processes

Added by Ivan Necas over 5 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee: -
Category: -
Target version:
% Done: 0%
Difficulty:
Triaged: No
Fixed in Releases:
Found in Releases:

Description

Currently, the orchestration (deciding which steps run when) happens in the very same process as the actual work (the bodies of the Dynflow actions).

This approach has several limitations:

1. No way to scale the workers effectively: it's possible to run multiple executor processes, but tasks get assigned to those executors soon after planning, and there is no way to share the work related to one task across multiple executors. As a result, one executor can be busy while another has nothing to do. It also means that starting a new executor while the current ones are busy does not help with the existing queue of work.
2. Issues in the action code (such as inefficient memory use) can make it necessary to restart the whole process. When the workers are part of the executor, restarting the whole process carries an additional risk of not being able to resume some work because context is lost during the restart. Also, no tasks can be processed during the restart.
3. Because both orchestration and the actual work happen in the same process, the Ruby GIL limits how many CPU cores the work can use. After this change, we should be able to run multiple worker processes to leverage the full power of the hardware. It would also allow us to run a process dedicated to a specific queue (something we don't support at the moment).
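To make the first limitation concrete, here is a minimal sketch (not Dynflow code) that compares the total finishing time when whole tasks are pinned to executors with the time when individual steps are pulled from a shared queue. The task names, step durations, and both method names are hypothetical, and the sketch assumes steps can run independently of each other, which real Dynflow plans do not always allow.

```ruby
# Hypothetical step durations (arbitrary time units) for two tasks.
# Task A is heavy, task B is light; the numbers are made up for illustration.
TASKS = {
  "A" => [5, 5, 5, 5],
  "B" => [1, 1],
}.freeze

# Early assignment: every step of a task runs on the executor that owns the task.
# Tasks are handed out round-robin, mimicking assignment right after planning.
def makespan_early_assignment(tasks, executor_count)
  loads = Array.new(executor_count, 0)
  tasks.values.each_with_index do |steps, index|
    loads[index % executor_count] += steps.sum
  end
  loads.max
end

# Shared queue: each step goes to whichever worker becomes free first,
# regardless of which task the step belongs to.
def makespan_shared_queue(tasks, worker_count)
  free_at = Array.new(worker_count, 0)
  tasks.values.flatten.each do |duration|
    worker = free_at.index(free_at.min)
    free_at[worker] += duration
  end
  free_at.max
end

puts makespan_early_assignment(TASKS, 2) # prints 20
puts makespan_shared_queue(TASKS, 2)     # prints 11
```

With early assignment, the executor that owns task B sits idle after 2 time units while the other one works through all 20 units of task A; with a shared queue both workers stay busy until the end.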

The goal of this card is to introduce a way to run the workers outside of the main executor process to address the limitations above. We also want to do it by leveraging proven techniques within the Ruby community rather than coming up with a custom-built solution. It looks like Sidekiq/Redis has over the years become the de facto standard for async data processing, and so far we have not hit any issues when trying to leverage it for our purposes.

From the vocabulary perspective, the plan is to split the current executor term into:

- orchestrator - the process that decides what runs when
- workers - the dumb processes that can run the actions but do not necessarily do anything else, so they should stay mostly stateless
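The split can be illustrated with a small in-process sketch. It is not the Dynflow or Sidekiq API: Ruby threads and `Queue` objects stand in for separate worker processes and a Redis-backed queue, and `ACTIONS` and `run_plan` are hypothetical names. Because threads still share one GIL, this only shows the division of responsibilities, not the multi-core benefit.

```ruby
# Hypothetical actions; in Dynflow these would be action classes, here plain lambdas.
ACTIONS = {
  "double" => ->(input) { input * 2 },
  "increment" => ->(input) { input + 1 },
}.freeze

def run_plan(plan, input, worker_count: 2)
  jobs = Queue.new     # stand-in for a Redis-backed Sidekiq queue
  results = Queue.new  # channel the workers use to report back

  workers = Array.new(worker_count) do
    Thread.new do
      # A worker keeps no state between jobs: everything it needs is in the job.
      while (job = jobs.pop) != :stop
        results << ACTIONS.fetch(job[:action]).call(job[:input])
      end
    end
  end

  # The orchestrator alone decides what runs when: here, one step at a time, in order.
  value = plan.reduce(input) do |current, action_name|
    jobs << { action: action_name, input: current }
    results.pop
  end

  # Tell every worker to shut down, then wait for them to finish.
  worker_count.times { jobs << :stop }
  workers.each(&:join)
  value
end

puts run_plan(%w[double increment double], 3) # prints 14
```

Since the workers hold no state of their own, any of them can pick up any step, and adding or restarting a worker does not affect what the orchestrator knows about the plan.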


Related issues 7 (0 open, 7 closed)

- Related to Foreman - Refactor #27415: installer/packaging support for configuring sidekiq in a required way (Resolved, Ondřej Ezr)
- Blocked by foreman-tasks - Feature #27409: Initial support for running dynflow work on separate workers using Sidekiq (Closed, Ivan Necas)
- Blocked by foreman-tasks - Refactor #27410: Avoid using internal world.clock within workers (Closed)
- Blocked by foreman-tasks - Refactor #27411: Abnormal states recovery with workers in separate process (Closed, Adam Ruzicka)
- Blocked by foreman-tasks - Refactor #27412: Ensure not multiple orchestrators can run at once (Closed, Adam Ruzicka)
- Blocked by foreman-tasks - Feature #27413: Expose Sidekiq console via foreman-tasks + authentication (Closed)
- Blocked by foreman-tasks - Refactor #27633: Abnormal states recovery with workers in separate process - orchestrator (Closed, Adam Ruzicka)