Tracker #27408

Updated by Ivan Necas over 5 years ago

Currently, the orchestration (deciding which steps run when) happens in the very same process as the actual work (the bodies of the Dynflow actions).

 This approach has several limitations: 

 1. There is no way to scale the workers effectively: it is possible to run multiple executor processes, but tasks are assigned to those executors soon after planning, and there is no way to share the work belonging to one task across multiple executors. As a result, one executor can be busy while another has nothing to do. It also means that starting a new executor while the current ones are busy does not help with the existing queue of work.
 2. Problems in the action code (such as inefficient memory use) can force a restart of the whole process. Because the workers are part of the executor, restarting the whole process carries an additional risk of not being able to resume some work due to context lost during the restart. Also, no tasks can be processed while the restart is in progress.
 3. Because both orchestration and the actual work happen in the same process, the Ruby GIL limits how many CPU cores the work can use. After this change, we should be able to run multiple worker processes to use the full power of the hardware. It would also allow us to run a process dedicated to a specific queue (something we don't support at the moment).
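 The first limitation can be illustrated with a small, self-contained calculation. This is only a sketch under stated assumptions, not Dynflow code: the method names and the workload are hypothetical, step durations are arbitrary time units, and all steps are assumed to be independent so that any worker can pick up any step.

```ruby
# Hypothetical model, not Dynflow's API: each task is a list of step durations.

# Current model: a whole task is pinned to one executor right after planning
# (round-robin here), so all of its steps run on that executor.
def makespan_pinned(tasks, executor_count)
  load = Array.new(executor_count, 0)
  tasks.each_with_index do |steps, i|
    load[i % executor_count] += steps.sum
  end
  load.max
end

# Proposed model: every step goes to a shared queue and is taken by
# whichever worker becomes free first.
def makespan_shared(tasks, worker_count)
  free_at = Array.new(worker_count, 0)
  tasks.flatten.each do |duration|
    idx = free_at.index(free_at.min) # earliest available worker
    free_at[idx] += duration
  end
  free_at.max
end

tasks = [[4, 4, 4, 4], [1]] # one long task, one short task

puts makespan_pinned(tasks, 2) # prints 16: one executor does almost all the work
puts makespan_shared(tasks, 2) # prints 9: steps are spread across both workers
puts makespan_pinned(tasks, 3) # prints 16: an extra executor does not help
puts makespan_shared(tasks, 3) # prints 8: an extra worker shortens the run
```

 In this model, adding an executor to the pinned setup changes nothing, while adding a worker to the shared queue does, which is the behavior described in point 1.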

 The goal of this card is to introduce a way to run the workers outside of the main executor process in order to address the limitations above. We also want to do this by leveraging proven techniques from the Ruby community rather than coming up with a custom-built solution. Sidekiq/Redis appears to have become the de facto standard for async data processing over the years, and so far we have not hit any issues when trying to use it for our purposes.

 From the vocabulary perspective, the plan is to split the current @executor@ term into:

 - **orchestrator** - the process that decides what runs when 
 - **workers** - the dumb processes that are able to run the actions but do not necessarily do anything else, and should therefore stay mostly stateless
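
 The split can be sketched in plain Ruby. This is a minimal illustration under assumptions, not the planned implementation: threads and in-memory @Queue@ objects stand in for separate worker processes and for Sidekiq/Redis, and the names @start_worker@, @orchestrate@ and the action table are hypothetical.

```ruby
# Worker: a stateless loop. It receives everything it needs in the job itself,
# runs the action body, and reports the result back. It knows nothing about the plan.
def start_worker(actions, work_queue, results_queue)
  Thread.new do
    while (job = work_queue.pop) != :stop
      result = actions.fetch(job[:action]).call(job[:input])
      results_queue << { step_id: job[:step_id], result: result }
    end
  end
end

# Orchestrator: owns the plan and decides what runs when, but never runs
# an action body itself. Here the plan is a simple sequence of steps.
def orchestrate(actions, plan, input, worker_count: 2)
  work_queue = Queue.new    # stand-in for a Redis-backed queue
  results_queue = Queue.new # stand-in for the channel reporting back
  workers = Array.new(worker_count) do
    start_worker(actions, work_queue, results_queue)
  end

  value = input
  plan.each_with_index do |action, step_id|
    work_queue << { step_id: step_id, action: action, input: value }
    value = results_queue.pop[:result] # wait for the step before scheduling the next
  end

  worker_count.times { work_queue << :stop } # shut the workers down
  workers.each(&:join)
  value
end

# Hypothetical actions, keyed by name.
actions = {
  "double" => ->(x) { x * 2 },
  "square" => ->(x) { x * x },
}

puts orchestrate(actions, %w[double square], 3) # prints 36
```

 Because a worker in this sketch keeps no state between jobs, any worker can take any step, and restarting a worker loses at most the step it was running. Error handling is left out: an action that raises would leave the orchestrator waiting for a result.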
