|Assigned To:||Shimon Shtein|
|Target version:||Foreman - Team Ivan Iteration 12|
|Found in release:||Pull request:||https://github.com/Dynflow/dynflow/pull/211, https://github.com/theforeman/foreman-tasks/pull/216|
|Velocity based estimate||-|
Given once Ruby allocates some memory, it doesn't give it back, bigger
set of larger actions can lead to quite big memory consumption that
persists and can accumulate over time. With this, it's hard to keep
memory consumption fully under control, especially in an environment
with other systems (passenger, pulp, candlepin, qpid). Since the
executors can terminate nicely without affecting the tasks itselves,
it should be pretty easy to extend it to watch the memory consumption.
1. config options:
max_memory_per_executor - the threshold for the memory size per executor
min_executors_count - minimal count executors (default 1)
minimal_executor_age - the period it will check whether the memory consumption didn't grow (default 1h)
2. the executor will periodically check it's memory usage,
(http://stackoverflow.com/a/24423978/457560 seems to be a sane
approach for us)
3. if memory usage exceeds `max_memory_per_executor`, the executor is
older than `minimal_executor_age` (to prevent situation, where the
memory would grow too fast over the max_memory_per_executor, which
would mean we wouldn't do anything than restarting the executors
without getting anything done and the amount of current executors
would not go under `min_executors_count`, politely terminate executor
4. the polite termination should be able to hand over all the tasks to
the other executors and once everything is finalized on the executor, it would just exit
5. the daemon monitor would notice the executor getting closed and running a new executor
It would be configurable, turned off by default (for development) but we would configure
this in production, where we can rely on the monitor being present.
#6 Updated by Shimon Shtein 8 months ago
A couple of thoughts:
First, since the process stabilizes at some point, we are not satisfied with the fact that they are "stuck" in memory for too long.
Maybe we can address it by more aggressive cleanup after the task finishes - maybe calling a full GC after each task, so it's leftovers will be purged.
Second, again, since the process stabilizes at some point, maybe we should enhance the algorithm that spawns new executors.
I mean monitoring the amount of memory consumed by all executors, and when a threshold is passed reduce the amount of live executors. Thus the memory would be divided between fewer executors, allowing them to get to the point where they can stabilize. The same can go the other way around, if the executors have stabilized before reaching the memory threshold, we can spawn extra executor and get the tasks queue cleared faster.