Feature #22915

closed

Have a Mechanism to Proactively Detect and Clean "orphaned" Dynflow Tasks

Added by Adam Ruzicka over 6 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1557067

Description of problem:
-----------------------
If running tasks were destroyed in the past via foreman-rake console, it is very likely that "orphaned" Dynflow records (execution plans and actions with no matching task) are still in the database.

Also, older versions of Satellite 6 had no "foreman_tasks:cleanup" rake task, so tasks were most likely destroyed with a command like the following at some point during the lifetime of the Satellite:

e.g.:
# foreman-rake console

ForemanTasks::Task.where(:state => :running).destroy_all

This situation is known to cause serious issues over time, especially with "Register Host" tasks, which take longer and longer until the clients hit a timeout. These Dynflow entries remain in the database over time and survive Satellite upgrades.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
6.3

How reproducible:
-----------------
100%

Steps to Reproduce:
-------------------
1. Trigger a lot of tasks. One easy way is to launch a Sync Plan, which triggers multiple repository sync tasks.

2. While the tasks are running, destroy them with the following:
# foreman-rake console

ForemanTasks::Task.where(:state => :running).where(:label => "Actions::Katello::Repository::Sync").destroy_all

Actual results:
---------------
=> There is currently no proactive mechanism that detects this situation, which makes troubleshooting more tedious because many other aspects get investigated before this problem is found.

=> Also, there is currently no built-in mechanism to clean these entries.

Additional info:
----------------
=> We can manually detect these "orphaned" tasks with the following command:
# su - postgres -c 'psql -d foreman -c\
    "SELECT foreman_tasks_tasks.label,
    count(foreman_tasks_tasks.id) tasks_total,
    count(dynflow_actions.id) actions_total
    FROM dynflow_actions
    LEFT JOIN foreman_tasks_tasks
    ON (foreman_tasks_tasks.external_id = dynflow_actions.execution_plan_uuid)
    GROUP BY foreman_tasks_tasks.label ORDER BY actions_total DESC LIMIT 30"'

=> Then, look for the entry where the label is empty:
e.g.:
 label | tasks_total | actions_total
-------+-------------+---------------
       |           0 |          5839
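
As a minimal sketch of what the query above computes, the following Ruby mimics the LEFT JOIN and GROUP BY on in-memory data. TaskRow, ActionRow and actions_per_label are hypothetical names introduced for illustration only; they are not part of foreman-tasks or Dynflow, and nothing here touches the database. Actions whose execution plan has no matching task end up under a nil label, which corresponds to the empty-label row.

```ruby
# Hypothetical in-memory stand-ins for rows of foreman_tasks_tasks
# and dynflow_actions (illustration only, not real models).
TaskRow   = Struct.new(:id, :label, :external_id)
ActionRow = Struct.new(:id, :execution_plan_uuid)

# Count actions per task label, largest count first.
# Actions whose execution plan has no task are counted under nil,
# like the empty label produced by the LEFT JOIN.
def actions_per_label(tasks, actions)
  label_by_uuid = tasks.to_h { |t| [t.external_id, t.label] }
  counts = Hash.new(0)
  actions.each { |a| counts[label_by_uuid[a.execution_plan_uuid]] += 1 }
  counts.sort_by { |_label, n| -n }.to_h
end

tasks = [TaskRow.new(1, 'Actions::Katello::Repository::Sync', 'plan-a')]
actions = [
  ActionRow.new(1, 'plan-a'), # belongs to an existing task
  ActionRow.new(2, 'plan-b'), # orphaned: no task has external_id 'plan-b'
  ActionRow.new(3, 'plan-b')  # orphaned as well
]

actions_per_label(tasks, actions).each do |label, n|
  puts "#{label.inspect}: #{n}"
end
```

In this made-up data the nil key carries the two orphaned actions. The sketch only reproduces the actions_total column; tasks_total is always 0 for the orphaned group because no task row matches.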

=> To fix the current situation, the known workaround is the following:
# cat <<EOF | foreman-rake console
persistence = ForemanTasks.dynflow.world.persistence
adapter = persistence.adapter

# Number of execution plans deleted per iteration
batch_size = 5
# Count the execution plans that have no matching task
total = adapter.db.fetch("select count(dynflow_execution_plans.uuid) from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL").first[:count]
deleted = 0
puts "about to delete #{total} execution plans"
# Fetch and delete orphaned plans batch by batch until none are left
while (plans_without_tasks = adapter.db.fetch("select dynflow_execution_plans.uuid from dynflow_execution_plans left join foreman_tasks_tasks on (dynflow_execution_plans.uuid = foreman_tasks_tasks.external_id) where foreman_tasks_tasks.id IS NULL LIMIT #{batch_size}").all.map { |x| x[:uuid] }) && !plans_without_tasks.empty?
  persistence.delete_execution_plans({ 'uuid' => plans_without_tasks }, batch_size)
  deleted += plans_without_tasks.count
  puts "deleted #{deleted} out of #{total}"
end
EOF
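
The control flow of the workaround can be sketched on in-memory data to show why the loop terminates: every pass re-queries for orphans, so plans deleted in one pass never show up in the next. orphaned_batch and delete_orphans_in_batches are hypothetical helpers written for this illustration; the array subtraction stands in for the SQL anti-join and for persistence.delete_execution_plans, and no database is involved.

```ruby
# Return up to `limit` plan UUIDs that no task references
# (in-memory analogue of the LEFT JOIN ... IS NULL LIMIT query).
def orphaned_batch(plan_uuids, task_external_ids, limit)
  (plan_uuids - task_external_ids).first(limit)
end

# Delete orphaned plans in batches until a fetch comes back empty.
# Returns the surviving plan UUIDs and the number of plans deleted.
def delete_orphans_in_batches(plan_uuids, task_external_ids, batch_size)
  remaining = plan_uuids.dup
  deleted = 0
  until (batch = orphaned_batch(remaining, task_external_ids, batch_size)).empty?
    remaining -= batch # stand-in for the real delete call
    deleted += batch.size
  end
  [remaining, deleted]
end

# Made-up data: seven execution plans, two of which still have a task.
plans = %w[p1 p2 p3 p4 p5 p6 p7]
task_ids = %w[p2 p5]
remaining, deleted = delete_orphans_in_batches(plans, task_ids, 2)
puts "deleted #{deleted}, kept #{remaining.inspect}"
```

With this sample data the five unreferenced plans are removed in three passes and the two plans that still have a task are left alone. A small batch size, like the 5 used in the workaround, keeps each delete short at the cost of more round trips.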


Related issues 1 (0 open, 1 closed)

Related to foreman-tasks - Feature #24114: Enable configuration setting to turn on foreman tasks cleanup logging (Closed, Ivan Necas)