Bug #23311

open

new tasks stuck in planned/pending after upgrade to 1.16.1

Added by Justin Sherrill about 7 years ago. Updated about 7 years ago.

Status:
Need more information
Priority:
Normal
Assignee:
-
Category:
-
Target version:
-
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Users in #theforeman (rashkinadze and gkoch) saw this issue. Immediately after upgrading to 1.16.1, Katello Host::Update tasks started piling up in the planned/pending state and were never processed. Restarting all the services did not resolve the issue.

Looking through their logs, I did not find much, but their PostgreSQL logs showed the following error whenever foreman-tasks was restarted:

2018-04-17 10:36:41 EDT ERROR: duplicate key value violates unique constraint "dynflow_coordinator_records_pkey"
2018-04-17 10:36:41 EDT DETAIL: Key (id, class)=(delayed-executor, Dynflow::Coordinator::DelayedExecutorLock) already exists.
2018-04-17 10:36:41 EDT STATEMENT: INSERT INTO "dynflow_coordinator_records" ("data", "id", "owner_id", "class") VALUES ('{"class":"Dynflow::Coordinator::DelayedExecutorLock","owner_id":"world:89c713a9-ffd3-49b2-ad84-c921fad5e75e","world_id":"89c713a9-ffd3-49b2-ad84-c921fad5e75e","id":"delayed-executor"}', 'delayed-executor', 'world:89c713a9-ffd3-49b2-ad84-c921fad5e75e', 'Dynflow::Coordinator::DelayedExecutorLock') RETURNING "id"
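When triaging a larger PostgreSQL log for the same symptom, it can help to pull out which coordinator lock is conflicting and which world tried to take it. The following is a minimal sketch that assumes the log lines have the same shape as the three shown above; the function name `summarize_lock_conflicts` is hypothetical and not part of Foreman, Dynflow, or PostgreSQL tooling.

```python
import re

# Matches the DETAIL line PostgreSQL writes for a unique-constraint violation,
# e.g. "DETAIL: Key (id, class)=(a, b) already exists."
DETAIL_RE = re.compile(
    r"DETAIL:\s+Key \((?P<cols>[^)]*)\)=\((?P<vals>[^)]*)\) already exists\."
)
# Matches owner ids of the form "world:<uuid>" as seen in the STATEMENT line.
WORLD_RE = re.compile(r"world:(?P<world>[0-9a-f-]{36})")


def summarize_lock_conflicts(log_text):
    """Return (conflicting keys, world ids) found in the given log text.

    Hypothetical helper: it only assumes the line shapes quoted in this issue.
    """
    keys = []
    worlds = set()
    for line in log_text.splitlines():
        match = DETAIL_RE.search(line)
        if match:
            cols = [c.strip() for c in match.group("cols").split(",")]
            vals = [v.strip() for v in match.group("vals").split(",")]
            keys.append(dict(zip(cols, vals)))
        if "STATEMENT:" in line:
            # The world that attempted the INSERT and hit the conflict.
            worlds.update(WORLD_RE.findall(line))
    return keys, sorted(worlds)
```

Run against the excerpt above, this should report the `delayed-executor` lock of class `Dynflow::Coordinator::DelayedExecutorLock` and the world id from the failing INSERT. It does not tell you which world currently holds the lock; that would have to be read from the `dynflow_coordinator_records` table itself.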

I had them go to the Dynflow status and invalidate their worlds. Four worlds were invalidated, and after a foreman-tasks restart, tasks appeared to be processed again.
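To illustrate why invalidating worlds could clear the duplicate-key error, here is a toy model in Python using an in-memory SQLite table. It is not Dynflow code: the only things taken from the log above are the `(id, class)` primary key and the `owner_id` column, and the helper names (`make_db`, `acquire_lock`, `invalidate_world`) are hypothetical. Whether a leftover lock row was the actual cause of the stuck tasks is an assumption, not something established in this issue.

```python
import sqlite3


def make_db():
    """Create an in-memory stand-in for dynflow_coordinator_records."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE dynflow_coordinator_records ("
        " id TEXT, class TEXT, owner_id TEXT,"
        " PRIMARY KEY (id, class))"
    )
    return db


def acquire_lock(db, lock_id, lock_class, owner_id):
    """Try to insert a lock row; return False if the key already exists."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO dynflow_coordinator_records (id, class, owner_id)"
                " VALUES (?, ?, ?)",
                (lock_id, lock_class, owner_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Same situation as the 'duplicate key value' error in the log.
        return False


def invalidate_world(db, owner_id):
    """Delete every record owned by a world that is no longer running."""
    with db:
        cur = db.execute(
            "DELETE FROM dynflow_coordinator_records WHERE owner_id = ?",
            (owner_id,),
        )
    return cur.rowcount


db = make_db()
LOCK = ("delayed-executor", "Dynflow::Coordinator::DelayedExecutorLock")
held_by_old = acquire_lock(db, *LOCK, "world:old")      # old world takes the lock
blocked = not acquire_lock(db, *LOCK, "world:new")      # new world is rejected
removed = invalidate_world(db, "world:old")             # manual invalidation
held_by_new = acquire_lock(db, *LOCK, "world:new")      # new world now succeeds
```

In this model the new world can only take the lock once the records of the old world are gone, which matches the observation that tasks resumed after the worlds were invalidated and foreman-tasks was restarted.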

Any idea why this happened, or what we can do to avoid the need for manual intervention?

Actions #1

Updated by Justin Sherrill about 7 years ago

  • Description updated (diff)
Actions #2

Updated by Ivan Necas about 7 years ago

  • Status changed from New to Need more information

Isn't this the case that we resolved over IRC by invalidating the executor world?

Actions #3

Updated by Justin Sherrill about 7 years ago

It was. I filed this to see whether there is some way to remedy this automatically when foreman-tasks is restarted.

Actions #4

Updated by Ivan Necas about 7 years ago

We do try to auto-remediate invalid worlds when tasks is restarted. We don't have enough information right now to proceed with debugging. A foreman-debug might help, but I guess we don't have one from the last failure, so we need to wait until this happens again.

Actions #5

Updated by Justin Sherrill about 7 years ago

I have a copy of the foreman-debug; I just can't upload it to the issue because of privacy concerns.
