Bug #22935

closed

First bulk REX after smart_proxy_dynflow_core restart can raise "Could not use any Capsule" error

Added by Adam Ruzicka about 6 years ago. Updated about 6 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1558069

Description of problem:
When only one Capsule is available for REX (or only one Capsule can be used for REX against some hosts), a "Could not use any Capsule" error can occur when a batch of REX jobs runs after a fresh restart of the smart_proxy_dynflow_core (spdc) service, regardless of how much time has passed since the restart.

Per aruzicka++, it happens because:
- restarting or reloading the service does not by itself load the DB schema or apply DB migrations
- that happens during the very first REX job or "liveness" probe of this Capsule
- if multiple REX jobs are scheduled for the same time and several "is this Capsule running?" queries are raised concurrently, a race condition followed by an SQL error in the spdc logs can cause the probe to fail
- if this Capsule is the only one available for some host, the whole REX job for that host fails with the "Could not use any Capsule" error
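
The race described above can be illustrated with a small, generic shell sketch. It is not the spdc code and not the fix from the linked pull request; the `probe` function, the marker file and the log file are hypothetical stand-ins for the "first probe performs one-time initialization" behaviour. It assumes `flock` from util-linux is installed. Five probes start concurrently, and the lock ensures that only the first one performs the initialization.

```shell
#!/bin/bash
# Hypothetical illustration: none of these names exist in smart_proxy_dynflow_core.
workdir=$(mktemp -d)
marker="$workdir/initialized"   # stands in for "schema loaded, migrations applied"
init_log="$workdir/init.log"    # one line is appended per initialization run
: > "$init_log"

probe() {
    (
        # Take an exclusive lock on fd 9 so concurrent probes run one at a time.
        flock 9
        if [ ! -e "$marker" ]; then
            echo "init" >> "$init_log"
            sleep 0.1           # simulate slow one-time initialization
            touch "$marker"
        fi
    ) 9>"$workdir/lock"
}

# Start five probes at once, as a bulk of REX jobs would.
for i in 1 2 3 4 5; do
    probe &
done
wait

init_runs=$(wc -l < "$init_log" | tr -d ' ')
echo "initialization ran $init_runs time(s)"
rm -rf "$workdir"
```

Without the `flock 9` line, several probes can pass the marker check before any of them creates the marker, so the initialization may run more than once. That is the same shape of race as the concurrent probes hitting an uninitialized database.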

Version-Release number of selected component (if applicable):
any (incl. Sat 6.2.14 and 6.3.0)

How reproducible:
100% (within a few attempts at worst)

Steps to Reproduce:
0) Have a Satellite with no additional Capsule and REX working against some host (one host is enough)

1) restart spdc service:
service smart_proxy_dynflow_core restart

2) schedule multiple REX jobs for the near future:
echo "date" > /tmp/rex-date

# update --start-at to a time in the near future and "MYHOST" to one of your hosts; optionally update --job-template-id to the ID of the SSH REX template

for i in $(seq 1 100); do echo "job-invocation create --job-template-id 94 --input-files command=/tmp/rex-date --search-query \"name ~ MYHOST\" --async --start-at \"2018-03-19T13:50:00\""; done | hammer -u admin -p redhat shell

3) check whether all jobs succeeded

Actual results:
3) the first few jobs (usually two) fail; this is not guaranteed and depends on how quickly they trigger the probe to spdc

Expected results:
3) no such errors

Additional info:
workaround: run a dummy REX job against that Capsule after any spdc service reload and/or restart
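
A minimal sketch of the workaround, reusing only the commands and options that appear in the reproduction steps. The template ID 94, the credentials and "MYHOST" are the placeholders from this report and must be adapted. The `run` and `warm_up` helpers and the `DRY_RUN` switch are hypothetical names introduced for illustration. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/bash
# Hypothetical helper: print the command in dry-run mode, execute it otherwise.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

warm_up() {
    # Restart spdc, then immediately send a single dummy job so that the
    # first probe happens alone rather than as part of a bulk of jobs.
    run service smart_proxy_dynflow_core restart
    echo "date" > /tmp/rex-date
    run hammer -u admin -p redhat job-invocation create \
        --job-template-id 94 \
        --input-files command=/tmp/rex-date \
        --search-query "name ~ MYHOST"
}

warm_up
```

The dummy job is created without --async on the assumption that hammer then waits for it to finish before returning, so later bulk jobs start only after the first probe has completed.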

Actions #1

Updated by The Foreman Bot about 6 years ago

  • Status changed from New to Ready For Testing
  • Assignee set to Adam Ruzicka
  • Pull request https://github.com/theforeman/smart_proxy_dynflow/pull/48 added
Actions #2

Updated by Adam Ruzicka about 6 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100