Bug #37285
REX jobs failing with Proxy task gone missing if proxy->foreman callback fails
Status: Closed
Description
Description of problem:
REX jobs fail with "Proxy task gone missing" if the proxy->Foreman callback fails. This bug manifests itself the same way as BZ22702951, but the causes differ: that bug was caused by changes in smart_proxy_dynflow-0.9.1, while this one is caused by changes released in foreman-tasks-5.3.0.
If you see the "Proxy task gone missing" error and also find `[E] <RuntimeError> Failed performing callback to Foreman server` in the proxy logs, you are hitting this bug rather than BZ22702951. Alternatively, if the job runs for more than 10 minutes, it is also this bug.
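The two triage signals above can be expressed as a small check. This is a minimal sketch, not part of Foreman or smart_proxy_dynflow: the function name `matches_this_bug` is hypothetical, and it assumes you have already confirmed the job failed with "Proxy task gone missing", that the proxy log is available as a list of text lines, and that you know how long the job ran. The only string it matches is the error quoted in this report.

```python
from datetime import timedelta

# Error quoted in this report as the proxy-log symptom of the bug.
CALLBACK_FAILURE = "[E] <RuntimeError> Failed performing callback to Foreman server"


def matches_this_bug(proxy_log_lines, job_runtime=None):
    """Hypothetical triage helper for a job that failed with
    'Proxy task gone missing'.

    Returns True if any proxy log line contains the callback-failure error,
    or if the job ran for more than 10 minutes (the report's alternative
    signal). Otherwise returns False.
    """
    if any(CALLBACK_FAILURE in line for line in proxy_log_lines):
        return True
    return job_runtime is not None and job_runtime > timedelta(minutes=10)


if __name__ == "__main__":
    sample_log = ["... [E] <RuntimeError> Failed performing callback to Foreman server"]
    print(matches_this_bug(sample_log))                    # True
    print(matches_this_bug([], timedelta(minutes=11)))     # True
    print(matches_this_bug(["unrelated line"], timedelta(minutes=5)))  # False
```

A plain substring test is used instead of a regular expression because the report gives the exact error text. A log line with a timestamp prefix still matches.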
How reproducible:
The conditions that trigger the bug are difficult to reproduce.
Steps to Reproduce:
1. Run a job
2. Ensure that the callback from the proxy to Foreman fails.
This is rather difficult to do naturally; in development I resorted to modifying the code. The bug was discovered "in the wild" during scale tests.
3. Wait until Foreman checks on the proxy.
Actual results:
The REX job fails with "Proxy task gone missing".
Expected results:
The job succeeds or fails according to whether it actually succeeded or failed on the Capsule.
Additional info: