Bug #34957
Manifest refresh randomly fails with "No such file or directory" when running multiple dynflow workers
Description
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=2090271
Description of problem:
Manifest refresh randomly fails on a Satellite with multiple dynflow workers with error:
Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip
The reason is tricky:
- ManifestRefresh task determines filename for the new manifest file as /tmp/#{rand}.zip
- UpstreamExport dynflow step is asked to export the new manifest to that file
- subsequent Import dynflow step is asked to read the file and process the update further
The dynflow steps can be processed by different dynflow workers, which run as separate systemd services. Unfortunately, each service uses its own private temp directory, such as:
/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq@worker-3.service-Uannn5/tmp/
So when the UpstreamExport step is executed by one dynflow worker, it writes the zip file to that worker's private temp directory. If we are unlucky, the Import step is picked up by another worker, which cannot find the file in its own private temp directory.
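The failure mode can be reproduced in miniature without Satellite. The sketch below is only an illustration: FakeWorker, export, import and demo_private_tmp are hypothetical names, not Katello or dynflow code. Each fake worker maps "/tmp" to its own directory, which is roughly what a per-service private temp directory does.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical stand-in for a dynflow worker whose "/tmp" is really a
# private directory belonging to that worker's service.
class FakeWorker
  def initialize(private_root)
    @private_root = private_root
  end

  # Translate a path the worker believes is under /tmp into its private dir.
  def resolve(path)
    File.join(@private_root, path)
  end

  # Plays the role of the export step: write the manifest zip.
  def export(path)
    real = resolve(path)
    FileUtils.mkdir_p(File.dirname(real))
    File.write(real, "manifest bytes")
  end

  # Plays the role of the import step: read the manifest zip back.
  def import(path)
    File.read(resolve(path))
  end
end

def demo_private_tmp
  Dir.mktmpdir do |root|
    worker1 = FakeWorker.new(File.join(root, "worker-1"))
    worker2 = FakeWorker.new(File.join(root, "worker-2"))
    path = "/tmp/#{rand}.zip" # same naming scheme as in the report

    worker1.export(path)
    same_worker = worker1.import(path) # finds the file
    other_worker = begin
      worker2.import(path)             # looks in a different directory
    rescue Errno::ENOENT => e
      e.class
    end
    [same_worker, other_worker]
  end
end
```

Importing on the exporting worker returns the file contents, while importing on the other worker raises Errno::ENOENT, the same error class behind the "No such file or directory" message above.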
This means that with 3 dynflow workers, the manifest refresh has only a 1/3 probability of succeeding.
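The 1/3 figure follows if each step is assumed to land on any worker with equal probability, independently of where the previous step ran. That is an assumption made for this sketch, not a documented scheduling guarantee. The function names are hypothetical.

```ruby
# Assumption: export and import are each picked up by one of `workers`
# workers uniformly at random, independently of each other.
def refresh_success_probability(workers)
  Rational(1, workers)
end

# Monte Carlo check of the same model: count how often both steps
# land on the same worker.
def simulate_success_rate(workers, trials, rng = Random.new(42))
  hits = trials.times.count do
    rng.rand(workers) == rng.rand(workers)
  end
  hits.to_f / trials
end
```

Under this model, refresh_success_probability(3) is 1/3, so the expected failure rate is 2/3, and the success rate drops further as workers are added.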
We need to use a static/shared tmp file instead.
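One possible shape of such a change, as a minimal sketch: build the path under a directory that all workers see at the same location. The helper name shared_manifest_path and its shared_dir parameter are hypothetical. Which directory is actually shared on a given deployment is an assumption the caller has to verify.

```ruby
require "tmpdir"
require "securerandom"
require "fileutils"

# Hypothetical helper: place the manifest under a directory visible to
# every worker, instead of under the per-service /tmp.
def shared_manifest_path(shared_dir)
  FileUtils.mkdir_p(shared_dir)
  File.join(shared_dir, "#{SecureRandom.hex(8)}.zip")
end

# Usage sketch: one "worker" writes, another reads the same absolute path.
def demo_shared_path
  Dir.mktmpdir do |shared_dir|
    path = shared_manifest_path(shared_dir)
    File.write(path, "manifest bytes") # export step
    File.read(path)                    # import step, same path
  end
end
```

SecureRandom.hex is used here instead of rand only to get a filename without a leading "0."; either is enough to avoid collisions between concurrent refreshes.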
Version-Release number of selected component (if applicable):
Sat 6.10.5
How reproducible:
2/3 of attempts with 3 dynflow workers
Steps to Reproduce:
1. Set up Satellite with 3 dynflow workers, e.g. per https://access.redhat.com/solutions/5695311
2. Import a manifest
3. Repeatedly refresh it:
hammer subscription refresh-manifest --organization-id=1
Actual results:
Step 3 randomly fails with error:
Error: No such file or directory @ rb_sysopen - /tmp/0.7851943882678857.zip
In such a case, the zip file can be found under the private temp dir of one worker's service, like:
/tmp/systemd-private-4f8b157ce7c040f4b27e7ecbba68aa22-dynflow-sidekiq@worker-3.service-Uannn5/tmp/0.7851943882678857.zip
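To confirm the diagnosis on an affected system, the missing file can be searched for under the private temp directories. The helper below is a hypothetical sketch that assumes the systemd-private-*/tmp/ layout shown above. Reading those directories on a real host most likely requires root.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: list every copy of file_name sitting inside a
# systemd private temp directory under tmp_root.
def find_in_private_tmp(file_name, tmp_root = "/tmp")
  pattern = File.join("systemd-private-*", "tmp", file_name)
  Dir.glob(pattern, base: tmp_root).sort.map { |rel| File.join(tmp_root, rel) }
end
```

For example, find_in_private_tmp("0.7851943882678857.zip") would return the full path of the zip inside the private directory of the worker that ran the export step, if the file is still there.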
Expected results:
Manifest refresh always succeeds.
Additional info:
Associated revisions
History
#1
Updated by The Foreman Bot 10 months ago
- Status changed from New to Ready For Testing
- Pull request https://github.com/Katello/katello/pull/10129 added
#2
Updated by Ian Ballou 10 months ago
- Triaged changed from No to Yes
- Target version set to Katello 4.5.0
- Category set to Subscriptions
#3
Updated by The Foreman Bot 10 months ago
- Fixed in Releases Katello 4.5.0 added
#4
Updated by Adam Ruzicka 10 months ago
- Status changed from Ready For Testing to Closed
Applied in changeset katello|dabeda407b7005774452abc8cfb65f457a913a60.
Fixes #34957 - Put manifest into a shared temp directory (#10129)
On production deployments, dynflow workers have private tmp directories,
meaning they cannot use /tmp as a place for shared data. This could lead
to manifest refresh failing on scaled-up deployments.