Bug #31702
Race condition in accepting salt key
Description
There seems to be a race condition in the way the foreman_salt plugin of foreman accepts the salt-key.
Sometimes, if multiple hosts are being provisioned at the same time, the salt keys of some of the hosts don't get "accepted" and stay in the "unaccepted" list, without any errors being reported.
Analysis shows that during OS installation, two things happen shortly after each other at the end of the installation phase:
- The 'salt-call --grains' command sends the key to the salt-master, where it becomes an "unaccepted" salt key.
- The state is changed to "provisioned", at which point the foreman_salt plugin tries to find the "unaccepted" key and accepts it if it exists.
The code where foreman performs step 2 resides here: [[https://github.com/theforeman/foreman_salt/blob/13.2.0/app/services/foreman_salt/smart_proxies/salt_keys.rb#L13]] (starting at line 13)
The code checks if foreman has any keys in its own internal cache:
- If true: the keys from the cache are used.
- If false: the keys are retrieved from salt and put in the cache. The cache is then given a lifetime of 1 minute.
The lifetime of 1 minute is the culprit here. If a second host goes through this process while the cache is still valid, foreman just looks for the key in the cache instead of retrieving possibly new keys from salt. In that case the salt key is not found and will remain "unaccepted".
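The timing can be illustrated with a small standalone simulation. This is only a sketch of the behaviour described above, not the plugin's code: `TtlCache`, the `'saltkeys'` cache key, the host names and the simulated clock are all made up for illustration.

```ruby
# Minimal stand-in for a cache with a time-to-live (hypothetical, not Rails.cache).
class TtlCache
  def initialize(clock)
    @clock = clock
    @store = {}
  end

  # Return the cached value while it is fresh; otherwise run the block and store the result.
  def fetch(key, expires_in:)
    entry = @store[key]
    return entry[:value] if entry && @clock.call < entry[:expires_at]

    value = yield
    @store[key] = { value: value, expires_at: @clock.call + expires_in }
    value
  end
end

now = 0                           # simulated time in seconds
cache = TtlCache.new(-> { now })
salt_master_keys = []             # stand-in for the unaccepted keys on the salt master

list_unaccepted = lambda do
  cache.fetch('saltkeys', expires_in: 60) { salt_master_keys.dup }
end

# Host A finishes installing: its key arrives, then foreman looks it up.
salt_master_keys << 'host-a.example.com'
found_a = list_unaccepted.call.include?('host-a.example.com')

# Ten seconds later host B finishes, while the cache entry is still valid.
now += 10
salt_master_keys << 'host-b.example.com'
found_b = list_unaccepted.call.include?('host-b.example.com')

# Once the entry has expired the key is visible, but by then nothing looks for it.
now += 60
found_b_later = list_unaccepted.call.include?('host-b.example.com')

puts "host A found: #{found_a}"
puts "host B found: #{found_b}"
puts "host B found after expiry: #{found_b_later}"
```

In this simulation host B's key is missing from the list at the moment it is needed, which matches the keys that stay "unaccepted".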
I've created a reproducible scenario and commented out the code that reads from the cache in the foreman_salt code (lines 13 and 20). In that case the race condition is gone.
I'm willing to provide a pull request if needed, but I'm not sure what to do with the cache of the salt keys in this case. It seems to do more harm than good.
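One possible direction, short of removing the cache entirely, would be to refresh the key list once when the wanted key is not in the cache. The following is a minimal standalone sketch of that idea and has not been tested against the plugin: `KeyLookup`, `unaccepted?` and the block that stands in for the call to salt are hypothetical names.

```ruby
# Hypothetical lookup that keeps a short-lived cache but refreshes on a miss.
class KeyLookup
  attr_reader :fetches

  # The block stands in for retrieving the unaccepted keys from salt.
  def initialize(ttl: 60, clock: -> { Time.now.to_f }, &fetch_from_salt)
    @ttl = ttl
    @clock = clock
    @fetch_from_salt = fetch_from_salt
    @keys = nil
    @expires_at = 0
    @fetches = 0
  end

  def unaccepted?(name)
    refresh! if @keys.nil? || @clock.call >= @expires_at
    return true if @keys.include?(name)

    # Miss: the key may have arrived after the cache was filled, so ask salt once more.
    refresh!
    @keys.include?(name)
  end

  private

  def refresh!
    @keys = @fetch_from_salt.call
    @expires_at = @clock.call + @ttl
    @fetches += 1
  end
end

salt_master_keys = ['host-a.example.com']
lookup = KeyLookup.new(clock: -> { 0 }) { salt_master_keys.dup }

a_found = lookup.unaccepted?('host-a.example.com') # fills the cache
salt_master_keys << 'host-b.example.com'           # arrives while the cache is still valid
b_found = lookup.unaccepted?('host-b.example.com') # miss triggers one refresh

puts "host A found: #{a_found}, host B found: #{b_found}, fetches: #{lookup.fetches}"
```

The trade-off in this sketch is that a lookup for a key that really is absent always costs one extra request to salt.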