Refactor #12310: Remove facter cache and introduce totaluploads option - Discovery - Foreman

Actions

Copy link

Refactor #12310

closed

Remove facter cache and introduce totaluploads option

Added by Alex Balk over 9 years ago. Updated almost 6 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Lukas Zapletal

Category:

Image

Difficulty:

trivial

Triaged:

Bugzilla link:

Pull request:

https://github.com/theforeman/foreman-discovery-image/pull/115

Fixed in Releases:

Discovery Image 3.5.1

Found in Releases:

Red Hat JIRA:

Description

The discovery-register script pushes facts without clearing the facter cache. This means that if one tries to force fact upload - for example, as a "ping" mechanism for discovery nodes, only cached facts are uploaded.
This can be overridden by removing '/tmp/proxy_cache' and sending the process a SIGHUP, but this involves a (relatively small) race condition.
The overhead of running 'Facter.clear' on every 'upload_sleep' interval doesn't seem big, especially since an external cache file (/tmp/proxy_cache) is used. This avoids the race condition.

Actions

Copy link

Updated by Alex Balk over 9 years ago

PR: https://github.com/theforeman/foreman-discovery-image/pull/39

Actions

Copy link

Updated by Lukas Zapletal over 9 years ago

Status changed from New to Rejected

I have closed the PR, I don't understand this. In the new version, you have the opportunity to force N first uploads via configuration option. If you set this to 99999 then it's like forever if you need this.

Beware this brings race conditions on Foreman side (discovered host can be created twice if you hit provisioning window).

Actions

Copy link

Updated by Alex Balk over 9 years ago

I'll try to better explain. Here's the current flow:
1. The current implementation assumes that you want to perform a fact upload ever fdi.uploadsleep for as many as fdi.cachefacts times.
2. The defaults of fdi.uploadsleep is 30 and of fdi.cachefacts is 5.
3. Facter.clear is called only inside the fdi.cachefacts loop. meaning facter cache is purged only when using this mechanism (unless a SIGHUP is sent, but we'll get to that).

The scenario we're trying to cover is one where a discovery node uploads fresh facts to Foreman every hour. This is used for several things:
1. To inform Foreman that the discovery node is alive
2. To upload custom facts that may have changed, or were added

If we try and use the existing mechanism to cover these scenarios, we would have to modify fdi.uploadsleep to 3600 seconds. Since the initial 5 runs address race conditions with weird hardware / networking drivers, where facts show up as empty because of long init times, setting the sleep interval to an hour would mean we're running with "stale empty facts" for a long period of time. Worse yet, we've seen a scenario where the discovery image fails to communicate with the Foreman proxy altogether till the facts refresh. We haven't dug deep into this, as it's mitigated by the initial sleep interval.

Instead, we're running our own external process, which removes the PROXY_CACHE files, thus forcing a fact upload. This works well, except the Facter facts are not necessarily fresh, since Facter.clear isn't run in the main upload loop, only inside the fdi.cachefacts loop. We can trigger a Facter.clear by sending the discovery-register process a SIGHUP, but we feel that's an extra step which isn't really needed. The simplest solution would be to move Facter.clear to the general upload loop instead of the internal fdi.cachefacts loop.

I looked again at the PR and saw that we haven't removed Facter.clear from the inner loop, only added it to the outer one. Not a big deal, but if you're willing to accept the suggested change, we'll send a PR with only one Facter.clear call in the outer loop.

Actions

Copy link

Updated by Lukas Zapletal over 9 years ago

Tracker changed from Bug to Refactor
Subject changed from Discovery image serves cached facts to Remove facter cache and introduce totaluploads option
Status changed from Rejected to New

Alex Balk wrote:

The scenario we're trying to cover is one where a discovery node uploads fresh facts to Foreman every hour. This is used for several things:
1. To inform Foreman that the discovery node is alive
2. To upload custom facts that may have changed, or were added

Ok thanks for the explanation.

First, re-uploading facts is a dangerous one, we've encountered many issues with that (multiple discovered hosts) and actually I am setting the default value of fdi.cachefacts to zero for the 3.0.4 release (the final build hopefully). You should consider pulling this information. If you create a simple rake task on Foreman server and call this from cron, you can sleep well:

Host::Discovered.all.each {|h| h.refresh_facts}

Instead, we're running our own external process, which removes the PROXY_CACHE files, thus forcing a fact upload. This works well, except the Facter facts are not necessarily fresh, since Facter.clear isn't run in the main upload loop, only inside the fdi.cachefacts loop. We can trigger a Facter.clear by sending the discovery-register process a SIGHUP, but we feel that's an extra step which isn't really needed. The simplest solution would be to move Facter.clear to the general upload loop instead of the internal fdi.cachefacts loop.

This is too complicated, but I am not saying we should not improve.

Why don't we fix this properly? Here is my idea: Once a host is successfuly registered, let's add one extra fact "discovery_registered_at" and timestamp. Let's send this now along for every single new request. Then we can get rid of cachefacts option completely and simply provide new option: totaluploads=N where N can be any number (1 - just once, 0 - forever, the default option).

Then, I can say we can get rid of the caching code on the node completely and only fiter some things out on the server (e.g. free memory should not be updated on each request for sure). But we can decide on the serverside rather on the node itself. This cleans the register script a lot and seems to be much cleaner.

What you think? Are you willing to work on this towards this goal? This needs a change in the /create method in the plugin as well (if host was already registered, never create it again and also filter out uninteresting facts like memory - this can be global setting as a regexp so users can modify/decide).

I am reopening/renaming

Actions

Copy link

Updated by Lukas Zapletal over 9 years ago

And if you still need faster upload rate for the initial N runs, we can implement that. I think the workflow would be:

Initial 5 runs every minute (configurable with initialuploads=5)
Always send "discovery_registered_at" fact to prevent race conditions.
Then refresh facts every hour (configurable with uploadsleep=60)
By default forever (configurable with totaluploads=0)
Remove the cache and simplify the register script.

And change on the discovery plugin side to:

Respect "discovery_registered_at".
Filter out "unstable" facts (free memory) - configurable via regexp discovery global setting.
Respect new discovery global setting "discovery_update_facts" (by default true) to update with fresh facts (I can imagine on some deployments this could be problematic and users should be able to turn this off).

We have a code already in that draws these nice icons, but it is not working properly atm actually thanks to the fact hosts are only registered once: https://github.com/theforeman/foreman_discovery/pull/201

Actions

Copy link