Tracker #31142
Updated by Lukas Zapletal about 4 years ago
* A new model class, `ReportTranscript`, is created (it can later be renamed to just `Report`):
> * host_id
> * reported_at
> * status (`StatusCalculator` from the old Reports is reused and extended to use 64 bits)
> * body (stored as the PostgreSQL `text` type, which is compressed; deliberately not indexed)
> * origin (Puppet-9, Ansible, OpenSCAP, Unknown; Puppet-9 stands for report format V9, which is compatible with V10)
* A new model class, `ReportKeyword(id: int, report_id: int, name: varchar)`, is associated with `ReportTranscript` via a join table and has a B-tree index on `report_keyword.name` for quick lookup
* Example keywords (this is a free-form value and plugin authors will decide what to use):
> * `PuppetHasFailedResource`
> * `PuppetHasFailedRestartResource`
> * `PuppetHasChangedResource`
> * `AnsibleHasUnreachableHost`
> * `AnsibleHasFailedTask`
> * `AnsibleHasChangedTask`
> * `ScapHasFailedRule`
> * `ScapHasOtheredRule`
> * `ScapHasHighSeverityFailure`
> * `ScapHasMediumSeverityFailure`
> * `ScapHasLowSeverityFailure`
> * `ScapFailure:xccdf_org.ssgproject.content_rule_ensure_redhat_gpgkey_installed`
> * `ScapFailure:xccdf_org.ssgproject.content_rule_security_patches_up_to_date`
* Even with all plugins enabled (Ansible, Puppet, OpenSCAP), up to 2000 keywords are expected in the worst case
* Keywords can be added with a detail level (a numeric constant, one of IMPORTANT, REGULAR, DETAILED). This will be unused in the first stage, but having it defined by plugin authors lets us, in the future, filter out some keywords (e.g. DETAILED) on large-scale deployments to shrink the join table down to a reasonable size.
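To illustrate how detail levels could be used later, here is a minimal sketch in plain Ruby. The `KeywordDetail` module, the numeric values, the `Keyword` struct, and `filter_keywords` are all hypothetical names chosen for illustration; the proposal only says the level is a number constant and does not define its values or ordering.

```ruby
# Hypothetical detail-level constants; the actual values are not defined
# in the proposal. Lower number is assumed to mean "more important".
module KeywordDetail
  IMPORTANT = 0
  REGULAR   = 1
  DETAILED  = 2
end

# Illustrative value object, not the real ReportKeyword model.
Keyword = Struct.new(:name, :detail)

# Keeps only keywords whose detail level is at or below max_detail,
# e.g. to skip DETAILED keywords on a large-scale deployment.
def filter_keywords(keywords, max_detail)
  keywords.select { |keyword| keyword.detail <= max_detail }
end

keywords = [
  Keyword.new("ScapHasFailedRule", KeywordDetail::IMPORTANT),
  Keyword.new("ScapHasLowSeverityFailure", KeywordDetail::REGULAR),
  Keyword.new("ScapFailure:xccdf_org.ssgproject.content_rule_security_patches_up_to_date",
              KeywordDetail::DETAILED)
]

kept = filter_keywords(keywords, KeywordDetail::REGULAR)
puts kept.map(&:name)
```

With this kind of filter, dropping the per-rule `ScapFailure:...` keywords would be a one-line configuration decision rather than a plugin change.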
* Plugin authors have complete control over how data is stored in the `body` field. It can be JSON, YAML or plain text. There will be two APIs available for plugins to extend: import and view.
* A new import processing pipeline API will discourage plugins from accessing the model directly:
> * New report comes in
> * Foreman detects the origin
> * Foreman creates an instance of a plugin input transformation class
> * Report body (as Ruby hash) is passed into the class
> * Plugin performs the transformation: hash in, hash out, plus status (big int) and keywords (hash or set)
* The same transformation is done during data migration (upgrade process from legacy reports to the new report)
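The import steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `AnsibleImportTransformer`, the `transform` method, the `IMPORT_TRANSFORMERS` registry, the `import_report` helper, the shape of the input hash, and the status bit layout are all hypothetical. The real status value would come from the extended `StatusCalculator`.

```ruby
require "set"

# Hypothetical plugin-side class; the real API may look different.
class AnsibleImportTransformer
  # Takes the report body as a Ruby hash and returns
  # [transformed_body, status, keywords].
  def transform(body)
    tasks   = body.fetch("tasks", [])
    failed  = tasks.count { |task| task["failed"] }
    changed = tasks.count { |task| task["changed"] }

    keywords = Set.new
    keywords << "AnsibleHasFailedTask" if failed > 0
    keywords << "AnsibleHasChangedTask" if changed > 0
    keywords << "AnsibleHasUnreachableHost" if body["unreachable"]

    # Placeholder packing of two 8-bit counters into one integer;
    # an assumption standing in for the real StatusCalculator layout.
    status = ([failed, 255].min << 8) | [changed, 255].min

    [body, status, keywords]
  end
end

# Hypothetical registry mapping a detected origin to a transformer class.
IMPORT_TRANSFORMERS = { "Ansible" => AnsibleImportTransformer }.freeze

# Core side: instantiate the plugin class for the origin and run it.
# The plugin never touches the model; the caller persists the result.
def import_report(origin, body)
  IMPORT_TRANSFORMERS.fetch(origin).new.transform(body)
end

body = {
  "tasks" => [
    { "name" => "install package", "failed" => true },
    { "name" => "write config", "changed" => true },
    { "name" => "gather facts" }
  ]
}
stored_body, status, keywords = import_report("Ansible", body)
```

Because the transformer is a pure hash-in, hash-out function, the same class could be reused by the data migration without any import-specific code paths.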
* For displaying reports, a similar pipeline is available:
> * Report is loaded for display
> * Foreman creates an instance of a plugin view transformation class
> * Report body (as Ruby hash) is passed into the class
> * Plugin performs the transformation: hash in, hash out (JSON for API output or the UI)
> * Data is passed into views (ERB, RABL) for final display
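A view transformation following the steps above might look like the sketch below. The `AnsibleViewTransformer` class, its `transform` method, the `render_for_api` helper, and the layout of the stored body are assumptions made for illustration. Storing the body as JSON is also an assumption here, since plugins are free to choose YAML or plain text instead.

```ruby
require "json"

# Hypothetical plugin-side view class; names and hash layout are illustrative.
class AnsibleViewTransformer
  # Takes the stored report body as a Ruby hash and returns a hash
  # shaped for display (API output or UI).
  def transform(body)
    tasks = body.fetch("tasks", [])
    {
      "summary" => {
        "total"  => tasks.size,
        "failed" => tasks.count { |task| task["failed"] }
      },
      "tasks" => tasks.map do |task|
        { "name" => task["name"], "state" => task["failed"] ? "failed" : "ok" }
      end
    }
  end
end

# Core side: load the body, run the plugin transformation, emit JSON.
# In Foreman the resulting hash would be handed to ERB or RABL views.
def render_for_api(stored_body_json)
  body = JSON.parse(stored_body_json)
  JSON.generate(AnsibleViewTransformer.new.transform(body))
end

stored = JSON.generate({ "tasks" => [
  { "name" => "install package", "failed" => true },
  { "name" => "gather facts" }
] })
output = JSON.parse(render_for_api(stored))
```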
* Plugin authors should not abuse keywords to report things that are likely to be set for most reports. For example, OpenSCAP should not create `ScapPassed:xyz` keywords because there would be too many of them.
* Searching is supported via:
> * Indexed keywords (e.g. `origin = scap and keyword = ScapHasHighSeverityFailure`, or simply the keyword alone, since it will be the default scoped_search field)
> * Full text in body (hidden by default in the UI, not advertised in docs)
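To show why keyword search can stay fast, here is a small in-memory sketch of the lookup that the indexed join table makes possible. It does not use ActiveRecord or scoped_search; `ReportRow`, `KeywordIndex`, and its methods are hypothetical names, and a Ruby `Hash` stands in for the B-tree index on the keyword name.

```ruby
require "set"

# Illustrative stand-in for a stored report row.
ReportRow = Struct.new(:id, :origin, :keywords)

# Hypothetical in-memory model of the keyword join table.
class KeywordIndex
  def initialize
    @reports = {}
    # keyword name => set of report ids (plays the role of the index)
    @report_ids_by_keyword = Hash.new { |hash, name| hash[name] = Set.new }
  end

  def add(report)
    @reports[report.id] = report
    report.keywords.each { |name| @report_ids_by_keyword[name] << report.id }
  end

  # Looks up reports by keyword and optionally narrows them by origin,
  # roughly "origin = scap and keyword = ScapHasHighSeverityFailure".
  def search(keyword:, origin: nil)
    ids = @report_ids_by_keyword.fetch(keyword, Set.new)
    rows = ids.map { |id| @reports[id] }
    rows = rows.select { |row| row.origin == origin } if origin
    rows.map(&:id).sort
  end
end

index = KeywordIndex.new
index.add(ReportRow.new(1, "scap", ["ScapHasFailedRule", "ScapHasHighSeverityFailure"]))
index.add(ReportRow.new(2, "scap", ["ScapHasFailedRule"]))
index.add(ReportRow.new(3, "ansible", ["AnsibleHasFailedTask"]))

high_severity = index.search(keyword: "ScapHasHighSeverityFailure", origin: "scap")
failed_rules  = index.search(keyword: "ScapHasFailedRule")
```

The lookup only touches reports that carry the keyword, which is why keywords set on almost every report would defeat the purpose of the index.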
Discussion on how we got there:
https://community.theforeman.org/t/rfc-optimized-reports-storage/15573