Tracker #31142: Reports are slow, refactor how we store them from scratch
Updated by Lukas Zapletal over 3 years ago

* A new model class is created: `ReportTranscript` (can later be renamed to just `Report`):
  * host_id
  * reported_at
  * status (StatusCalculator from the old Report is reused and extended to use 64 bits)
  * body (a PostgreSQL text type, which is compressed but deliberately not indexed)
  * origin (Puppet-9, Ansible, OpenSCAP, Unknown; Puppet-9 stands for report format V9, which is compatible with V10)
* A new model class `ReportKeyword(id: int, report_id: int, name: varchar)` is associated with `ReportTranscript` via a join table, with a B-tree index on `report_keyword.name` for quick lookup.
* Example keywords (this is a free-form value and plugin authors decide what to use):
  * `PuppetHasFailedResource`
  * `PuppetHasFailedRestartResource`
  * `PuppetHasChangedResource`
  * `AnsibleHasUnreachableHost`
  * `AnsibleHasFailedTask`
  * `AnsibleHasChangedTask`
  * `ScapHasFailedRule`
  * `ScapHasOtheredRule`
  * `ScapHasHighSeverityFailure`
  * `ScapHasMediumSeverityFailure`
  * `ScapHasLowSeverityFailure`
  * `ScapFailure:xccdf_org.ssgproject.content_rule_ensure_redhat_gpgkey_installed`
  * `ScapFailure:xccdf_org.ssgproject.content_rule_security_patches_up_to_date`
* Even with all plugins enabled (Ansible, Puppet, OpenSCAP), we expect up to 2000 keywords in the worst case.
* Keywords can be added with a detail level (a numeric constant, one of IMPORTANT, REGULAR, DETAILED). This will be unused in the first stage, but having it defined by plugin authors will let large-scale deployments filter out some keywords (e.g. DETAILED) in the future and shrink the join table to a reasonable size.
* Plugin authors have complete control over how data is stored in the `body` field. It can be JSON, YAML or plain text.
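To make the model concrete, here is a framework-free Ruby sketch of the `ReportTranscript` fields, a 64-bit status packed from several counters, and keyword filtering by detail level. It is an illustration under assumptions, not the planned implementation: `StatusPacker`, `Keyword`, `keywords_to_store`, the 10-bit counter width and the numeric values of the detail levels are all hypothetical, and plain Structs stand in for the ActiveRecord models.

```ruby
# Hypothetical, framework-free sketch of the proposed data model. The real
# models would be ActiveRecord classes; plain Structs keep this runnable.

# Assumed numeric values for the proposed detail levels (lower = more important).
DETAIL_LEVELS = { important: 1, regular: 2, detailed: 3 }.freeze

ReportTranscript = Struct.new(:host_id, :reported_at, :status, :body, :origin, :keywords,
                              keyword_init: true)
Keyword = Struct.new(:name, :detail_level)

# Packs several small counters into one 64-bit status value, in the spirit of
# StatusCalculator; the bit width per counter is an assumption.
class StatusPacker
  BITS = 10
  MAX = (1 << BITS) - 1 # each counter saturates at this value

  def initialize(*names)
    raise ArgumentError, "counters do not fit into 64 bits" if names.size * BITS > 64

    @names = names
  end

  # Shifts each (clamped) counter into its own BITS-wide slot.
  def pack(counts)
    @names.each_with_index.reduce(0) do |status, (name, i)|
      value = counts.fetch(name, 0).clamp(0, MAX)
      status | (value << (i * BITS))
    end
  end

  # Reads each slot back out with a shift and a mask.
  def unpack(status)
    @names.each_with_index.map { |name, i| [name, (status >> (i * BITS)) & MAX] }.to_h
  end
end

# Keeps only keyword names at or above the requested detail level, e.g. to
# drop DETAILED keywords on a large deployment before filling the join table.
def keywords_to_store(keywords, max_level: :detailed)
  limit = DETAIL_LEVELS.fetch(max_level)
  keywords.select { |kw| DETAIL_LEVELS.fetch(kw.detail_level) <= limit }.map(&:name)
end

# Usage
packer = StatusPacker.new(:applied, :failed, :skipped)
keywords = [
  Keyword.new("ScapHasHighSeverityFailure", :important),
  Keyword.new("ScapFailure:xccdf_org.ssgproject.content_rule_security_patches_up_to_date", :detailed)
]
report = ReportTranscript.new(
  host_id: 1, reported_at: Time.now, origin: "OpenSCAP", body: "{}",
  status: packer.pack(applied: 3, failed: 1, skipped: 2000),
  keywords: keywords_to_store(keywords, max_level: :regular)
)
packer.unpack(report.status) # => {applied: 3, failed: 1, skipped: 1023}
report.keywords              # => ["ScapHasHighSeverityFailure"]
```

Saturating each counter, rather than letting it overflow into its neighbour, keeps one noisy metric from corrupting the others; the trade-off is that very large counts are capped at the slot maximum.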

There will be two APIs available for plugins to extend, import and view (we can later consider merging them):

* The new import processing pipeline API will discourage plugins from accessing the model directly:
  * A new report comes in
  * Foreman detects the origin
  * Foreman creates an instance of the plugin's input transformation class
  * The report body (as a Ruby hash) is passed into the class
  * The plugin performs the transformation: hash in, hash out, plus status (big int) and keywords (hash or set)
* The same transformation is performed during data migration (the upgrade process from legacy reports to the new report)
* For displaying reports, a similar pipeline is available:
  * The report is loaded for display
  * Foreman creates an instance of the plugin's view transformation class
  * The report body (as a Ruby hash) is passed into the class
  * The plugin performs transformations (hash in, hash out, rendered as JSON for API output or used by the UI)
  * The data is passed into views (ERB, RABL) for final display
* Plugin authors should not abuse keywords to report things that are likely to be set for most reports; for example, OpenSCAP should not create `ScapPassed:xyz` keywords because there would be far too many of them.
* Searching is supported via:
  * Indexed keywords (e.g. `origin = scap and keyword = ScapHasHighSeverityFailure`, or simply the keyword alone, which will be the default scoped_search field)
  * Full text in body (hidden by default in the UI, not advertised in docs)

See also the README: https://github.com/theforeman/foreman_host_reports

Discussion on how we got here: https://community.theforeman.org/t/rfc-optimized-reports-storage/15573
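The import and view pipelines can be sketched in plain Ruby as follows. This is a minimal sketch under stated assumptions: `ReportPipeline`, `ImportResult`, `AnsibleImportTransformer`, `AnsibleViewTransformer`, the `"format"` key used for origin detection, the `"tasks"` input layout and the two-counter status layout are all hypothetical and not part of the proposal or of Foreman's API. The body is stored as JSON here, one of the options the proposal leaves to plugin authors.

```ruby
require "json"
require "set"

# Hypothetical result of an import transformation: body hash, status, keywords.
ImportResult = Struct.new(:body, :status, :keywords, keyword_init: true)

# Example plugin input transformation: hash in, hash out, plus status and keywords.
class AnsibleImportTransformer
  def transform(report)
    tasks = report.fetch("tasks", [])
    failed = tasks.count { |t| t["state"] == "failed" }
    changed = tasks.count { |t| t["state"] == "changed" }

    keywords = Set.new
    keywords << "AnsibleHasFailedTask" if failed.positive?
    keywords << "AnsibleHasChangedTask" if changed.positive?

    ImportResult.new(
      body: { "tasks" => tasks.map { |t| t.slice("name", "state") } }, # keep only what views need
      status: (failed << 32) | changed, # assumed layout: two 32-bit counters
      keywords: keywords
    )
  end
end

# Example plugin view transformation: stored hash in, display hash out.
class AnsibleViewTransformer
  def transform(body)
    tasks = body.fetch("tasks", [])
    { "failed_tasks" => tasks.select { |t| t["state"] == "failed" }.map { |t| t["name"] },
      "task_count" => tasks.size }
  end
end

class ReportPipeline
  def initialize
    @plugins = {}
  end

  def register(origin, import:, view:)
    @plugins[origin] = { import: import, view: view }
  end

  # Assumption: the origin is read from a "format" key; the proposal leaves detection open.
  def detect_origin(raw)
    @plugins.key?(raw["format"]) ? raw["format"] : "Unknown"
  end

  # Import: detect origin, instantiate the plugin transformer, return a storable record.
  def import(raw)
    origin = detect_origin(raw)
    plugin = @plugins[origin]
    result = plugin ? plugin[:import].new.transform(raw) : ImportResult.new(body: raw, status: 0, keywords: Set.new)
    { origin: origin, body: JSON.generate(result.body), status: result.status,
      keywords: result.keywords.to_a.sort }
  end

  # View: load the stored body and let the plugin reshape it before ERB/RABL rendering.
  def view(record)
    body = JSON.parse(record[:body])
    plugin = @plugins[record[:origin]]
    plugin ? plugin[:view].new.transform(body) : body
  end
end

# Usage
pipeline = ReportPipeline.new
pipeline.register("Ansible", import: AnsibleImportTransformer, view: AnsibleViewTransformer)
raw = { "format" => "Ansible", "tasks" => [
  { "name" => "install httpd", "state" => "changed" },
  { "name" => "start httpd", "state" => "failed", "stdout" => "unit not found" }
] }
record = pipeline.import(raw)
record[:keywords]     # => ["AnsibleHasChangedTask", "AnsibleHasFailedTask"]
pipeline.view(record) # => {"failed_tasks"=>["start httpd"], "task_count"=>2}
```

Keeping transformers as plain hash-in, hash-out classes means the same import class can be reused unchanged during the migration from legacy reports, and plugins never touch the model directly.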