Refactor #30820

closed

Drop SHA1 digest and use hash index instead for reports

Added by Lukas Zapletal over 3 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Category: Reporting
Target version: -
Difficulty: easy
Triaged: Yes
Fixed in Releases:
Found in Releases:

Description

Foreman's report storage is designed so that reports are broken into individual lines, which are stored as 1:N references. The biggest offenders are the messages and sources tables, which contain those lines together with a SHA1 hash. Both tables are exactly the same, because Puppet returns lines in the format "[source] log_content". So what we essentially do is break reports into lines, break each line into two pieces, calculate a SHA1 sum of each part and store all of that in the database.
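
To make the described flow concrete, here is a minimal Python sketch of splitting a report line and hashing both parts. It is not Foreman's actual code: the names `split_line`, `sha1_digest` and `digest_report` are hypothetical, and the regular expression assumes a simplified "[source] log_content" shape in which the source contains no closing bracket.

```python
import hashlib
import re

# Assumed, simplified line shape "[source] log_content"; real sources that
# contain "]" themselves would need a stricter parser.
LINE_RE = re.compile(r"^\[(?P<source>[^\]]*)\]\s*(?P<message>.*)$")


def split_line(line):
    """Split one report line into (source, message)."""
    match = LINE_RE.match(line)
    if match is None:
        # No bracketed prefix: treat the whole line as the message.
        return "", line
    return match.group("source"), match.group("message")


def sha1_digest(text):
    """Return the 40-character hexadecimal SHA1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def digest_report(lines):
    """Break report lines into two pieces and hash each piece."""
    rows = []
    for line in lines:
        source, message = split_line(line)
        rows.append({
            "source": source,
            "source_digest": sha1_digest(source),
            "message": message,
            "message_digest": sha1_digest(message),
        })
    return rows


rows = digest_report(["[Puppet] Applied catalog in 0.42 seconds"])
```

Every line costs two SHA1 computations before anything reaches the database, which is the overhead this ticket is about.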

The reason for all of that is that users can search for a source or for a (whole) report line using scoped search. Searching by source is only useful for Puppet users, and I am unable to find a use case for searching for a whole line. Full text search would make sense, as would an SQL LIKE query, but Foreman supports neither.

It is implemented this way to work around the SQL server's index limit, because the "messages"."value" and "sources"."value" fields are of type TEXT. We calculate a SHA1 hash and store it in a separate field, which is then indexed. This is extremely slow, it wastes resources, and there is no resolution of SHA1 conflicts.
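
The digest-column approach can be illustrated with a small, self-contained sketch using Python's built-in `sqlite3` module. The table layout, the index name and the helper `find_or_create_message` are assumptions made for illustration, not Foreman's real schema or code.

```python
import hashlib
import sqlite3


def sha1_digest(text):
    """Return the hexadecimal SHA1 digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Hypothetical layout: the TEXT value is not indexed, only its digest is.
conn.execute(
    "CREATE TABLE messages ("
    "id INTEGER PRIMARY KEY, value TEXT, digest VARCHAR(40))"
)
conn.execute("CREATE INDEX index_messages_on_digest ON messages (digest)")


def find_or_create_message(conn, value):
    """Look a message up by its digest and insert it when missing."""
    digest = sha1_digest(value)
    # The lookup compares digests only, so two different values with the
    # same SHA1 would be treated as one row (no conflict resolution).
    row = conn.execute(
        "SELECT id FROM messages WHERE digest = ?", (digest,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO messages (value, digest) VALUES (?, ?)", (value, digest)
    )
    return cursor.lastrowid


first_id = find_or_create_message(conn, "Applied catalog in 0.42 seconds")
second_id = find_or_create_message(conn, "Applied catalog in 0.42 seconds")
row_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

The application has to hash every value itself, both on insert and on lookup, and the digest is stored in addition to the text.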

This patch gets rid of our own SHA1 "digest" column and lets the SQL server index the text column directly. Postgres has a hash index for that; it is faster than b-tree and efficient for equality comparison only, which is exactly what we need (we do not sort the logs or sources tables, as that makes no sense).

The patch:

  • drops digest completely from the code base and lets the SQL server do the hashing via the index
  • drops the digest field and its index from both tables (2 indices, 2 fields)
  • creates an index on the value column (the migration can be slow - testing needed)

The only issue could be sqlite3, which we still require for packaging; in my rough testing I was able to create an index on a text column just fine. Another struggle could be Ruby on Rails support for the psql hash index: an SQL command might be needed in a migration to create it.
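
A quick way to repeat the sqlite3 check is sketched below with Python's built-in `sqlite3` module. The table and index names are hypothetical. The PostgreSQL statement is included only as a string to show the `USING hash` form of `CREATE INDEX` that a raw SQL migration could run; it is not executed here.

```python
import sqlite3

# Hypothetical PostgreSQL statement for the hash index; shown for reference
# only, it is never sent to a server in this sketch.
POSTGRES_HASH_INDEX_SQL = (
    "CREATE INDEX index_messages_on_value ON messages USING hash (value)"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, value TEXT)")
# SQLite accepts a regular index directly on a TEXT column.
conn.execute("CREATE INDEX index_messages_on_value ON messages (value)")
conn.execute(
    "INSERT INTO messages (value) VALUES (?)",
    ("Applied catalog in 0.42 seconds",),
)

# Ask SQLite how it would run an equality lookup on the text column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM messages WHERE value = ?",
    ("Applied catalog in 0.42 seconds",),
).fetchall()
# The fourth column of each plan row holds the human-readable detail.
plan_details = " ".join(row[3] for row in plan)
uses_index = "index_messages_on_value" in plan_details
```

If `uses_index` is true, the equality lookup is served by the index on the text column and no digest column is involved.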

For the record, hash indexes are supported in Postgres from version 8.2, and from version 10 they are transaction safe. This means that a crash on a pre-10 version would require an index rebuild, which is a trivial operation. CentOS installations already use Postgres v12 and Debian (Buster) has v11.

Also, a complete refactoring of reports is planned, after which searching in reports will only be possible via table scan anyway:

https://community.theforeman.org/t/rfc-optimized-reports-storage/15573


Related issues 3 (0 open, 3 closed)

Related to Foreman - Bug #30838: Do not return report contents (logs) after create (Closed, Lukas Zapletal)
Related to Foreman - Bug #31166: Optimize report import - attempt 2020 (Rejected, Lukas Zapletal)
Related to OpenSCAP - Bug #31230: Report handling is failing due to removal of digest field (Closed, Ondřej Pražák)

#1

Updated by Lukas Zapletal over 3 years ago

  • Tracker changed from Bug to Refactor
  • Subject changed from Drop searching for report line and source to Drop SHA1 digest and use hash index instead for reports
  • Description updated (diff)

#2

Updated by The Foreman Bot over 3 years ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/7981 added

#3

Updated by Lukas Zapletal over 3 years ago

  • Related to Bug #30838: Do not return report contents (logs) after create added

#4

Updated by Lukas Zapletal over 3 years ago

  • Related to Bug #31166: Optimize report import - attempt 2020 added

#5

Updated by The Foreman Bot over 3 years ago

  • Fixed in Releases 2.3.0 added

#6

Updated by Lukas Zapletal over 3 years ago

  • Status changed from Ready For Testing to Closed

#7

Updated by Tomer Brisker over 3 years ago

  • Related to Bug #31230: Report handling is failing due to removal of digest field added