Project

General

Profile

Refactor #30820

Drop SHA1 digest and use hash index instead for reports

Added by Lukas Zapletal 5 months ago. Updated 3 months ago.

Status:
Closed
Priority:
Normal
Category:
Reporting
Target version:
-
Difficulty:
easy
Triaged:
Yes
Bugzilla link:
Fixed in Releases:
Found in Releases:

Description

Foreman reports storage is designed so that reports are broken into individual lines and store them as 1:N references. The biggest offender is the messages and sources tables which contain those lines together with SHA1 hash - both are exactly the same, because puppet returls lines in format: "[source] log_content". So what we essentially do is break reports into lines, break lines into two pieces, calculate SHA1 sum from both parts and store that into the database.

The reason for all of that is that users can search for a source or for a (whole) report line using scoped search. This is only useful for Puppet users (source), I am unable to find a use case for searching for whole line. Full text search would make sense, or doing SQL LIKE query as well, but non of that Foreman supports.

The way this is implemented is to overcome SQL server limit of index because "messages"."value" and "sources"."value" fields are TEXT type. We caculate SHA1 hash and store the hash in a separate field which is then indexed. This is extremely slow, it's wasting of resources, there is no resolution of SHA1 conflicts.

This patch gets rid of our own SHA1 "digest" column and let SQL server do the index directly on the text column. Postgres has a hash index for that, it is faster than b-tree and efficient for comparison only, which is exactly what we need (we do not sort logs or sources tables - it makes no sense).

The patch

  • drop digest completely from the code base and let SQL server to do hashing via index
  • drop digest field and index from both tables (2 indices, 2 fields)
  • create index on value column (migration can be slow - testing needed)

The only issue could be sqlite3 which we still require for packaging, from my rough testing I am able to create index on text column just fine. Another struggle could be Ruby on Rails support for psql hash index, SQL command might be needed in a migration to create it.

For the record, hash index is supported in postgres from version 8.2 and from version 10 it is transaction safe. Meaning that a crash on pre-10 version would require index rebuild which is a trivial operation. CentOS installations already use Postgres v12 and Debian (Buster) has v11.

Also, complete refactoring of reports is planned where searching in reports will be only possible via tablescan anyway:

https://community.theforeman.org/t/rfc-optimized-reports-storage/15573


Related issues

Related to Foreman - Bug #30838: Do not return report contents (logs) after createClosed
Related to Foreman - Bug #31166: Optimize report import - attempt 2020Rejected
Related to OpenSCAP - Bug #31230: Report handling is failing due to removal of digest fieldClosed

Associated revisions

Revision 2f42ffc3 (diff)
Added by Lukas Zapletal 3 months ago

Fixes #30820 - drop digest and use hash index instead

History

#1 Updated by Lukas Zapletal 5 months ago

  • Description updated (diff)
  • Subject changed from Drop searching for report line and source to Drop SHA1 digest and use hash index instead for reports
  • Tracker changed from Bug to Refactor

#2 Updated by The Foreman Bot 5 months ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/7981 added

#3 Updated by Lukas Zapletal 4 months ago

  • Related to Bug #30838: Do not return report contents (logs) after create added

#4 Updated by Lukas Zapletal 3 months ago

  • Related to Bug #31166: Optimize report import - attempt 2020 added

#5 Updated by The Foreman Bot 3 months ago

  • Fixed in Releases 2.3.0 added

#6 Updated by Lukas Zapletal 3 months ago

  • Status changed from Ready For Testing to Closed

#7 Updated by Tomer Brisker 3 months ago

  • Related to Bug #31230: Report handling is failing due to removal of digest field added

Also available in: Atom PDF