Drop SHA1 digest and use hash index instead for reports
Foreman reports storage is designed so that reports are broken into individual lines and store them as 1:N references. The biggest offender is the messages and sources tables which contain those lines together with SHA1 hash - both are exactly the same, because puppet returls lines in format: "[source] log_content". So what we essentially do is break reports into lines, break lines into two pieces, calculate SHA1 sum from both parts and store that into the database.
The reason for all of that is that users can search for a source or for a (whole) report line using scoped search. This is only useful for Puppet users (source), I am unable to find a use case for searching for whole line. Full text search would make sense, or doing SQL LIKE query as well, but non of that Foreman supports.
The way this is implemented is to overcome SQL server limit of index because "messages"."value" and "sources"."value" fields are TEXT type. We caculate SHA1 hash and store the hash in a separate field which is then indexed. This is extremely slow, it's wasting of resources, there is no resolution of SHA1 conflicts.
This patch gets rid of our own SHA1 "digest" column and let SQL server do the index directly on the text column. Postgres has a hash index for that, it is faster than b-tree and efficient for comparison only, which is exactly what we need (we do not sort logs or sources tables - it makes no sense).
- drop digest completely from the code base and let SQL server to do hashing via index
- drop digest field and index from both tables (2 indices, 2 fields)
- create index on value column (migration can be slow - testing needed)
The only issue could be sqlite3 which we still require for packaging, from my rough testing I am able to create index on text column just fine. Another struggle could be Ruby on Rails support for psql hash index, SQL command might be needed in a migration to create it.
For the record, hash index is supported in postgres from version 8.2 and from version 10 it is transaction safe. Meaning that a crash on pre-10 version would require index rebuild which is a trivial operation. CentOS installations already use Postgres v12 and Debian (Buster) has v11.
Also, complete refactoring of reports is planned where searching in reports will be only possible via tablescan anyway: