Bug #15253
distinct used when finding applicable errata, causing severe slowdowns on dashboard
Status:
Closed
Priority:
Normal
Assignee:
Category:
Errata Management
Target version:
Difficulty:
Triaged:
Yes
Bugzilla link:
Pull request:
Team Backlog:
Fixed in Releases:
Found in Releases:
Description
If you have >250 content hosts and >400 errata, the dashboard may take multiple minutes to load. It is due to this query:
DISTINCT "katello_errata".* FROM "katello_errata" INNER JOIN "katello_content_facet_errata" ON "katello_content_facet_errata"."erratum_id" = "katello_errata"."id" INNER JOIN "katello_content_facet_errata" "content_facet_errata_katello_errata_join" ON "content_facet_errata_katello_errata_join"."erratum_id" = "katello_errata"."id" INNER JOIN "katello_content_facets" ON "katello_content_facets"."id" = "content_facet_errata_katello_errata_join"."content_facet_id" WHERE "katello_content_facets"."host_id" IN ( SELECT "hosts"."id" FROM "hosts" WHERE "hosts"."type" IN ('Host::Managed') AND "hosts"."organization_id" IN (1)) ORDER BY updated DESC limit 6;
note the "distinct" at the beginning, it causes a very large sort to happen!
-> Sort (cost=50522.02..50628.22 rows=42483 width=1269) (actual time=63211.686..79656.391 rows=42483 loops=1) Sort Key: katello_errata.updated, katello_errata.id, katello_errata.uuid, katello_errata.errata_id, katello_errata.created_at, katello_errata.updated_at, katello_errata.issued, katello_errata.errata_type, katello_errata.se verity, katello_errata.title, katello_errata.solution, katello_errata.description, katello_errata.summary, katello_errata.reboot_suggested Sort Method: external merge Disk: 36304kB
Associated revisions
History
#1
Updated by The Foreman Bot over 2 years ago
- Status changed from New to Ready For Testing
- Pull request https://github.com/Katello/katello/pull/6094 added
#2
Updated by Chris Duryee over 2 years ago
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
Applied in changeset katello|4667141bff31d8781600a19977f1d529e0218b5b.
#3
Updated by Eric Helms over 2 years ago
- Legacy Backlogs Release (now unused) set to 144
#4
Updated by Eric Helms over 2 years ago
- Legacy Backlogs Release (now unused) changed from 144 to 143
Fixes #15253 - do not do `DISTINCT` on katello_errata.*
The erratum model obj has a query to find all errata that are applicable to
hosts. At the end of the query, it does a `DISTINCT` to make sure there aren't
duplicate errata that may have been listed from the join. For example:
```sql
SELECT DISTINCT "katello_errata".*
FROM "katello_errata"
INNER JOIN "katello_content_facet_errata" ON "katello_content_facet_errata"."erratum_id" = "katello_errata"."id"
INNER JOIN "katello_content_facet_errata" "content_facet_errata_katello_errata_join" ON "content_facet_errata_katello_errata_join"."erratum_id" = "katello_errata"."id"
INNER JOIN "katello_content_facets" ON "katello_content_facets"."id" = "content_facet_errata_katello_errata_join"."content_facet_id"
WHERE "katello_content_facets"."host_id" IN
(
SELECT "hosts"."id"
FROM "hosts"
WHERE "hosts"."type" IN ('Host::Managed')
AND "hosts"."organization_id" IN (1))
ORDER BY updated DESC
limit 6;
```
The `DISTINCT "katello_errata".*` causes issues when the number of hosts * the
number of errata is over about 100K, since it attempts to sort by every col, as
seen in this explain plan:
```
-> Sort (cost=50522.02..50628.22 rows=42483 width=1269) (actual time=63211.686..79656.391 rows=42483 loops=1)
Sort Key: katello_errata.updated, katello_errata.id, katello_errata.uuid, katello_errata.errata_id,
katello_errata.created_at, katello_errata.updated_at, katello_errata.issued,
katello_errata.errata_type, katello_errata.severity, katello_errata.title,
katello_errata.solution, katello_errata.description, katello_errata.summary,
katello_errata.reboot_suggested
Sort Method: external merge Disk: 36304kB
```
This can take upwards of 80 seconds on a strong system, and can take multiple
minutes on a development VM.
We do not need to check for uniqueness on every column, since katello_errata
already has a surrogate PK. However, we can't do `DISTINCT` on the ID, and then
an `ORDER BY` on the updated time, since that's two different sorts. Instead,
we can sort by the updated time, and then sort by ID. That way, the cols on
`DISTINCT` and `ORDER BY` match.