Project

General

Profile

Bug #15253

distinct used when finding applicable errata, causing severe slowdowns on dashboard

Added by Chris Duryee almost 3 years ago. Updated 9 months ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
Errata Management
Target version:
Difficulty:
Triaged:
Yes
Bugzilla link:
Team Backlog:
Fixed in Releases:
Found in Releases:

Description

If you have >250 content hosts and >400 errata, the dashboard may take multiple minutes to load. It is due to this query:

 DISTINCT "katello_errata".* FROM "katello_errata" INNER JOIN "katello_content_facet_errata" ON "katello_content_facet_errata"."erratum_id" = "katello_errata"."id" INNER JOIN "katello_content_facet_errata" "content_facet_errata_katello_errata_join" ON "content_facet_errata_katello_errata_join"."erratum_id" = "katello_errata"."id" INNER JOIN "katello_content_facets" ON "katello_content_facets"."id" = "content_facet_errata_katello_errata_join"."content_facet_id" WHERE "katello_content_facets"."host_id" IN
(
       SELECT "hosts"."id" 
       FROM   "hosts" 
       WHERE  "hosts"."type"            IN ('Host::Managed')
       AND    "hosts"."organization_id" IN (1)) ORDER BY updated DESC limit 6;

note the "distinct" at the beginning, it causes a very large sort to happen!

         ->  Sort  (cost=50522.02..50628.22 rows=42483 width=1269) (actual time=63211.686..79656.391 rows=42483 loops=1)
               Sort Key: katello_errata.updated, katello_errata.id, katello_errata.uuid, katello_errata.errata_id, katello_errata.created_at, katello_errata.updated_at, katello_errata.issued, katello_errata.errata_type, katello_errata.se
verity, katello_errata.title, katello_errata.solution, katello_errata.description, katello_errata.summary, katello_errata.reboot_suggested
               Sort Method: external merge  Disk: 36304kB

Associated revisions

Revision 4667141b (diff)
Added by Chris Duryee almost 3 years ago

Fixes #15253 - do not do `DISTINCT` on katello_errata.*

The erratum model obj has a query to find all errata that are applicable to
hosts. At the end of the query, it does a `DISTINCT` to make sure there aren't
duplicate errata that may have been listed from the join. For example:

```sql
SELECT DISTINCT "katello_errata".*
FROM "katello_errata"
INNER JOIN "katello_content_facet_errata" ON "katello_content_facet_errata"."erratum_id" = "katello_errata"."id"
INNER JOIN "katello_content_facet_errata" "content_facet_errata_katello_errata_join" ON "content_facet_errata_katello_errata_join"."erratum_id" = "katello_errata"."id"
INNER JOIN "katello_content_facets" ON "katello_content_facets"."id" = "content_facet_errata_katello_errata_join"."content_facet_id"
WHERE "katello_content_facets"."host_id" IN
(
SELECT "hosts"."id"
FROM "hosts"
WHERE "hosts"."type" IN ('Host::Managed')
AND "hosts"."organization_id" IN (1))
ORDER BY updated DESC
limit 6;
```

The `DISTINCT "katello_errata".*` causes issues when the number of hosts * the
number of errata is over about 100K, since it attempts to sort by every col, as
seen in this explain plan:

```
-> Sort (cost=50522.02..50628.22 rows=42483 width=1269) (actual time=63211.686..79656.391 rows=42483 loops=1)
Sort Key: katello_errata.updated, katello_errata.id, katello_errata.uuid, katello_errata.errata_id,
katello_errata.created_at, katello_errata.updated_at, katello_errata.issued,
katello_errata.errata_type, katello_errata.severity, katello_errata.title,
katello_errata.solution, katello_errata.description, katello_errata.summary,
katello_errata.reboot_suggested
Sort Method: external merge Disk: 36304kB

```

This can take upwards of 80 seconds on a strong system, and can take multiple
minutes on a development VM.

We do not need to check for uniqueness on every column, since katello_errata
already has a surrogate PK. However, we can't do `DISTINCT` on the ID, and then
an `ORDER BY` on the updated time, since that's two different sorts. Instead,
we can sort by the updated time, and then sort by ID. That way, the cols on
`DISTINCT` and `ORDER BY` match.

Revision 790f44a3
Added by Chris Duryee almost 3 years ago

Merge pull request #6094 from beav/not-so-uniq

Fixes #15253 - do not do `DISTINCT` on katello_errata.*

History

#1 Updated by The Foreman Bot almost 3 years ago

  • Status changed from New to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/6094 added

#2 Updated by Chris Duryee almost 3 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

#3 Updated by Eric Helms almost 3 years ago

  • Legacy Backlogs Release (now unused) set to 144

#4 Updated by Eric Helms almost 3 years ago

  • Legacy Backlogs Release (now unused) changed from 144 to 143

Also available in: Atom PDF