Bug #8565
Closed
report::expire is running very slowly
Description
The fix included in #1592 has made report::expire very slow.

Expire 5-day-old reports (runs every 30 minutes):
Pre-fix: 2.5 minutes
Post-fix: 21 minutes

Expire 1-day-old non-eventful reports (runs every 30 minutes):
Pre-fix: 2.5 minutes
Post-fix: 21 minutes
Most of the time seems to be spent in Postgres deleting the records in the post-fix report.rb. I will be working on getting more data added to this ticket.
Updated by Chuck Schweizer about 10 years ago
4,000 nodes running puppet once per hour.
Updated by Dominic Cleal about 10 years ago
- Related to Bug #1592: report::expire errors with millions of records added
Updated by Dominic Cleal about 10 years ago
- Category set to Reporting
- Translation missing: en.field_release set to 29
Updated by Dominic Cleal about 10 years ago
The most useful info would be debug logs while running the rake task: http://projects.theforeman.org/projects/foreman/wiki/Troubleshooting#How-do-I-enable-debugging
The SQL logs would then show the time taken for each query, and possibly even EXPLAIN data if they appear to take long enough. If not, then we could try re-running some queries with EXPLAIN.
Updated by Dominic Cleal about 10 years ago
- Related to Bug #8316: Removing a long-standing host creates huge transaction added
Updated by Dominic Cleal about 10 years ago
- Translation missing: en.field_release deleted (29)
Updated by Chuck Schweizer about 10 years ago
Here is where the report is being slow.
foreman | foreman | 11783 | | f | 2014-12-10 13:52:56.117167+00 | DELETE FROM "messages" WHERE (id not IN (SELECT message_id FROM "logs" ))

foreman=# explain DELETE FROM "messages" WHERE (id not IN (SELECT message_id FROM "logs" ));
                                QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on messages  (cost=38070.64..323418103.91 rows=12289 width=6)
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=38070.64..60375.40 rows=1603876 width=4)
           ->  Seq Scan on logs  (cost=0.00..30200.76 rows=1603876 width=4)
(5 rows)
Updated by Chuck Schweizer about 10 years ago
Explain with DISTINCT
foreman=# explain DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id from "logs" ));
                                QUERY PLAN
----------------------------------------------------------------------------
 Seq Scan on messages  (cost=34343.70..36100.92 rows=12289 width=6)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  HashAggregate  (cost=34314.32..34337.82 rows=2350 width=4)
           ->  Seq Scan on logs  (cost=0.00..30292.46 rows=1608746 width=4)
(5 rows)
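The plans above show why the DISTINCT variant is so much cheaper: the planner can hash the deduplicated subquery result once (the "hashed SubPlan") instead of re-scanning a materialized 1.6M-row list for every row of messages, dropping the estimated cost from ~323M to ~36K. The orphan-cleanup logic itself can be sketched with a toy schema (a hypothetical illustration using sqlite3, not Foreman's actual code; table and column names follow the thread):

```python
import sqlite3

# Toy schema mirroring the thread: logs rows reference messages rows,
# and expiry should delete messages no longer referenced by any log.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE logs (id INTEGER PRIMARY KEY, message_id INTEGER);
""")
conn.executemany("INSERT INTO messages (id, value) VALUES (?, ?)",
                 [(1, "kept"), (2, "orphan"), (3, "kept")])
conn.executemany("INSERT INTO logs (id, message_id) VALUES (?, ?)",
                 [(10, 1), (11, 3), (12, 1)])

# The query discussed in the thread: delete messages not referenced by any log.
# DISTINCT shrinks the subquery to unique message_ids before the anti-join.
conn.execute("DELETE FROM messages WHERE id NOT IN "
             "(SELECT DISTINCT message_id FROM logs)")

remaining = [r[0] for r in conn.execute("SELECT id FROM messages ORDER BY id")]
print(remaining)  # → [1, 3]: only the referenced messages survive
```

Note that NOT IN behaves badly if the subquery can return NULLs (no rows match), which is one more reason real cleanups often use NOT EXISTS instead.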
Updated by Dominic Cleal about 10 years ago
- Status changed from New to Assigned
- Assignee set to Dominic Cleal
Updated by The Foreman Bot about 10 years ago
- Status changed from Assigned to Ready For Testing
- Pull request https://github.com/theforeman/foreman/pull/2008 added
- Pull request deleted ()
Updated by Chuck Schweizer about 10 years ago
Much better with your fix. Down to 12 seconds.
foreman=# explain analyze DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id from "logs" ));
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on messages  (cost=34423.41..36180.64 rows=12289 width=6) (actual time=699.123..710.044 rows=19 loops=1)
   Filter: (NOT (hashed SubPlan 1))
   SubPlan 1
     ->  HashAggregate  (cost=34394.04..34417.54 rows=2350 width=4) (actual time=670.766..680.408 rows=28760 loops=1)
           ->  Seq Scan on logs  (cost=0.00..30362.83 rows=1612483 width=4) (actual time=0.055..312.759 rows=1613655 loops=1)
 Total runtime: 710.908 ms
(6 rows)
$ time ./foreman-rake reports:expire days=5 RAILS_ENV='production'

real    0m12.144s
user    0m8.497s
sys     0m0.924s
Updated by Dominic Cleal almost 10 years ago
- Related to Refactor #8798: Reports.expire uses an unnecessary join on logs added
Updated by Martin Jackson almost 10 years ago
Applying the patch attached to the PR gets our report expiry for ~750,000 reports into the 30 second range.
Updated by Dominic Cleal almost 10 years ago
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
Applied in changeset 9fde85ad830031d38a6786d30a48519344dad446.
Updated by Dominic Cleal almost 10 years ago
- Translation missing: en.field_release set to 32
Updated by Dirk Heinrichs over 9 years ago
Running 1.7.4, expiring reports is still quite slow (PostgreSQL 9.3.x). It runs for longer than a day, so we end up in situations where multiple expire jobs are running at once, hogging the CPU at 100%. The plan looks like this:
foreman=> select count(*) from reports;
 count
-------
  1951
(1 row)

foreman=> select count(*) from messages;
 count
--------
 704462
(1 row)

foreman=> select count(*) from logs;
 count
--------
 802815
(1 row)

foreman=> explain DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id FROM "logs" ));
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Delete on messages  (cost=0.42..9806323789.43 rows=352231 width=6)
   ->  Seq Scan on messages  (cost=0.42..9806323789.43 rows=352231 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.42..27142.52 rows=279366 width=4)
                 ->  Unique  (cost=0.42..24653.69 rows=279366 width=4)
                       ->  Index Only Scan using index_logs_on_message_id on logs  (cost=0.42..22646.65 rows=802815 width=4)
(7 rows)
There's this blog post, which outlines a possible alternative. Another option would be to use foreign keys together with "on delete cascade".
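The cascade idea can be illustrated with another toy schema (again a hypothetical sketch in sqlite3, not Foreman's actual migrations): if logs carried a foreign key to reports declared ON DELETE CASCADE, expiring a report would remove its log rows in the same statement, leaving only the orphaned-messages sweep to run separately.

```python
import sqlite3

# Hypothetical illustration of "foreign keys with ON DELETE CASCADE":
# deleting an expired report removes its log rows automatically.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs per-connection
conn.executescript("""
    CREATE TABLE reports (id INTEGER PRIMARY KEY, reported_at TEXT);
    CREATE TABLE logs (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES reports(id) ON DELETE CASCADE
    );
""")
conn.executemany("INSERT INTO reports VALUES (?, ?)",
                 [(1, "2015-01-01"), (2, "2015-06-01")])
conn.executemany("INSERT INTO logs VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Expire report 1: its two log rows go with it, no separate DELETE on logs.
conn.execute("DELETE FROM reports WHERE id = 1")
log_count = conn.execute("SELECT count(*) FROM logs").fetchone()[0]
print(log_count)  # → 1: only report 2's log remains
```

The trade-off is that the cascade moves the per-row deletion work into the same transaction as the report delete, so very large expiries can still produce big transactions (compare #8316 above).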
Updated by Dominic Cleal over 9 years ago
It's probably best if you can file a new issue so we can identify the cause there, Dirk. Is it definitely that query that's taking the time? EXPLAIN ANALYZE might work better to check.
There are a few such queries, and a different one was identified recently in #10228, where a missing index was slowing down MySQL (at least).
Updated by Dirk Heinrichs over 9 years ago
Yes, it is that query. Found out by running
SELECT * FROM pg_stat_activity;
while connected to the foreman database. And no, I can't run EXPLAIN ANALYZE, as it actually executes the query, which would get me in trouble again.
Regarding #10228: I wonder how an index on source_id would help accelerate a query on message_id.
Anyway, created #10329 as requested.