Bug #8565
Closed
report::expire is running very slowly
Added by Chuck Schweizer over 10 years ago.
Updated almost 7 years ago.
Description
The fix included in #1592 has made report::expire very slow.
Expire 5-day-old reports (runs every 30 minutes)
Pre-fix: 2.5 minutes
Post-fix: 21 minutes
Expire 1-day-old non-eventful reports (runs every 30 minutes)
Pre-fix: 2.5 minutes
Post-fix: 21 minutes
Most of the time seems to be spent in Postgres deleting the records in the post-fix report.rb. I will be working on getting more data added to this ticket.
4,000 nodes running Puppet once per hour.
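For reference, the two jobs above presumably correspond to cron entries invoking the standard rake task shown later in this ticket (the status=0 filter on the second job, for eventless reports, is an assumption based on the task's documented options):

foreman-rake reports:expire days=5 RAILS_ENV=production
foreman-rake reports:expire days=1 status=0 RAILS_ENV=production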
- Related to Bug #1592: report::expire errors with millions of records added
- Category set to Reporting
- Release set to 29
- Related to Bug #8316: Removing a long-standing host creates huge transaction added
- Release deleted (29)
Here is where the report expiry is slow:
foreman | foreman | 11783 | | f | 2014-12-10 13:52:56.117167+00 | DELETE FROM "messages" WHERE (id not IN (SELECT message_id FROM "logs" ))
foreman=# explain DELETE FROM "messages" WHERE (id not IN (SELECT message_id FROM "logs" ));
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on messages (cost=38070.64..323418103.91 rows=12289 width=6)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=38070.64..60375.40 rows=1603876 width=4)
-> Seq Scan on logs (cost=0.00..30200.76 rows=1603876 width=4)
(5 rows)
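To put that plan in perspective: the SubPlan is not hashed, so the materialized ~1.6M-row list of message_ids from logs is rescanned for every candidate row in messages; that per-row rescan is what pushes the estimated cost to ~3.2e8 and the wall clock to ~21 minutes.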
EXPLAIN with DISTINCT:
foreman=# explain DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id from "logs" ));
QUERY PLAN
----------------------------------------------------------------------------
Seq Scan on messages (cost=34343.70..36100.92 rows=12289 width=6)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> HashAggregate (cost=34314.32..34337.82 rows=2350 width=4)
-> Seq Scan on logs (cost=0.00..30292.46 rows=1608746 width=4)
(5 rows)
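With DISTINCT the subquery result is deduplicated up front, so the planner switches to a hashed SubPlan and each messages row becomes a cheap hash probe instead of a rescan of the materialized list. As a side note (an equivalent rewrite, not the fix applied here), the same cleanup can be expressed as a NOT EXISTS anti-join, which PostgreSQL can plan without relying on the subplan-hashing heuristic and which also sidesteps NOT IN's NULL semantics, where a single NULL message_id prevents any rows from being deleted:

DELETE FROM "messages" WHERE NOT EXISTS
    (SELECT 1 FROM "logs" WHERE "logs".message_id = "messages".id);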
- Status changed from New to Assigned
- Assignee set to Dominic Cleal
- Status changed from Assigned to Ready For Testing
- Pull request https://github.com/theforeman/foreman/pull/2008 added
- Pull request deleted ()
Much better with your fix. Down to 12 seconds.
foreman=# explain analyze DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id from "logs" ));
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Seq Scan on messages (cost=34423.41..36180.64 rows=12289 width=6) (actual time=699.123..710.044 rows=19 loops=1)
Filter: (NOT (hashed SubPlan 1))
SubPlan 1
-> HashAggregate (cost=34394.04..34417.54 rows=2350 width=4) (actual time=670.766..680.408 rows=28760 loops=1)
-> Seq Scan on logs (cost=0.00..30362.83 rows=1612483 width=4) (actual time=0.055..312.759 rows=1613655 loops=1)
Total runtime: 710.908 ms
(6 rows)
$ time ./foreman-rake reports:expire days=5 RAILS_ENV='production'
real 0m12.144s
user 0m8.497s
sys 0m0.924s
- Related to Refactor #8798: Reports.expire uses an unnecessary join on logs added
Applying the patch attached to the PR gets our report expiry for ~750,000 reports into the 30-second range.
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
- Release set to 32
Running 1.7.4, expiring reports is still quite slow (PostgreSQL 9.3.x). It runs for longer than a day, so we run into situations where multiple expire jobs are running at once, hogging the CPU at 100%. The plan looks like this:
foreman=> select count(*) from reports;
count
-------
1951
(1 row)
foreman=> select count(*) from messages;
count
--------
704462
(1 row)
foreman=> select count(*) from logs;
count
--------
802815
(1 row)
foreman=> explain DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id FROM "logs" ));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Delete on messages (cost=0.42..9806323789.43 rows=352231 width=6)
-> Seq Scan on messages (cost=0.42..9806323789.43 rows=352231 width=6)
Filter: (NOT (SubPlan 1))
SubPlan 1
-> Materialize (cost=0.42..27142.52 rows=279366 width=4)
-> Unique (cost=0.42..24653.69 rows=279366 width=4)
-> Index Only Scan using index_logs_on_message_id on logs (cost=0.42..22646.65 rows=802815 width=4)
(7 rows)
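One observation on that plan (a guess, not a confirmed diagnosis): it falls back to a plain Materialize SubPlan instead of the hashed SubPlan from the earlier EXPLAINs, and PostgreSQL only hashes a NOT IN subquery when the estimated hash table fits in work_mem. It may be worth checking whether raising it for the session restores the hashed plan:

foreman=> SET work_mem = '64MB';
foreman=> EXPLAIN DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id FROM "logs" ));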
There's this blog post, which outlines a possible alternative. Another would be to use foreign keys together with ON DELETE CASCADE.
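A minimal sketch of the foreign-key idea, assuming the stock schema in which logs references reports through a report_id column (the constraint name is made up). Expired reports would then take their log rows with them in a single statement, though orphaned messages would still need a separate cleanup pass:

ALTER TABLE logs ADD CONSTRAINT logs_report_id_fk
    FOREIGN KEY (report_id) REFERENCES reports (id) ON DELETE CASCADE;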
It's probably best if you can file a new issue so we can identify the cause there, Dirk. Is it definitely that query that's taking the time? EXPLAIN ANALYZE might work better to check.
There are a few, and a different one was identified recently in #10228 where a missing index was slowing down MySQL (at least).
Yes, it is that query. I found out by running
SELECT * FROM pg_stat_activity;
while connected to the foreman database. And no, I can't run EXPLAIN ANALYZE, as it actually executes the query, which would get me into trouble again.
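For what it's worth, EXPLAIN ANALYZE on a DELETE can be wrapped in a transaction and rolled back, so the deletions are never committed (the delete really does run inside the transaction, so a quiet window is advisable):

foreman=> BEGIN;
foreman=> EXPLAIN ANALYZE DELETE FROM "messages" WHERE (id not IN (SELECT DISTINCT message_id FROM "logs" ));
foreman=> ROLLBACK;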
Regarding #10228: I wonder how an index on source_id would help accelerate a query on message_id.
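(One plausible reading, an assumption from the pattern above rather than anything confirmed here: the expiry runs an analogous orphan cleanup against each deduplicated table, e.g. DELETE FROM "sources" WHERE (id not IN (SELECT DISTINCT source_id FROM "logs" )), and on MySQL that subquery can be re-evaluated per row, so a missing index on logs.source_id hurts there.)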
Anyway, created #10329 as requested.