Project

General

Profile

Actions

Bug #26726

closed

opening 'audits' page results in very heavy postgres query

Added by Maarten Plugge almost 5 years ago. Updated about 4 years ago.

Status:
Feedback
Priority:
Normal
Assignee:
-
Category:
Database
Target version:
-
Difficulty:
Triaged:
No
Fixed in Releases:
Found in Releases:

Description

We're running a katello install with about 400 nodes connected to it, and will be adding a lot more nodes in the nearby future. Our installation was finished around May 2018 and we have, up to now, never performed any cleanup actions for our auditing. Over time, especially the past few months, we've seen an decrease in performance on our single katello box. Originally we suspected the virtual machine (8 core, 24GiB RAM) to suffer from the amount of nodes added to it. Later we were eyeing our SAN since we we're seeing a constant iowait of ~30%.

After enabling slow-query logging in postgres, we noticed a lot of queries involving the tasks table took a long time:

2019-04-25 11:08:38 CEST LOG:  duration: 54559.560 ms  execute <unnamed>: SELECT  1 AS one FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_
id" NOT IN ('90103f79-8379-45e4-9d9f-9942faf14288')) AND "foreman_tasks_locks"."name" = $1 AND "foreman_tasks_locks"."resource_id" = $2 AND "foreman_tasks_locks"."resource_type" = $3 LIMIT $4

After some digging around we cleaned that database (we went from 36GiB to 19GiB after that), but still saw poor perfomance. It was only after running a "foreman-rake audits:expire" (deleted 9185150 entries) that we saw a huge performance boost again. However, we still have issues opening the Monitoring > Audits link from the dashboard. After clicking said link, the server becomes unresponsive and the page stops loading. From the query logging I see a few queries being done that stand out to me:

2019-05-02 10:03:27 CEST LOG:  duration: 23431.549 ms  execute <unnamed>: SELECT "audits"."id" FROM "audits" 
2019-05-02 10:03:40 CEST LOG:  duration: 1439.171 ms  bind <unnamed>: SELECT  "audits".* FROM "audits" WHERE (audits.id IN 
< undoable long list of ints here, my pc doesn't even cope with how long this list is >

2019-05-02 10:04:22 CEST LOG:  duration: 1396.459 ms  bind <unnamed>: SELECT COUNT(*) FROM "audits" WHERE (audits.id IN (9185239,9185240,9185241,9185242,9185243,9185244,9185245,9185246,9185247,<another insane list of audit ids>

To get an idea on how many ID's were actually used in that query, this should give a rough estimate (less/vim/Atom all crash when trying to work with the log, so I had to be creative here):

tail -4 /var/lib/pgsql/data/pg_log/postgresql-Thu.log | head -1 | tr ',' '\n' | wc -l
1527386

I think limiting the amount of ID's used in that query (pagination? I don't know) is the culprit for the audits page failing. Besides that I suggest adding a periodic audits table cleanup by default, maybe even a setting that specifies the amount of days audits will be kept.

Actions #1

Updated by Maarten Plugge almost 5 years ago

Addendum: We're running Katello 3.8 (Foreman 1.19.0-1) on Centos 7..

Actions #2

Updated by Jonathon Turel almost 5 years ago

  • Project changed from Katello to Foreman
  • Category changed from 81 to Database

Moving to Foreman as that's where the slow report lives. Thanks for filing the issue!

Actions #3

Updated by Ohad Levy about 4 years ago

  • Status changed from New to Feedback

The audits page has recently rewritten, any chance you can validate if this is still an issue?

Actions

Also available in: Atom PDF