Bug #15283


Slow response times due to Foreman locking query

Added by Bryan Kearney almost 8 years ago. Updated almost 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
-
Target version:
-
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1334924
Using Satellite 6.1.7, we experienced extremely slow UI and API performance (response times measured in minutes, and many ancillary failures due to lock contention). On the server, we hit the Passenger process limit on Foreman instances, each tied to a Postgres query using 100% CPU continuously for tens of minutes.

We added memory and reduced concurrency to avoid swapping, and increased Postgres's shared_buffers and work_mem, but the queries still took minutes to execute.

The long-running queries were all the same Foreman lock check, something of this form:

SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
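Conceptually, the check asks: does any other still-active task hold a lock on this resource? A minimal Python model of that predicate (hypothetical data structures, not actual foreman-tasks code):

```python
# Minimal model of the lock-collision check (hypothetical structures,
# not the real foreman-tasks implementation).
tasks = {                      # task_id -> state
    "ab80a792-a95e-455a-a0cf-23801e84eba2": "running",
    "task-2": "running",
    "task-3": "stopped",
}
locks = [                      # (name, resource_type, resource_id, task_id)
    ("read", "Katello::System", 529, "ab80a792-a95e-455a-a0cf-23801e84eba2"),
    ("read", "Katello::System", 529, "task-3"),
]

def colliding_locks(name, rtype, rid, current_task):
    """Count matching locks held by other, not-yet-stopped tasks."""
    return sum(
        1
        for (n, rt, ri, tid) in locks
        if n == name and rt == rtype and ri == rid
        and tid != current_task            # NOT IN (current task)
        and tasks[tid] != "stopped"        # state != 'stopped'
    )

print(colliding_locks("read", "Katello::System", 529,
                      "ab80a792-a95e-455a-a0cf-23801e84eba2"))  # 0
```

A non-zero count means the resource is locked by someone else; the query itself is cheap in principle, which is why the plan below is so surprising.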

A quick EXPLAIN ANALYZE showed that this query triggers a full table scan:

foreman=# EXPLAIN ANALYZE SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
                                                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16218.46..16218.47 rows=1 width=0) (actual time=115714.130..115714.131 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..16218.46 rows=1 width=0) (actual time=7013.001..115714.110 rows=2 loops=1)
         Join Filter: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_tasks.id)::text)
         Rows Removed by Join Filter: 3029738
         ->  Index Scan using index_foreman_tasks_locks_on_resource_type_and_resource_id on foreman_tasks_locks  (cost=0.00..23.13 rows=1 width=37) (actual time=0.111..16.142 rows=1615 loops=1)
               Index Cond: (((resource_type)::text = 'Katello::System'::text) AND (resource_id = 529))
               Filter: (((task_id)::text <> 'ab80a792-a95e-455a-a0cf-23801e84eba2'::text) AND ((name)::text = 'read'::text))
               Rows Removed by Filter: 1615
         ->  Seq Scan on foreman_tasks_tasks  (cost=0.00..16171.96 rows=1869 width=37) (actual time=0.020..71.448 rows=1876 loops=1615)
               Filter: ((state)::text <> 'stopped'::text)
               Rows Removed by Filter: 371338
 Total runtime: 115714.238 ms
(12 rows)
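The plan's numbers account for the runtime: the sequential scan over foreman_tasks_tasks (~373k rows) is re-executed once per surviving lock row (loops=1615), so the join filter examines hundreds of millions of row pairs. A rough back-of-the-envelope check, using the figures from the plan above:

```python
# Figures taken from the EXPLAIN ANALYZE output above.
outer_rows = 1615            # lock rows surviving the index scan
inner_rows = 1876 + 371338   # task rows scanned per loop (kept + filtered out)
per_loop_ms = 71.448         # actual time of one inner seq scan

comparisons = outer_rows * inner_rows
estimated_ms = outer_rows * per_loop_ms

print(f"{comparisons:,} row pairs examined")          # ~600 million
print(f"~{estimated_ms / 1000:.0f} s in the inner scan alone")  # ~115 s
```

That estimate matches the observed 115714 ms total almost exactly: virtually all the time is the repeated sequential scan.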

After adding a couple of indexes, the response times are significantly improved:

foreman=# CREATE INDEX ON foreman_tasks_tasks (id, state);
CREATE INDEX
foreman=# CREATE INDEX ON foreman_tasks_locks (name, resource_type, resource_id);
CREATE INDEX
foreman=# EXPLAIN ANALYZE SELECT COUNT(*) FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
                                                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=69.76..69.77 rows=1 width=0) (actual time=30.810..30.811 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..69.75 rows=1 width=0) (actual time=30.787..30.807 rows=2 loops=1)
         ->  Index Scan using foreman_tasks_locks_name_resource_type_resource_id_idx on foreman_tasks_locks  (cost=0.00..22.91 rows=7 width=37) (actual time=0.149..1.926 rows=1615 loops=1)
               Index Cond: (((name)::text = 'read'::text) AND ((resource_type)::text = 'Katello::System'::text) AND (resource_id = 529))
               Filter: ((task_id)::text <> 'ab80a792-a95e-455a-a0cf-23801e84eba2'::text)
         ->  Index Only Scan using foreman_tasks_tasks_id_state_idx on foreman_tasks_tasks  (cost=0.00..6.68 rows=1 width=37) (actual time=0.018..0.018 rows=0 loops=1615)
               Index Cond: (id = (foreman_tasks_locks.task_id)::text)
               Filter: ((state)::text <> 'stopped'::text)
               Rows Removed by Filter: 1
               Heap Fetches: 1057
 Total runtime: 30.873 ms
(11 rows)
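That cuts the runtime from about 115.7 s to 31 ms, roughly a 3700x improvement. Column order matters in the locks index: because name, resource_type, and resource_id are all equality predicates, the composite index can satisfy every one of them as an index condition instead of a post-scan filter. A toy illustration of that difference (hypothetical data, not the real schema):

```python
# Toy model: a composite "index" on (name, resource_type, resource_id)
# is a dict keyed by those columns, so an all-equality lookup is a
# single probe instead of a scan-plus-filter over every row.
# (Hypothetical data, not the real foreman_tasks_locks schema.)
locks = [
    ("read",  "Katello::System", 529, "task-a"),
    ("write", "Katello::System", 529, "task-b"),
    ("read",  "Katello::System", 530, "task-c"),
]

# Without the index: every predicate is a filter applied to every row.
scan_hits = [t for (name, rtype, rid, t) in locks
             if name == "read" and rtype == "Katello::System" and rid == 529]

# With the composite index: build once, then probe directly.
index = {}
for name, rtype, rid, task in locks:
    index.setdefault((name, rtype, rid), []).append(task)

index_hits = index.get(("read", "Katello::System", 529), [])

assert scan_hits == index_hits == ["task-a"]
```

The (id, state) index on tasks plays the complementary role: it lets the inner side of the join become an index (only) scan per lock row instead of a full sequential scan.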

(As an aside, I don't know whether ~300k tasks is considered excessive, but https://bugzilla.redhat.com/show_bug.cgi?id=1289384 makes me glad 6.2 will include foreman_tasks:cleanup.)


Related issues (0 open, 1 closed)

Has duplicate: foreman-tasks - Bug #16584: foreman_tasks_tasks table needs index on task state (Duplicate, 09/16/2016)
#1

Updated by Ivan Necas almost 8 years ago

  • Project changed from Foreman Remote Execution to foreman-tasks
#2

Updated by Ivan Necas over 7 years ago

  • Target version set to 127
  • Status changed from New to Assigned
#3

Updated by The Foreman Bot over 7 years ago

  • Status changed from Assigned to Ready For Testing
  • Assignee set to Ivan Necas
  • Pull request https://github.com/theforeman/foreman-tasks/pull/202 added
#4

Updated by Ivan Necas over 7 years ago

  • Status changed from Ready For Testing to Closed
#5

Updated by Shlomi Zadok over 7 years ago

  • Has duplicate Bug #16584: foreman_tasks_tasks table needs index on task state added
#6

Updated by The Foreman Bot over 7 years ago

  • Pull request https://github.com/theforeman/foreman-tasks/pull/205 added