Bug #15283
closedSlow response times due to Foreman locking query
Description
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1334924
Using Satellite 6.1.7, we experienced extremely slow UI and API performance (response times measured in the minutes, tons of ancillary failures due to lock contention). On the server, we hit the Passenger process limit on Foreman instances which were each tied to a Postgres query using 100% CPU continuously for tens of minutes.
We added memory and reduced concurrency to avoid swapping and increased Postgres's shared_buffers & work_mem but the queries were still taking minutes to execute.
The long running queries were all the same Foreman lock check, something of this form:
SELECT COUNT FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
A quick EXPLAIN showed that triggers a table scan:
foreman=# EXPLAIN ANALYZE SELECT COUNT FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16218.46..16218.47 rows=1 width=0) (actual time=115714.130..115714.131 rows=1 loops=1)> Nested Loop (cost=0.00..16218.46 rows=1 width=0) (actual time=7013.001..115714.110 rows=2 loops=1)> Index Scan using index_foreman_tasks_locks_on_resource_type_and_resource_id on foreman_tasks_locks (cost=0.00..23.13 rows=1 width=37) (actual time=0.111..16.142 rows=1615 loops=1)
Join Filter: ((foreman_tasks_locks.task_id)::text = (foreman_tasks_tasks.id)::text)
Rows Removed by Join Filter: 3029738
Index Cond: (((resource_type)::text = 'Katello::System'::text) AND (resource_id = 529))
Filter: (((task_id)::text <> 'ab80a792-a95e-455a-a0cf-23801e84eba2'::text) AND ((name)::text = 'read'::text))
Rows Removed by Filter: 1615
-> Seq Scan on foreman_tasks_tasks (cost=0.00..16171.96 rows=1869 width=37) (actual time=0.020..71.448 rows=1876 loops=1615)
Filter: ((state)::text <> 'stopped'::text)
Rows Removed by Filter: 371338
Total runtime: 115714.238 ms
(12 rows)
After adding a couple of indexes, the response times are significantly improved:
foreman=# CREATE INDEX ON foreman_tasks_tasks (id, state);
CREATE INDEX
foreman=# CREATE INDEX ON foreman_tasks_locks (name, resource_type, resource_id);
CREATE INDEX
foreman=# EXPLAIN ANALYZE SELECT COUNT FROM "foreman_tasks_locks" INNER JOIN "foreman_tasks_tasks" ON "foreman_tasks_tasks"."id" = "foreman_tasks_locks"."task_id" WHERE "foreman_tasks_locks"."name" = 'read' AND "foreman_tasks_locks"."resource_id" = 529 AND "foreman_tasks_locks"."resource_type" = 'Katello::System' AND (foreman_tasks_tasks.state != 'stopped') AND ("foreman_tasks_locks"."task_id" NOT IN ('ab80a792-a95e-455a-a0cf-23801e84eba2'));
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=69.76..69.77 rows=1 width=0) (actual time=30.810..30.811 rows=1 loops=1)> Nested Loop (cost=0.00..69.75 rows=1 width=0) (actual time=30.787..30.807 rows=2 loops=1)> Index Scan using foreman_tasks_locks_name_resource_type_resource_id_idx on foreman_tasks_locks (cost=0.00..22.91 rows=7 width=37) (actual time=0.149..1.926 rows=1615 loops=1)
Index Cond: (((name)::text = 'read'::text) AND ((resource_type)::text = 'Katello::System'::text) AND (resource_id = 529))
Filter: ((task_id)::text <> 'ab80a792-a95e-455a-a0cf-23801e84eba2'::text)
-> Index Only Scan using foreman_tasks_tasks_id_state_idx on foreman_tasks_tasks (cost=0.00..6.68 rows=1 width=37) (actual time=0.018..0.018 rows=0 loops=1615)
Index Cond: (id = (foreman_tasks_locks.task_id)::text)
Filter: ((state)::text <> 'stopped'::text)
Rows Removed by Filter: 1
Heap Fetches: 1057
Total runtime: 30.873 ms
(11 rows)
(As an aside, I don't know whether ~300k tasks is considered excessive but https://bugzilla.redhat.com/show_bug.cgi?id=1289384 makes me glad 6.2 will include foreman_tasks:cleanup)
Updated by Ivan Necas almost 8 years ago
- Project changed from Foreman Remote Execution to foreman-tasks
Updated by Ivan Necas over 7 years ago
- Target version set to 127
- Status changed from New to Assigned
Updated by The Foreman Bot over 7 years ago
- Status changed from Assigned to Ready For Testing
- Assignee set to Ivan Necas
- Pull request https://github.com/theforeman/foreman-tasks/pull/202 added
Updated by Ivan Necas over 7 years ago
- Status changed from Ready For Testing to Closed
Updated by Shlomi Zadok over 7 years ago
- Has duplicate Bug #16584: foreman_tasks_tasks table needs index on task state added
Updated by The Foreman Bot over 7 years ago
- Pull request https://github.com/theforeman/foreman-tasks/pull/205 added