Bug #12193
closedDeadlock occuring when creating host
Description
The following happens quite often when trying to insert to nics table when creating a new host:
2015-10-15 11:55:40 [app] [W] Action failed | ActiveRecord::StatementInvalid: PGError: ERROR: deadlock detected | DETAIL: Process 15662 waits for ExclusiveLock on tuple (8,16) of relation 16 559 of database 16384; blocked by process 20700. | Process 20700 waits for ShareLock on transaction 299626203; blocked by proces s 15662. | HINT: See server log for query details. | : UPDATE "domains" SET "hosts_count" = COALESCE("hosts_count", 0) + 1 WHERE " domains"."id" IN (SELECT "domains"."id" FROM "domains" WHERE "domains"."id" = 2 ORDER BY domains.name) | /opt/rh/ruby193/root/usr/share/gems/gems/activerecord-3.2.8/lib/active_record /connection_adapters/postgresql_adapter.rb:1158:in `async_exec' .. /associations/builder/belongs_to.rb:30:in `block in add_counter_cache_callbacks' | /usr/share/foreman/app/models/nic/base.rb:61:in `belongs_to_counter_cache_after_create_for_domain' ... | /usr/share/foreman/app/models/concerns/foreman/sti.rb:29:in `save_with_type' | /usr/share/foreman/app/controllers/hosts_controller.rb:84:in `create' ... | /usr/share/foreman/app/controllers/concerns/application_shared.rb:13:in `set_ timezone'
[2015-10-14 07:43:19.739 CEST] [12290]: LOG: process 12290 still waiting for ShareLock on transaction 299488065 after 1000.100 ms [2015-10-14 07:43:19.739 CEST] [12290]: CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."operatingsystems" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x" ... [2015-10-14 07:46:06.028 CEST] [10305]: LOG: process 10305 still waiting for ShareLock on transaction 299488065 after 1000.123 ms [2015-10-14 07:46:06.028 CEST] [10305]: STATEMENT: UPDATE "domains" SET "hosts_count" = COALESCE("hosts_count", 0) - 1 WHERE "domains"."id" IN (SELECT "domains"."id" FROM "domains" WHERE "domains"."id" = 2 ORDER BY domains.name) ... [2015-10-14 07:50:21.695 CEST] [12248]: DETAIL: Process 12248 waits for ShareLock on transaction 299488070; blocked by process 10305. Process 10305 waits for ShareLock on transaction 299488101; blocked by process 12248. Process 12248: INSERT INTO "nics" ("attached_devices", "attached_to", "attrs", "bond_options", "compute_attributes", "created_at", "domain_id", "host_id", "identifier", "ip", "link", "mac", "managed", "mode", "name", "password", "primary", "provider", "provision", "subnet_id", "tag", "type", "updated_at", "username", "virtual") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25) RETURNING "id" Process 10305: UPDATE "architectures" SET "hosts_count" = COALESCE("hosts_count", 0) - 1 WHERE "architectures"."id" = 1 [2015-10-14 07:50:21.695 CEST] [12248]: HINT: See server log for query details. [2015-10-14 07:50:21.695 CEST] [12248]: CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."domains" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x" [2015-10-14 07:50:21.695 CEST] [12248]: STATEMENT: INSERT INTO "nics" ("attached_devices", "attached_to", "attrs", "bond_options", "compute_attributes", "created_at", "domain_id", "host_id", "identifier", "ip", "link", "mac", "managed", "mode", "name", "password", "primary", "provider", "provision", "subnet_id", "tag", "type", "updated_at", "username", "virtual") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25) RETURNING "id" [2015-10-14 07:50:22.640 CEST] [12256]: LOG: process 12256 still waiting for ShareLock on transaction 299488094
We are on postgresql version 8.4.20.
Tomer Brisker told me this:
Looks like ufortunatley rails doesn't allow an after_commit callback, which would update in a different transaction and prevent deadlocks
I applied the patch in http://projects.theforeman.org/issues/5990 but it did not help. We were previously on foreman 1.6.3 and there we did not see any share locks and hanging queries against the database.
Updated by Stefan Julin about 9 years ago
Another example when trying to create two hosts at the same time:
[2015-10-16 13:05:21.931 CEST] [26005]: LOG: process 26005 still waiting for ShareLock on transaction 299685807 after 1000.079 ms [2015-10-16 13:05:21.931 CEST] [26005]: CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."architectures" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x" [2015-10-16 13:05:21.931 CEST] [26005]: STATEMENT: INSERT INTO "hosts" ("architecture_id", "build", "certname", "comment", "compute_profile_id", "compute_resource_id", "created_at", "disk", "enabled", "environment_id", "grub_pass", "hostgroup_id", "image_file", "image_id", "installed_at", "last_compile", "last_report", "location_id", "managed", "medium_id", "model_id", "name", "operatingsystem_id", "organization_id", "otp", "owner_id", "owner_type", "provision_method", "ptable_id", "puppet_ca_proxy_id", "puppet_proxy_id", "puppet_status", "realm_id", "root_pass", "type", "updated_at", "use_image", "uuid") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38) RETURNING "id" [2015-10-16 13:07:03.397 CEST] [26005]: LOG: process 26005 acquired ShareLock on transaction 299685807 after 102465.235 ms [2015-10-16 13:07:03.397 CEST] [26005]: CONTEXT: SQL statement "SELECT 1 FROM ONLY "public"."architectures" x WHERE "id" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x" [2015-10-16 13:07:03.397 CEST] [26005]: STATEMENT: INSERT INTO "hosts" ("architecture_id", "build", "certname", "comment", "compute_profile_id", "compute_resource_id", "created_at", "disk", "enabled", "environment_id", "grub_pass", "hostgroup_id", "image_file", "image_id", "installed_at", "last_compile", "last_report", "location_id", "managed", "medium_id", "model_id", "name", "operatingsystem_id", "organization_id", "otp", "owner_id", "owner_type", "provision_method", "ptable_id", "puppet_ca_proxy_id", "puppet_proxy_id", "puppet_status", "realm_id", "root_pass", "type", "updated_at", "use_image", "uuid") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38) RETURNING "id" [2015-10-16 13:07:03.398 CEST] [26005]: LOG: duration: 102467.738 ms execute a12: INSERT INTO "hosts" ("architecture_id", "build", "certname", "comment", "compute_profile_id", "compute_resource_id", "created_at", "disk", "enabled", "environment_id", "grub_pass", "hostgroup_id", "image_file", "image_id", "installed_at", "last_compile", "last_report", "location_id", "managed", "medium_id", "model_id", "name", "operatingsystem_id", "organization_id", "otp", "owner_id", "owner_type", "provision_method", "ptable_id", "puppet_ca_proxy_id", "puppet_proxy_id", "puppet_status", "realm_id", "root_pass", "type", "updated_at", "use_image", "uuid") VALUES ($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25, $26, $27, $28, $29, $30, $31, $32, $33, $34, $35, $36, $37, $38) RETURNING "id"
Updated by Tomer Brisker about 9 years ago
- Related to Bug #11444: Domains page shows negative number of hosts added
Updated by Tomer Brisker about 9 years ago
- Category set to Database
- Priority changed from High to Normal
Deadlocks on domain table should be resolved, or at least reduced, by fix to #11444.
It seems like these deadlocks originate from how rails handles counter_cache, which is somewhat buggy in Rails 3.x (hopefully it will improve with the upgrade to Rails 4.x) in combination with postgresql 8.4 which does a bad job handling locks sometimes (see here for more info).
I'm not sure how much we can improve this on the foreman side other then completely remove cached counters or replace them with a different, concurrence-safe mechanism (such as Counter Culture, which sadly seems abandoned which seems to be maintained sparsly).
Updated by Stefan Julin about 9 years ago
We have now upgraded to postgres 9.4 and get locks related to this (tested with sql query:
SELECT bl.pid AS blocked_pid, a.usename AS blocked_user, ka.query AS blocking_statement, now() - ka.query_start AS blocking_duration, kl.pid AS blocking_pid, ka.usename AS blocking_user, a.query AS blocked_statement, now() - a.query_start AS blocked_duration FROM pg_catalog.pg_locks bl JOIN pg_catalog.pg_stat_activity a ON a.pid = bl.pid JOIN pg_catalog.pg_locks kl ON kl.transactionid = bl.transactionid AND kl.pid != bl.pid JOIN pg_catalog.pg_stat_activity ka ON ka.pid = kl.pid WHERE NOT bl.GRANTED;
):
There are two records in the result:
Query 1:
DELETE FROM "taxable_taxonomies" WHERE "taxable_taxonomies"."id" = $1
is waiting for this query:
UPDATE "domains" SET "hostgroups_count" = COALESCE("hostgroups_count", 0) - 1 WHERE "domains"."id" IN (SELECT "domains"."id" FROM "domains" WHERE "domains"."id" = 2 ORDER BY domains.name)
Query 2:
UPDATE "domains" SET "hostgroups_count" = COALESCE("hostgroups_count", 0) - 1 WHERE "domains"."id" IN (SELECT "domains"."id" FROM "domains" WHERE "domains"."id" = 2 ORDER BY domains.name)
is waiting for this query:
SELECT "hosts".id FROM "hosts" WHERE "hosts"."type" IN ('Host::Managed') AND "hosts"."hostgroup_id" = 4292
So it looks like the :hostgroups_count logic in the Domain class (mapping to table domains) is making queries wait also in postgresql 9.4.
Updated by Chris Duryee over 8 years ago
Would it be ok to just remove the counter cache? I tested a couple of count(*) queries for 5.5K hosts and they were all pretty fast (less than 3msec).
Updated by Tomer Brisker over 8 years ago
- Status changed from New to Assigned
Chris - I have started to work in that direction already, once the code is ready I will run some performance testing against a heavy DB.
Removing the cache will give us the advantage of getting taxonomy scoped and authorized counts, at the price of not being able to sort by count, increasing load times and making searching by counts more difficult.
Updated by Lukas Zapletal over 8 years ago
I share the opinion, one big warning - while it might work just fine on PostgreSQL, the other database MySQL can be the issue. If this turns out to be it, we should go ahead and reopen discussion of dropping MySQL from supported databases.
Updated by The Foreman Bot over 8 years ago
- Status changed from Assigned to Ready For Testing
- Pull request https://github.com/theforeman/foreman/pull/3596 added
Updated by Anonymous over 8 years ago
- Status changed from Ready For Testing to Closed
- % Done changed from 0 to 100
Applied in changeset c773c0ef6be2185fe83252a05c8ca4e54e0608a7.
Updated by Dominic Cleal over 8 years ago
- Translation missing: en.field_release set to 160
Updated by Tomer Brisker about 8 years ago
- Related to Refactor #12484: Remove useless host group counters added
Updated by Tomer Brisker about 8 years ago
- Related to Bug #11741: Config group host counter not incremented on host creation with association added
Updated by Dominic Cleal about 8 years ago
- Related to Bug #16622: ActiveRecord::StatementInvalid exception when accessing Hosts->Operating systems with restricted access added
Updated by Tomer Brisker almost 8 years ago
- Related to Bug #17666: Upgrade fails during db:migration added
Updated by Dominic Cleal almost 8 years ago
- Related to Bug #18260: Choose different org will result in the same org's host number change in organizations page added