Bug #10133

Massive db deadlocks in postgres from hosts_counter updates with counter_cache_fix.rb

Added by Chuck Schweizer over 3 years ago. Updated about 1 month ago.

Status: Closed
Priority: Normal
Assignee:
Category: Database
Target version:
Difficulty:
Triaged:
Bugzilla link:
Team Backlog:
Fixed in Releases:
Found in Releases:

Description

https://gist.github.com/csschwe/4cc4d9be58e1cb96ec6c

After updating to Foreman 1.8 rc3 I am seeing a massive number of DB deadlocks. This issue was not present in 1.7.


Related issues

Related to Foreman - Bug #5692: Puppet environment counters not updated (Closed, 2014-05-13)
Related to Foreman - Bug #12241: Counter cache update didn't pick up changes from after_commit callback (Closed, 2015-10-21)
Related to Foreman - Bug #7246: Remove counter workaround for #5692 on upgrade to rails 4.x (Rejected, 2014-08-25)
Has duplicate Chef - Bug #11232: Occassional error in tasks when importing facts from foreman-chef (Duplicate, 2015-07-28)
Has duplicate Foreman - Bug #5990: multiple calls to create or update domain throws deadlock error (Duplicate, 2014-05-29)

Associated revisions

Revision 7fad1fa0 (diff)
Added by Tomer Brisker about 3 years ago

Fixes #10133 - Prevent deadlocks when fixing counter_cache

Revision a1e75a0b (diff)
Added by Tomer Brisker almost 3 years ago

Fixes #10133 - Prevent deadlocks when fixing counter_cache

(cherry picked from commit 7fad1fa0e253e793511df1cde24d8b1885d640c4)

History

#1 Updated by Chuck Schweizer over 3 years ago

This is in a 40K node environment.

#3 Updated by Tomer Brisker over 3 years ago

  • Related to Bug #5692: Puppet environment counters not updated added

#4 Updated by Tomer Brisker over 3 years ago

  • Category set to Database

Which PostgreSQL version are you using?
This sounds like it might be related to a problem that was fixed in 9.3: http://mina.naguib.ca/blog/2010/11/22/postgresql-foreign-key-deadlocks.html

#5 Updated by Lukas Zapletal over 3 years ago

Three users in the comments there complain that version 9.3 is even worse and that it was not fixed for them :-(

Alvaro Herrera describes the solution as the introduction of a new lock mode, SELECT ... FOR KEY SHARE. Would that mean you need both PostgreSQL 9.3 and a newer Rails that takes advantage of it? Or I assume some change in Foreman would be required.

#6 Updated by Tomer Brisker over 3 years ago

This specific deadlock should go away when we upgrade to Rails 4, as it is caused by a workaround for a bug in cached counters that existed only in Rails 3.

#7 Updated by Lukas Zapletal over 3 years ago

Oh, I see. Maybe we could make this workaround optional so that users under heavy load can turn it off?

#8 Updated by Tomer Brisker over 3 years ago

  • Status changed from New to Assigned
  • Assignee set to Tomer Brisker

The counter_cache fix was already in 1.7, so I'm trying to understand what caused this.
Chuck, what operation causes the deadlocks? Did you upgrade anything other than Foreman?

#9 Updated by Tomer Brisker over 3 years ago

Digging into the log, it seems the deadlock is caused by a race between counter_cache_fix and Rails' original update_counters, with both trying to update the same counter at the same time. I will continue investigating.
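
For readers unfamiliar with this kind of race, here is a toy model in plain Ruby of how two code paths that lock the same rows in opposite orders meet the classic precondition for a deadlock. It is only a sketch: the thread does not give the actual lock sequences taken by counter_cache_fix or update_counters, so the row names, the orders, and the helper `opposite_lock_order?` are all hypothetical.

```ruby
# Toy model only: this is not Foreman, Rails, or PostgreSQL code.
# A transaction is an ordered list of the rows it locks and keeps
# locked until it commits.

# True when the two transactions lock some pair of shared rows in
# opposite orders, the classic precondition for a lock-ordering deadlock.
def opposite_lock_order?(tx_a, tx_b)
  shared = tx_a & tx_b
  shared.combination(2).any? do |x, y|
    (tx_a.index(x) < tx_a.index(y)) != (tx_b.index(x) < tx_b.index(y))
  end
end

# Hypothetical lock sequences; the real ones are not given in this thread.
counter_fix     = [:hosts_row, :environments_row] # assumed: host row first
update_counters = [:environments_row, :hosts_row] # assumed: counter row first

# Opposite orders: each side can end up waiting on the other.
puts opposite_lock_order?(counter_fix, update_counters)

# Same global order on both sides: the precondition disappears.
puts opposite_lock_order?(counter_fix.sort, update_counters.sort)
```

The second call illustrates one general mitigation, taking locks in a single agreed order. Whether the eventual patch works this way is not stated here.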

#10 Updated by Chuck Schweizer over 3 years ago

My environment is a fully updated RHEL 6 install using the foreman installer.

foreman 1.8 rc3
postgres 8.4

The Foreman server is only set up to receive reports and facts from the puppet masters; it is not acting as a puppet server or external node classifier.

From what I can tell, the uploading of reports and facts from the 40K nodes, through the puppet masters, is causing the deadlocks. Commenting out the logic that updates the DB in counter_cache_fix.rb made the deadlocks stop.

The only thing that changed going from Foreman 1.7.1 to 1.8 rc3 was Foreman itself. Nothing else on the system was updated or changed. The Foreman installer was run after installing the 1.8 rc3 RPMs.

#11 Updated by The Foreman Bot over 3 years ago

  • Status changed from Assigned to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/2362 added

#12 Updated by Daniel Lobato Garcia about 3 years ago

Hi Chuck,

Tomer has prepared a proposed fix for this issue - https://github.com/theforeman/foreman/pull/2362
Could you report if it works for your case?

Thanks!

#13 Updated by Chuck Schweizer about 3 years ago

After reducing the number of puppet masters in my environment I have been unable to reproduce the issue.

#14 Updated by Dominic Cleal about 3 years ago

  • Status changed from Ready For Testing to New
  • Assignee deleted (Tomer Brisker)

If anybody reproduces this, we'll retry the patch.

#15 Updated by Tomer Brisker about 3 years ago

  • Has duplicate Bug #11232: Occassional error in tasks when importing facts from foreman-chef added

#16 Updated by The Foreman Bot about 3 years ago

  • Status changed from New to Ready For Testing

#17 Updated by Marek Hulán about 3 years ago

  • Assignee set to Tomer Brisker
  • Legacy Backlogs Release (now unused) set to 72

#18 Updated by Anonymous about 3 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

#19 Updated by Tomer Brisker almost 3 years ago

  • Has duplicate Bug #5990: multiple calls to create or update domain throws deadlock error added

#20 Updated by Tomer Brisker almost 3 years ago

  • Related to Bug #12241: Counter cache update didn't pick up changes from after_commit callback added

#21 Updated by Tomer Brisker about 2 years ago

  • Related to Bug #7246: Remove counter workaround for #5692 on upgrade to rails 4.x added
