Bug #16749

Foreman performs extremely slowly when a puppet agent with a high number of IP addresses configured reports in

Added by Eduardo Mayoral about 1 year ago. Updated about 1 year ago.

Status:Closed
Priority:Normal
Assigned To:Dominic Cleal
Category:Importers
Target version:-
Difficulty:medium Bugzilla link:
Found in release:1.12.3 Pull request:https://github.com/theforeman/foreman/pull/3909
Story points-
Velocity based estimate-
Release1.13.1Release relationshipAuto

Description

I have a foreman installation with about 900 puppet agents connecting to it. A few of them have a high number of local IP adresses (in the range of 300-500)

When one of these agents reports in, processing time for that request is very slow (about 12 seconds, as compared to about 0.4 seconds for a typical agent with 1 or 2 IP addresses), the CPU usage for the foreman server process goes very high (at or around 100%) and the memory usage of the foreman process increases a lot.

This happens with several agents and the common factor seems to be the high number of local IP addresses.

Relevant production.log lines:

2016-09-30 07:33:45 [app] [I] Started POST "/api/hosts/facts" for 217.76.129.21 at 2016-09-30 07:33:45 +0200
2016-09-30 07:33:45 [app] [I] Processing by Api::V2::HostsController#facts as JSON
2016-09-30 07:33:45 [app] [I] Parameters: {"name"=>"dnsp12.servidoresdns.net", "facts"=>"[FILTERED]", "certname"=>"dnsp12.servidoresdns.net", "apiv"=>"v2", :host=>{"name"=>"dnsp12.servidoresdns.net", "certname"=>"dnsp12.servidoresdns.net"}}
2016-09-30 07:34:09 [app] [I] Completed 201 Created in 24556ms (Views: 10.0ms | ActiveRecord: 7663.3ms)
2016-09-30 07:34:09 [app] [I] Started GET "/node/dnsp12.servidoresdns.net?format=yml" for 217.76.129.21 at 2016-09-30 07:34:09 +0200
2016-09-30 07:34:09 [app] [I] Processing by HostsController#externalNodes as YML
2016-09-30 07:34:09 [app] [I] Parameters: {"name"=>"dnsp12.servidoresdns.net"}

production-debug.log.tar.xz (181 KB) Eduardo Mayoral, 10/03/2016 05:39 AM

Associated revisions

Revision 4d0870ad
Added by Dominic Cleal about 1 year ago

fixes #16749 - improve NIC update performance during fact imports

- In get_facts_for_interface, replace regexes in hash key filtering
with simple String methods and simplify hash construction
- Change logger.debug to lazy evaluation for production performance,
especially when calling #inspect

Before:

PuppetFactParser 1      6.114k (±16.1%) i/s -     57.836k in   9.951523s
PuppetFactParser 50 15.127 (±26.4%) i/s - 135.000 in 10.036620s
PuppetFactParser 500 0.143 (± 0.0%) i/s - 2.000 in 14.027935s
Memory stats
Total objects allocated: 61005590
Total heap pages allocated: 4218

After:

PuppetFactParser 1      9.308k (±14.8%) i/s -     87.936k in   9.934396s
PuppetFactParser 50 213.821 (±16.4%) i/s - 2.018k in 10.007988s
PuppetFactParser 500 2.610 (± 0.0%) i/s - 27.000 in 10.357182s
Total objects allocated: 149652784
Total heap pages allocated: 4221

(approximately 15x faster, and sub-second for the ~400 interfaces in the
bug report)

In Host#set_interface, unchanged interfaces are no longer saved to avoid
creating an unnecessary validation and DB transactions in most imports.

Revision d0dad658
Added by Dominic Cleal about 1 year ago

fixes #16749 - improve NIC update performance during fact imports

- In get_facts_for_interface, replace regexes in hash key filtering
with simple String methods and simplify hash construction
- Change logger.debug to lazy evaluation for production performance,
especially when calling #inspect

Before:

PuppetFactParser 1      6.114k (±16.1%) i/s -     57.836k in   9.951523s
PuppetFactParser 50 15.127 (±26.4%) i/s - 135.000 in 10.036620s
PuppetFactParser 500 0.143 (± 0.0%) i/s - 2.000 in 14.027935s
Memory stats
Total objects allocated: 61005590
Total heap pages allocated: 4218

After:

PuppetFactParser 1      9.308k (±14.8%) i/s -     87.936k in   9.934396s
PuppetFactParser 50 213.821 (±16.4%) i/s - 2.018k in 10.007988s
PuppetFactParser 500 2.610 (± 0.0%) i/s - 27.000 in 10.357182s
Total objects allocated: 149652784
Total heap pages allocated: 4221

(approximately 15x faster, and sub-second for the ~400 interfaces in the
bug report)

In Host#set_interface, unchanged interfaces are no longer saved to avoid
creating an unnecessary validation and DB transactions in most imports.

(cherry picked from commit 4d0870ad2503d9e1489aa18ddc4e837675a0b8f6)

History

#1 Updated by Dominic Cleal about 1 year ago

  • Category changed from Puppet integration to Importers

It may help somebody to fix this later if you can enable SQL logging and capture a full import (https://theforeman.org/manuals/1.12/index.html#7.2Debugging).

Note that 1.14.0 will have some improvements to fact imports performance via #9016, but this is only for importing the facts themselves - if the slowdown is from importing the network interfaces from the facts then this is unchanged.

#2 Updated by Eduardo Mayoral about 1 year ago

Attached is log with

:loggers:
:sql:
:enabled: true

and
:logging:
:level: debug

As it is a production system the log is not only for the agent causing the problems. The agent causing the problems is dnsp12, you will recognize it easily in the log by the amount of network interfaces. Look for eth_200

#3 Updated by Dominic Cleal about 1 year ago

Thanks, that's very helpful. It looks like the NIC updates themselves take a long time to complete.

There are two long loops - once retrieving the facts per interface in the parser, and the second when saving each NIC in turn.

2016-10-03 11:33:55 [app] [I] Processing by Api::V2::HostsController#facts as JSON
2016-10-03 11:33:55 [app] [I] Parameters: {"certname"=>"dnsp12.servidoresdns.net", "name"=>"dnsp12.servidoresdns.net", "facts"=>"[FILTERED]", "apiv"=>"v2", :host=>{"name"=>"dnsp12.servidoresdns.net", "certname"=>"dnsp12.servidoresdns.net"}}
2016-10-03 11:33:55 [sql] [D] Host::Managed Load (0.4ms) SELECT `hosts`.* FROM `hosts` WHERE `hosts`.`type` IN ('Host::Managed') AND `hosts`.`certname` = 'dnsp12.servidoresdns.net' LIMIT 1
2016-10-03 11:33:55 [app] [D] Merging facts for 'dnsp12.servidoresdns.net': deleted 0 facts
2016-10-03 11:33:55 [app] [D] Merging facts for 'dnsp12.servidoresdns.net': added 0 facts
2016-10-03 11:33:55 [app] [D] Merging facts for 'dnsp12.servidoresdns.net': updated 9 facts
2016-10-03 11:33:55 [app] [I] Import facts for 'dnsp12.servidoresdns.net' completed. Added: 0, Updated: 9, Deleted 0 facts
2016-10-03 11:33:56 [app] [D] We have following interfaces 'eth0, eth0_1, eth0_2, [..], eth0_790, sit0' based on facts
2016-10-03 11:33:56 [app] [D] Interface eth0 facts: {"netmask"=>"255.255.254.0", "network"=>"82.223.86.0", "mtu"=>"1500", "macaddress"=>"00:50:56:A6:39:14", "ipaddress"=>"82.223.87.46"}
[..]
2016-10-03 11:34:07 [app] [D] Interface eth0_753 facts: {"network"=>"217.76.140.0", "ipaddress"=>"217.76.141.223", "netmask"=>"255.255.254.0", "macaddress"=>"00:50:56:A6:39:14", "mtu"=>"1500"}
2016-10-03 11:34:07 [sql] [D] Saving eth0 NIC for host dnsp12.servidoresdns.net
2016-10-03 11:34:07 [sql] [D] Saving sit0 NIC for host dnsp12.servidoresdns.net
2016-10-03 11:34:07 [sql] [D] Saving eth0.1 NIC for host dnsp12.servidoresdns.net
[..]
2016-10-03 11:34:20 [sql] [D] Saving eth0.790 NIC for host dnsp12.servidoresdns.net
2016-10-03 11:34:21 [app] [I] Completed 201 Created in 26698ms (Views: 13.5ms | ActiveRecord: 7733.0ms)

Saving happens in many separate transactions:

2016-10-03 11:34:07 [sql] [D]   Nic::Base Load (0.5ms)  SELECT  `nics`.* FROM `nics` WHERE `nics`.`host_id` = 613 AND `nics`.`mac` = '00:50:56:a6:39:14' AND `nics`.`virtual` = 1 AND `nics`.    `identifier` = 'eth0.7'  ORDER BY `nics`.`identifier` ASC LIMIT 1
2016-10-03 11:34:07 [sql] [D] Saving eth0.7 NIC for host dnsp12.servidoresdns.net
2016-10-03 11:34:07 [sql] [D] (0.1ms) BEGIN
2016-10-03 11:34:07 [sql] [D] Nic::Base Load (17.3ms) SELECT `nics`.* FROM `nics` WHERE `nics`.`ip` = '82.223.91.251'
2016-10-03 11:34:07 [sql] [D] Using nic queue
2016-10-03 11:34:07 [sql] [D] (0.1ms) COMMIT

#4 Updated by Dominic Cleal about 1 year ago

  • Status changed from New to Assigned
  • Assigned To set to Dominic Cleal

#5 Updated by The Foreman Bot about 1 year ago

  • Status changed from Assigned to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/3909 added

#6 Updated by Dominic Cleal about 1 year ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

#7 Updated by Dominic Cleal about 1 year ago

  • Release set to 1.13.1

Also available in: Atom PDF