Bug #18386
goferd consumes 100% CPU after losing the connection to qdrouterd
Description
Hi!
After rebooting our CentOS/RHEL update server (Foreman with the katello scenario), a few clients had their goferd process start using 100% CPU after losing the connection to the qdrouterd running on that server. The quick fix was to restart goferd on each affected client machine.
Please fix this issue. Thanks!
sincerely yours
Mario
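To find affected clients without logging in to each one and watching `top`, one option is to sample the goferd process's CPU time from `/proc`. This is a minimal, Linux-only sketch, not part of gofer or Katello; the function names, the 5-second interval, and finding the PID with something like `pgrep -f goferd` are all assumptions.

```python
import os
import sys
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime, in clock ticks, from one /proc/<pid>/stat line.

    Field 2 (the command name) is wrapped in parentheses and may contain
    spaces, so split after the last ')' before indexing the other fields.
    """
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state), so field N lives at rest[N - 3].
    utime = int(rest[14 - 3])
    stime = int(rest[15 - 3])
    return utime + stime


def cpu_percent(ticks_before, ticks_after, elapsed_seconds, clk_tck):
    """CPU used over an interval, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (clk_tck * elapsed_seconds)


def sample_pid(pid, interval=5.0):
    """Linux only: read /proc/<pid>/stat twice and return the CPU percentage."""
    path = "/proc/%d/stat" % pid
    start = time.monotonic()
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    elapsed = time.monotonic() - start
    return cpu_percent(before, after, elapsed, os.sysconf("SC_CLK_TCK"))


if __name__ == "__main__":
    # Hypothetical usage: python cpu_check.py $(pgrep -f goferd)
    for arg in sys.argv[1:]:
        print("%s: %.1f%% CPU" % (arg, sample_pid(int(arg))))
```

A value near 100 that persists across several samples would match the symptom described above; restarting goferd was the workaround reported in this thread.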
Updated by Mario K almost 8 years ago
Hi
Here are some version details of our setup:
On the CentOS/RHEL update server we are running CentOS 7.3 with these packages:
foreman-1.13.3-1.el7.noarch
katello-3.2.2-1.el7.noarch
Around 50 percent of our CentOS 6/RHEL 6 systems were affected by the problem.
The CentOS 6 systems run versions 6.7 and 6.8 with the package:
gofer-2.7.6-1.el6.noarch
The RHEL 6 systems run versions 6.6 and 6.8 with the package:
gofer-2.7.6-1.el6.noarch
Our CentOS 7/RHEL 7 systems were not affected.
sincerely yours
Mario
Updated by Mario K almost 8 years ago
Hi
Today I analyzed the messages log file of one of the affected systems (RHEL 6.8).
I replaced the real hostname with "hostname" and the real FQDN of our internal CentOS/RHEL update server with a fake FQDN. I also hid the IDs.
Here is the output:
Feb 2 13:47:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:47:50 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.reliability:53 - Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:48:00 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:48:01 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:48:01 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:49:04 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 10 seconds
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:50:17 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 12 seconds
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:51:32 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 14 seconds
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:52:50 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 17 seconds
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:30 - connected: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:505 - connected to centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.reliability:47 - receiver id-number-hidden from pulp.agent.id-number-hidden closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
sincerely yours
Mario
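The log above has a regular pattern: each attempt ends with `Disconnected` about 63 seconds after `connecting to ...`, and the announced retry delays are 10, 12, 14 and 17 seconds. The sketch below extracts both from the anonymized lines, assuming the syslog layout shown here and a log that does not cross midnight. `modeled_delays` is a hypothetical model only: a delay that starts at 10 s and grows 1.2x per attempt, truncated to whole seconds, happens to reproduce the observed sequence, but that is an inference from four data points, not gofer's documented retry policy.

```python
import re

# Anonymized lines copied from the log above (only the ones this sketch uses).
SAMPLE = """\
Feb 2 13:48:01 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 10 seconds
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 12 seconds
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 14 seconds
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 17 seconds
"""

# syslog prefix, then goferd's "[LEVEL][agent-id] source:line - message" layout.
LINE_RE = re.compile(
    r"^\w{3}\s+\d+ (\d\d):(\d\d):(\d\d) \S+ goferd: "
    r"\[(\w+)\]\[[^\]]*\] (\S+) - (.*)$"
)
RETRY_RE = re.compile(r"retry in (\d+) seconds")


def parse(lines):
    """Yield (seconds_since_midnight, level, source, message) per goferd line."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            h, mi, s = (int(g) for g in m.group(1, 2, 3))
            yield h * 3600 + mi * 60 + s, m.group(4), m.group(5), m.group(6)


def summarize(lines):
    """Return (attempt_durations, retry_delays), both in seconds."""
    durations, delays = [], []
    started = None
    for t, _level, _source, msg in parse(lines):
        if msg.startswith("connecting to "):
            started = t
        elif msg == "Disconnected" and started is not None:
            durations.append(t - started)
            started = None
        else:
            r = RETRY_RE.search(msg)
            if r:
                delays.append(int(r.group(1)))
    return durations, delays


def modeled_delays(initial=10.0, factor=1.2, count=4):
    """HYPOTHETICAL model, not gofer's code: multiply the delay by `factor`
    after each attempt and report it truncated to whole seconds."""
    out, delay = [], initial
    for _ in range(count):
        out.append(int(delay))
        delay *= factor
    return out


if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))  # ([63, 63, 63, 63], [10, 12, 14, 17])
    print(modeled_delays())                # [10, 12, 14, 17]
```

With one-second timestamps this is weak evidence either way, but the 15 s gap between `retry in 14 seconds` (13:51:32) and the next `connecting` (13:51:47) is at least consistent with an untruncated 14.4 s delay.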
Updated by Luca Lorenzetto almost 8 years ago
Hello,
I'm experiencing the same issue with this setup:
Red Hat Enterprise Linux Server release 7.2 (Maipo)
katello-agent-2.5.0-3.el7.noarch
After the latest restart, 54 of more than 1100 hosts were affected.
I'm now upgrading katello-agent, but the latest version available is the one specified by Mario.
Updated by Mario K almost 8 years ago
Hi
In your IRC support channel I got the hint that it could be related to qpid.
So here are the qpid versions on the affected systems:
CentOS6:
qpid-proton-c-0.9-7.el6.x86_64
qpid-proton-c-0.13.1-1.el6.x86_64
RHEL6:
qpid-proton-c-0.9-7.el6.x86_64
sincerely yours
Mario
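With many clients (54 of more than 1100 in one report in this thread), collecting `rpm -q qpid-proton-c` output from each host and grouping it by version is a quick way to see which builds are in play. A minimal sketch, assuming the default `rpm -q` output format `name-version-release.arch` with no epoch shown; `parse_nevra` and `group_by_version` are illustrative helpers, not part of any Katello tool.

```python
from collections import defaultdict


def parse_nevra(rpm_name):
    """Split e.g. 'qpid-proton-c-0.9-7.el6.x86_64' into its parts.

    Assumes default `rpm -q` output (name-version-release.arch, no epoch).
    Package names can contain hyphens, so split from the right.
    """
    base, arch = rpm_name.strip().rsplit(".", 1)
    name, version, release = base.rsplit("-", 2)
    return {"name": name, "version": version, "release": release, "arch": arch}


def group_by_version(rpm_names):
    """Map 'version-release' to the list of rpm names reporting it."""
    groups = defaultdict(list)
    for rpm_name in rpm_names:
        p = parse_nevra(rpm_name)
        groups["%s-%s" % (p["version"], p["release"])].append(rpm_name)
    return dict(groups)


if __name__ == "__main__":
    # The qpid-proton-c builds listed above for the affected CentOS 6 / RHEL 6 hosts.
    seen = [
        "qpid-proton-c-0.9-7.el6.x86_64",
        "qpid-proton-c-0.13.1-1.el6.x86_64",
        "qpid-proton-c-0.9-7.el6.x86_64",
    ]
    for vr, names in sorted(group_by_version(seen).items()):
        print(vr, len(names))
```

Splitting from the right matters here because the package name `qpid-proton-c` itself contains hyphens.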
Updated by Justin Sherrill almost 8 years ago
We think this was caused by https://issues.apache.org/jira/browse/PROTON-1090,
whose fix seems to have been backported to qpid-proton 0.9-12 at least (meaning 0.9-7 is affected).
You should be able to find a newer version in the qpid Copr repo here: https://copr.fedorainfracloud.org/coprs/g/qpid/qpid/ (it looks like 0.14 is available).
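Deciding whether a host is affected means comparing its qpid-proton-c version-release against 0.9-12, the build mentioned above. Plain string comparison gets this wrong ("0.9-7" sorts after "0.9-12"), so the sketch below compares digit runs as numbers. It is a simplified approximation of RPM's version comparison (no epochs, no `~` or `^` handling), and the 0.9-12 threshold only reflects the comment above, not a verified list of fixed builds. On a real system, `rpm.labelCompare` from the rpm Python bindings or `rpmdev-vercmp` from rpmdevtools is the authoritative check.

```python
import re


def _segments(s):
    """Split a version or release string into digit runs (int) and letter runs (str)."""
    return [int(p) if p.isdigit() else p for p in re.findall(r"\d+|[A-Za-z]+", s)]


def _cmp_segments(a, b):
    """Simplified rpmvercmp-style comparison of two segment lists (-1, 0, 1)."""
    for x, y in zip(a, b):
        if x == y:
            continue
        if isinstance(x, int) and isinstance(y, int):
            return -1 if x < y else 1
        if isinstance(x, int):  # rpm treats a numeric segment as newer than an alphabetic one
            return 1
        if isinstance(y, int):
            return -1
        return -1 if x < y else 1
    # All shared segments are equal: the one with more segments is newer.
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_vr(a, b):
    """Compare 'version-release' strings such as '0.9-7.el6' and '0.9-12'."""
    va, ra = a.rsplit("-", 1)
    vb, rb = b.rsplit("-", 1)
    return (_cmp_segments(_segments(va), _segments(vb))
            or _cmp_segments(_segments(ra), _segments(rb)))


def is_affected(installed_vr, first_fixed_vr="0.9-12"):
    """True if the installed build sorts before the assumed first fixed build."""
    return compare_vr(installed_vr, first_fixed_vr) < 0


if __name__ == "__main__":
    for vr in ("0.9-7.el6", "0.13.1-1.el6"):
        print(vr, "flagged" if is_affected(vr) else "not flagged by this check")
```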
Updated by Mario K almost 8 years ago
- Status changed from New to Resolved
Hi Justin Sherrill
Your hint about the qpid-proton package solved the issue.
We are using https://fedorapeople.org/groups/katello/releases/yum/3.2/client/el6/x86_64/ for our CentOS 6/RHEL 6 systems, which provides the qpid-proton-c package in version 0.13.1-1. In January we manually updated the katello agent with yum, but qpid-proton-c was not updated at that time. Yesterday I updated the qpid-proton-c package with yum.
After that I rebooted the CentOS/RHEL update server a few times. No clients had goferd using 100% CPU after the reboots.
Thanks for the support!
sincerely yours
Mario
Updated by Eric Helms almost 8 years ago
- Status changed from Resolved to Closed
- Release set to 166