Bug #18386

closed

goferd consumes 100% CPU after losing the connection to qdrouterd

Added by Mario K almost 8 years ago. Updated over 6 years ago.

Status: Closed
Priority: High
Assignee: -
Category: -
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Hi!

After rebooting our CentOS/RHEL update server (Foreman with the katello scenario), the goferd process on a few clients started using 100% CPU after losing its connection to the qdrouterd running on that server. As a quick fix, we restarted goferd on each affected client machine.
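The quick fix above can be narrowed to the clients that actually need it. The following is a minimal sketch, not part of the original report: it assumes the text comes from `ps -eo pid,pcpu,args` run on a client, and the function name `find_busy_goferd` and the 90% threshold are illustrative choices. Note that the `pcpu` column in `ps` is CPU time divided by process lifetime, so a long-running goferd that only recently started spinning may read lower than 100.

```python
def find_busy_goferd(ps_output, threshold=90.0):
    """Return PIDs of goferd processes at or above `threshold` percent CPU.

    `ps_output` is assumed to be the text printed by `ps -eo pid,pcpu,args`.
    """
    busy = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # pid, pcpu, full command line
        if len(fields) < 3:
            continue
        pid, pcpu, args = fields
        try:
            cpu = float(pcpu)
        except ValueError:
            continue  # header line or unparsable row
        if "goferd" in args and cpu >= threshold:
            busy.append(int(pid))
    return busy


# Hypothetical sample output, for illustration only.
SAMPLE = """\
  PID %CPU COMMAND
 1234 99.8 python /usr/bin/goferd --foreground
 2345  0.3 python /usr/bin/goferd
 3456  100 /usr/sbin/sshd
"""

print(find_busy_goferd(SAMPLE))
```

If the returned list is non-empty on an EL6 client, restarting the service (for example with `service goferd restart`) is the workaround described above.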

Please fix this issue! Thanks!

sincerely yours

Mario

Actions #1

Updated by Mario K almost 8 years ago

Hi

Here are some version details of our setup:

On the CentOS/RHEL Update Server we are using CentOS 7.3 with the packages:
foreman-1.13.3-1.el7.noarch
katello-3.2.2-1.el7.noarch

Around 50 percent of our CentOS6/RHEL6 systems were affected by the problem.

The CentOS6 Systems have the versions 6.7 and 6.8 with the package:
gofer-2.7.6-1.el6.noarch

The RHEL6 Systems have the versions 6.6 and 6.8 with the package:
gofer-2.7.6-1.el6.noarch

Our CentOS7/RHEL7 systems were not affected.

sincerely yours

Mario

Actions #2

Updated by Mario K almost 8 years ago

Hi

Today I analyzed the messages log file of one of the affected systems (RHEL 6.8).

I replaced the real hostname with "hostname" and the real FQDN of our internal CentOS/RHEL update server with a fake FQDN. I also hid the IDs.

Here is the output:
Feb 2 13:47:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:47:50 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.reliability:53 - Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:48:00 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:48:01 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:48:01 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:49:04 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:49:04 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 10 seconds
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:49:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:50:17 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:50:17 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 12 seconds
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:50:29 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:51:32 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:51:32 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 14 seconds
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:51:47 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:525 - Disconnected
Feb 2 13:52:50 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:33 - connect: proton+amqps://centosrhelupdateserver.domain.tld:5647, failed: Connection amqps://centosrhelupdateserver.domain.tld:5647 disconnected
Feb 2 13:52:50 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:35 - retry in 17 seconds
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:28 - connecting: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:87 - open: URL: amqps://centosrhelupdateserver.domain.tld:5647|SSL: ca: /etc/rhsm/ca/katello-default-ca.pem|key: None|certificate: /etc/pki/consumer/bundle.pem|host-validation: None
Feb 2 13:53:07 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:485 - connecting to centosrhelupdateserver.domain.tld:5647...
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.connection:92 - opened: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] gofer.messaging.adapter.connect:30 - connected: proton+amqps://centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [INFO][pulp.agent.id-number-hidden] root:505 - connected to centosrhelupdateserver.domain.tld:5647
Feb 2 13:53:14 hostname goferd: [ERROR][pulp.agent.id-number-hidden] gofer.messaging.adapter.proton.reliability:47 - receiver id-number-hidden from pulp.agent.id-number-hidden closed due to: Condition('qd:no-route-to-dest', 'No route to the destination node')
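
The "retry in ... seconds" values in the log above (10, 12, 14, 17) are consistent with a delay that starts at 10 seconds and grows by a factor of 1.2 after each failed attempt, logged as whole seconds. This is an inference from this log excerpt only, not taken from the gofer source; the function name and parameters below are hypothetical, and rounding and truncation would both match the logged values.

```python
def retry_delays(attempts, first=10.0, factor=1.2):
    """Return the delays (whole seconds) for a multiplicative backoff.

    `first` and `factor` are assumptions chosen to match the log excerpt.
    """
    delays = []
    delay = first
    for _ in range(attempts):
        delays.append(round(delay))
        delay *= factor
    return delays


print(retry_delays(4))
```

The log also shows that reconnecting itself worked as intended: the agent kept backing off until the router came back at 13:53:14.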

sincerely yours

Mario

Actions #3

Updated by Luca Lorenzetto almost 8 years ago

Hello,

I'm experiencing the same issue with this setup:

Red Hat Enterprise Linux Server release 7.2 (Maipo)
katello-agent-2.5.0-3.el7.noarch

After the latest restart, 54 of more than 1100 hosts are affected.

I'm now proceeding with the upgrade of katello-agent, but the latest version available is the one specified by Mario.

Actions #4

Updated by Mario K almost 8 years ago

Hi

In your IRC support channel I got the hint that it could be related to qpid.

So here are the qpid versions on the affected systems:

CentOS6:
qpid-proton-c-0.9-7.el6.x86_64
qpid-proton-c-0.13.1-1.el6.x86_64

RHEL6:
qpid-proton-c-0.9-7.el6.x86_64

sincerely yours

Mario

Actions #5

Updated by Justin Sherrill almost 8 years ago

We think this was caused by https://issues.apache.org/jira/browse/PROTON-1090

The fix seems to have been backported to qpid-proton as of at least 0.9-12 (meaning 0.9-7 is affected).

You should be able to find a newer version at the qpid copr repo here: https://copr.fedorainfracloud.org/coprs/g/qpid/qpid/ (it looks like 0.14 is available).
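
To check which clients still carry an older qpid-proton-c build, the version-release strings reported earlier in this thread can be compared against 0.9-12. This is a rough sketch under stated assumptions: the 0.9-12 threshold is taken from the comment above (which itself says "seems"), the helper names are hypothetical, and the comparison only looks at numeric components instead of implementing RPM's full version-comparison rules.

```python
def _numeric_parts(text):
    """Return the leading dot-separated numeric components as a tuple."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def parse_version_release(version_release):
    """Split e.g. '0.9-7.el6' into ((0, 9), (7,)); 'el6' is ignored."""
    version, _, release = version_release.partition("-")
    return _numeric_parts(version), _numeric_parts(release)


def predates_fix(installed, fixed="0.9-12"):
    """True if `installed` sorts before the build assumed to carry the fix."""
    return parse_version_release(installed) < parse_version_release(fixed)


# Version-release strings as reported in this thread.
for build in ("0.9-7.el6", "0.13.1-1.el6"):
    print(build, "predates fix:", predates_fix(build))
```

On a real client the installed string could come from `rpm -q --qf '%{VERSION}-%{RELEASE}\n' qpid-proton-c`.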

Actions #6

Updated by Mario K almost 8 years ago

  • Status changed from New to Resolved

Hi Justin Sherrill

Your hint about the qpid-proton package solved the issue.

We are using https://fedorapeople.org/groups/katello/releases/yum/3.2/client/el6/x86_64/ for our CentOS6/RHEL6 systems. That repository provides the qpid-proton-c package in version 0.13.1-1. In January we manually updated the katello agent with yum, and qpid-proton-c was not updated at that time. Yesterday I updated the qpid-proton-c package with yum.

After that I rebooted the CentOS/RHEL update server a few times. No clients had goferd using 100% CPU after the reboots.

Thanks for the support!

sincerely yours

Mario

Actions #7

Updated by Eric Helms almost 8 years ago

  • Status changed from Resolved to Closed
  • Translation missing: en.field_release set to 166