Bug #10148 (closed): Repeated crashes of goferd on Pulp Node Capsule when trying to SyncNode

Added by Michael Bassler about 9 years ago. Updated almost 6 years ago.

Status: Resolved
Priority: Normal
Assignee: -
Category: Foreman Proxy Content
Target version: -
Difficulty: -
Triaged: Yes
Fixed in Releases: -
Found in Releases: -

Description

Running latest Katello 2.2.0-3 with Foreman 1.8.0-0.1.RC3

During 'Actions::Pulp::Consumer::SyncNode' I repeatedly see goferd crashing. Attached are the /var/log/messages from the master and capsule node systems as well as the abrtd mail from the goferd crash.

Please let me know if there is any additional information I can provide.

Thank You
-Michael


Files

capsule_abrtd_mail (76.5 KB) - Capsule abrtd mail - Michael Bassler, 04/15/2015 11:45 AM
capsule_var-log-messages (1.48 KB) - Capsule messages - Michael Bassler, 04/15/2015 11:45 AM
master_var-log-messages (10.7 KB) - Master server messages - Michael Bassler, 04/15/2015 11:46 AM
master_abrtd_mail (49.6 KB) - Master abrt mail - Michael Bassler, 04/16/2015 12:42 PM
Actions #1

Updated by Eric Helms about 9 years ago

  • Status changed from New to Need more information
  • Triaged changed from No to Yes

Howdy,

Based on the logs, you are seeing qpid connection issues on the master and goferd crashes on the capsule? Do you ever see qpid connect or re-connect on the server itself? Are qpidd and qdrouterd both running?
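(A quick way to confirm whether those daemons are up on an EL6 host is a small check along the lines of the sketch below. It is only a sketch: it assumes the stock SysV service names qpidd, qdrouterd and goferd.)

    # Sketch: report whether the messaging daemons are running on a RHEL 6 host.
    # Assumes the stock SysV init scripts, where "service <name> status" exits 0
    # when the daemon is up.
    import subprocess

    def is_running(name):
        return subprocess.call(["service", name, "status"]) == 0

    for daemon in ("qpidd", "qdrouterd", "goferd"):
        state = "running" if is_running(daemon) else "NOT running"
        print("%-10s %s" % (daemon, state))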

Actions #2

Updated by Michael Bassler about 9 years ago

Eric Helms wrote:

Howdy,

Based on the logs, you are seeing qpid connection issues on the master and goferd crashes on the capsule? Do you ever see qpid connect or re-connect on the server itself? Are qpidd and qdrouterd both running?

Master qpidd does periodically crash. It doesn't appear to be one-to-one, though; i.e., if I watch goferd on the capsule and restart it when it fails, only every few goferd crashes have a corresponding crash of qpidd on the master. However, looking more at the master, I notice there are a number of 'qpidd36760: 2015-04-12 03:45:18 [System] error Error reading socket: Success(0)' errors.

There were also failed attempts by abrtd to capture the qpidd crashes; on the capsule I had to edit '/etc/abrt/abrt-action-save-package-data.conf' to change OpenGPGCheck to no. After making the same change on the master I was able to capture the attached abrt mail.
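(The change itself amounts to flipping a single key in that file. A rough sketch of making it programmatically is below; the path and key name are the ones mentioned above, and the script assumes the key already exists in the file. Run as root and back the file up first.)

    # Sketch: set OpenGPGCheck = no in abrt's package-data config so abrtd will
    # save crash data for signed packages such as qpidd and goferd.
    import re

    CONF = "/etc/abrt/abrt-action-save-package-data.conf"

    with open(CONF) as f:
        text = f.read()

    # Replace the existing OpenGPGCheck line, whatever its current value.
    text = re.sub(r"(?m)^\s*OpenGPGCheck\s*=\s*\S+", "OpenGPGCheck = no", text)

    with open(CONF, "w") as f:
        f.write(text)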

After starting goferd and qpidd there were 5 goferd crashes on the capsule to 1 qpidd crash on the master. qdrouterd has remained up the whole time on both systems.

-Michael

Actions #3

Updated by Eric Helms about 9 years ago

From some manual testing and from examining other systems of my own, I am noticing slightly different behaviors between EL6 and EL7. What setup are you using across your main Katello and the capsule with respect to OS and version?

Actions #4

Updated by Michael Bassler almost 9 years ago

Eric Helms wrote:

From some manual testing and from examining other systems of my own, I am noticing slightly different behaviors between EL6 and EL7. What setup are you using across your main Katello and the capsule with respect to OS and version?

Both the main katello and capsule are running RHEL 6.6 (2.6.32-504.12.2.el6.x86_64)

Actions #5

Updated by Eric Helms almost 9 years ago

  • Release changed from 23 to 51
Actions #6

Updated by Eric Helms almost 9 years ago

  • Release changed from 51 to 55

With the 2.2 release we pushed out updated packages for the qpid libraries. Are you still seeing this issue with the 2.2 release?
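(A quick way to confirm which qpid- and gofer-related packages are actually installed after the upgrade is to query rpm. The sketch below is illustrative only; the substring match is an assumption and actual package names vary between releases.)

    # Sketch: list installed packages related to qpid/gofer so their versions can
    # be compared against what the 2.2 release ships.
    import subprocess

    output = subprocess.check_output(["rpm", "-qa"]).decode()
    for line in sorted(output.splitlines()):
        if "qpid" in line or "gofer" in line:
            print(line)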

Actions #7

Updated by Michael Bassler almost 9 years ago

After upgrading to 2.2.0-5.el6 things appear to be better; I have not had crashes at any consistent rate. There was one instance of qdrouterd crashing on the capsule, but it did not present the same way and I have not yet been able to reproduce it.

Thank You
-Michael

Actions #8

Updated by Eric Helms almost 9 years ago

  • Status changed from Need more information to Resolved
  • Release changed from 55 to 51

Thanks for the updated information, Michael. I am going to set this to Resolved for now. If you encounter it again, please re-open and let us know. Thanks!
