Bug #10148
Closed
Repeated crashes of goferd on Pulp Node Capsule when trying to SyncNode
Description
Running the latest Katello 2.2.0-3 with Foreman 1.8.0-0.1.RC3.
During 'Actions::Pulp::Consumer::SyncNode' I repeatedly see goferd crashing. Attached are the /var/log/messages from the master and capsule node systems as well as the abrtd mail from the goferd crash.
Please let me know if there is any additional information I can provide.
Thank You
-Michael
Updated by Eric Helms over 9 years ago
- Status changed from New to Need more information
- Triaged changed from No to Yes
Howdy,
Based on the logs, you are seeing qpid connection issues on the master and the goferd crashes on the capsule? Do you ever see qpid connect or reconnect on the server itself? Are qpidd and qdrouterd both running?
Updated by Michael Bassler over 9 years ago
- File master_abrtd_mail added
Eric Helms wrote:
Howdy,
Based on the logs, you are seeing qpid connection issues on the master and the goferd crashes on the capsule? Do you ever see qpid connect or reconnect on the server itself? Are qpidd and qdrouterd both running?
The master's qpidd does crash periodically. It doesn't appear to be one-to-one, though: if I watch goferd on the capsule and restart it whenever it fails, only every few goferd crashes is there a corresponding crash of qpidd on the master. However, looking more closely at the master, I notice a number of 'qpidd36760: 2015-04-12 03:45:18 [System] error Error reading socket: Success(0)' errors.
There were also failed attempts by abrtd to capture the qpidd crashes. On the capsule I had to edit '/etc/abrt/abrt-action-save-package-data.conf' to change OpenGPGCheck to no. After making the same change on the master I was able to capture the attached abrt mail.
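For anyone who needs to make the same OpenGPGCheck change on several hosts, here is a minimal sketch in Python. It assumes the file uses plain "Key = value" lines; the helper name set_open_gpg_check is hypothetical and not part of abrt. It only transforms the text, so reading and writing the file (as root) is left to the caller.

```python
import re

# Hypothetical helper (not part of abrt). Assumes the config file uses
# simple "Key = value" lines, one setting per line.
def set_open_gpg_check(conf_text: str, enabled: bool) -> str:
    """Return conf_text with OpenGPGCheck set to yes or no."""
    value = "yes" if enabled else "no"
    # Match an uncommented OpenGPGCheck line; keep the key and spacing.
    pattern = re.compile(r"^([ \t]*OpenGPGCheck[ \t]*=[ \t]*).*$", re.MULTILINE)
    new_text, count = pattern.subn(lambda m: m.group(1) + value, conf_text)
    if count == 0:
        # Key not present (or only commented out): append it.
        sep = "" if new_text.endswith("\n") or not new_text else "\n"
        new_text = f"{new_text}{sep}OpenGPGCheck = {value}\n"
    return new_text


if __name__ == "__main__":
    sample = "# Verify GPG signatures of packages\nOpenGPGCheck = yes\n"
    print(set_open_gpg_check(sample, enabled=False))
```

Disabling the check means abrtd will also process crashes from packages whose signing key it does not recognise, so it is probably best treated as a temporary debugging setting.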
After starting goferd and qpidd, there were 5 goferd crashes on the capsule to 1 qpidd crash on the master. qdrouterd has remained up the whole time on both systems.
-Michael
Updated by Eric Helms over 9 years ago
After some manual testing and examining other systems of my own, I am noticing slightly different behaviors between EL6 and EL7. What setup are you using across your main Katello and the capsule with respect to OS and version?
Updated by Michael Bassler over 9 years ago
Eric Helms wrote:
After some manual testing and examining other systems of my own, I am noticing slightly different behaviors between EL6 and EL7. What setup are you using across your main Katello and the capsule with respect to OS and version?
Both the main Katello and the capsule are running RHEL 6.6 (2.6.32-504.12.2.el6.x86_64).
Updated by Eric Helms over 9 years ago
- Release changed from 23 to 51
Updated by Eric Helms over 9 years ago
- Release changed from 51 to 55
With the 2.2 release we pushed out updated packages for the qpid libraries. Are you still seeing this issue with the 2.2 release?
Updated by Michael Bassler over 9 years ago
I upgraded to 2.2.0-5.el6 and things appear to be better; I have not had crashes at any consistent rate. There was one instance of qdrouterd crashing on the capsule, but it did not present the same way and I have not yet been able to reproduce it.
Thank You
-Michael
Updated by Eric Helms over 9 years ago
- Status changed from Need more information to Resolved
- Release changed from 55 to 51
Thanks for the updated information, Michael. I am going to set this to resolved for now. If you encounter it again, please re-open and let us know. Thanks!