Bug #16170

Updated by Tomáš Strachota almost 3 years ago

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1325879
+++ This bug was initially created as a clone of Bug #1320557 +++

Description of problem:
When a content host is unregistered (by subscription-manager unregister, by deleting the Content Host, or by deleting the Host), the Actions::Katello::System::Destroy task is run.

This task has a concurrency bug that, with some probability, causes the Actions::Candlepin::ListenOnCandlepinEvents task to be paused with the error:

Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca)

(The UUID matches the UUID of the Content Host that was just deleted.)

Sequence of steps leading to the bug:
- the Actions::Candlepin::Consumer::Destroy sub-task is executed
- it deletes the consumer from candlepin and recalculates compliance for it (imho a redundant step when we are deleting the consumer, but katello is aware of it)
- it announces the compliance.create event to the ListenOnCandlepinEvents task
- katello finds that the katello_system is still present, so it runs "reindex_consumer" and logs "re-indexing content host .."
- only _now_ does the parent task Actions::Katello::System::Destroy enter its Finalize phase
- this phase deletes the system from katello_systems _after_ the check preceding "re-indexing content host .." has already been made, so the re-index is _not_ skipped
- the re-index of the content host calls GET consumer/<uuid> on candlepin, which triggers the 410 Gone in the ListenOnCandlepinEvents task
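The ordering above can be replayed deterministically in a toy model. This is only a sketch under stated assumptions: FakeCandlepin, GoneError, the plain Hash standing in for katello_systems, and the simplified reindex_consumer signature are all hypothetical and are not Katello or Candlepin code.

```ruby
# Toy replay of the race; every class and helper here is hypothetical.
class GoneError < StandardError; end

class FakeCandlepin
  def initialize(uuids)
    @consumers = uuids.dup
  end

  def delete_consumer(uuid)
    @consumers.delete(uuid)
  end

  # Stands in for GET /candlepin/consumers/<uuid>: a deleted consumer answers 410 Gone.
  def get_consumer(uuid)
    raise GoneError, "410 Gone: Unit #{uuid} has been deleted" unless @consumers.include?(uuid)
    { 'uuid' => uuid }
  end
end

# Simplified check-then-act: the existence check and the candlepin call are separate steps.
def reindex_consumer(systems, candlepin, uuid)
  return :skipped unless systems.key?(uuid)  # "skip re-indexing of non-existent content host"
  candlepin.get_consumer(uuid)               # the re-index fetches the consumer from candlepin
  :reindexed
end

uuid = 'b3012a90-41b6-4788-98a9-5b41839b6dca'
katello_systems = { uuid => 'pmoravec-rhel7' }
candlepin = FakeCandlepin.new([uuid])

# 1. Consumer::Destroy removes the consumer from candlepin.
candlepin.delete_consumer(uuid)

# 2. The compliance event is handled while the katello record still exists.
racy_result =
  begin
    reindex_consumer(katello_systems, candlepin, uuid)
  rescue GoneError => e
    e.message
  end

# 3. Only now does the destroy task remove the katello record.
katello_systems.delete(uuid)

# 4. An event handled after step 3 is skipped cleanly.
late_result = reindex_consumer(katello_systems, candlepin, uuid)

puts racy_result
puts late_result
```

In this model the failure depends only on whether the event is handled between steps 1 and 3, which is consistent with the bug showing up "with some probability".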



Version-Release number of selected component (if applicable):
Sat 6.1.7

How reproducible:
100% within a few minutes

Steps to Reproduce:
1. tail -f /var/log/foreman/production.log | grep -e "in phase Finalize Actions::Katello::System::Destroy" -e "skip re-indexing of non-existent content host" -e "re-indexing content host"

2. Have the Actions::Candlepin::ListenOnCandlepinEvents task open in the WebUI

3. On some Content Host, register and unregister it in a loop (RHEL7 is used here; adjust accordingly if using RHEL5 or 6):

while true; do
subscription-manager register --force --org="Default_Organization" --environment="Library" --username=admin --password=faYakexMm5XN543x
subscription-manager subscribe --pool=8aa2d415526494380152732fc8d20dd7
subscription-manager repos --enable rhel-7-server-rpms --enable rhel-7-server-satellite-tools-6.1-rpms
date
subscription-manager unregister
date
sleep 5
done

4. Monitor the tail -f output and the ListenOnCandlepinEvents task

Actual results:
- tail shows:


2016-03-23 14:17:05 [D] re-indexing content host pmoravec-rhel7.gsslab.brq.redhat.com
2016-03-23 14:17:05 [D] Step f5705893-2577-41c4-9af6-9b7c10ccb646: 6 running >> success in phase Finalize Actions::Katello::System::Destroy

- the ListenOnCandlepinEvents task is in paused/error state with the error:

Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca)

Expected results:
- the tail output can stay as it is (it is rather a symptom for developers); the ListenOnCandlepinEvents task just needs to keep running without an error

Additional info:
This seems like a lack of locking (a concurrency bug), where this code in reindex_pool_subscription_handler.rb:

def reindex_consumer(message)
  if message.content['newEntity']
    uuid = JSON.parse(message.content['newEntity'])['consumer']['uuid']
    system = ::Katello::System.find_by_uuid(uuid)
    if system.nil?
      @logger.debug "skip re-indexing of non-existent content host #{uuid}"
    else
      @logger.debug "re-indexing content host #{system.name}"
      system.update_index
    end
  end
end

needs to be executed as an atomic operation (not concurrently with Actions::Katello::System::Destroy).
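One way to picture "atomic" here is a single lock shared by the destroy path and the event handler. The sketch below is only an illustration of that idea: HostRegistry, GoneError, and the in-memory hashes are hypothetical, and a process-local Mutex would presumably not be enough in the real system, where candlepin is a separate service and the delete spans several task steps.

```ruby
# Hypothetical illustration of serializing destroy and re-index with one lock.
class GoneError < StandardError; end

class HostRegistry
  def initialize
    @lock = Mutex.new
    @systems = {}    # uuid => name, stands in for katello_systems
    @consumers = {}  # uuid => true, stands in for candlepin consumers
  end

  def register(uuid, name)
    @lock.synchronize do
      @systems[uuid] = name
      @consumers[uuid] = true
    end
  end

  # Both records are removed inside one critical section.
  def destroy(uuid)
    @lock.synchronize do
      @consumers.delete(uuid)
      @systems.delete(uuid)
    end
  end

  # The existence check and the re-index run under the same lock,
  # so a destroy can no longer slip in between them.
  def reindex_consumer(uuid)
    @lock.synchronize do
      return :skipped unless @systems.key?(uuid)
      raise GoneError, "410 Gone: #{uuid}" unless @consumers.key?(uuid)
      :reindexed
    end
  end
end

registry = HostRegistry.new
uuid = 'b3012a90-41b6-4788-98a9-5b41839b6dca'
errors = []

# One thread registers and destroys in a loop, the other keeps re-indexing.
destroyer = Thread.new do
  200.times do
    registry.register(uuid, 'pmoravec-rhel7')
    registry.destroy(uuid)
  end
end

reindexer = Thread.new do
  200.times do
    begin
      registry.reindex_consumer(uuid)
    rescue GoneError => e
      errors << e
    end
  end
end

[destroyer, reindexer].each(&:join)
puts "410 errors seen: #{errors.size}"
```

Because the two records only ever change together under the lock, the re-index either sees both or neither, so it never reaches the 410 branch in this model.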
