Bug #16170

Updated by Tomáš Strachota over 8 years ago

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1325879  
  +++ This bug was initially created as a clone of Bug #1320557 +++ 

 Description of problem: 
 When unregistering a content host (by subscription-manager unregister, by deleting the Content Host, or by deleting the Host), the Actions::Katello::System::Destroy task is run. 

 This task has a concurrency bug that, with some probability, causes the Actions::Candlepin::ListenOnCandlepinEvents task to be paused with the error: 

 Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca) 

 (The UUID matches that of the Content Host that is just being deleted.) 

 Sequence of steps leading to the bug: 
 - the Actions::Candlepin::Consumer::Destroy sub-task is executed 
   - it deletes the consumer from Candlepin and recalculates compliance for it (imho a redundant step when we delete the consumer, but Katello is aware of it) 
   - it announces the compliance.create event to the ListenOnCandlepinEvents task 
   - Katello finds that the katello_system is still present, so it runs "reindex_consumer" and logs "re-indexing content host .." 
 - only _now_ does the parent task Actions::Katello::System::Destroy enter its own Finalize phase (see the log under Actual results) 
   - this phase deletes the system from katello_systems _after_ the check preceding "re-indexing content host .." was made, so the re-index is _not_ skipped 
 - the re-index of the content host calls GET /candlepin/consumers/<uuid> on Candlepin, which triggers the 410 Gone in the ListenOnCandlepinEvents task 
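
 The interleaving above can be replayed as a minimal, single-threaded sketch. Everything here is a hypothetical in-memory stand-in (the ConsumerGone class, the replay_unregister_race method, the Set and Hash that play the roles of Candlepin and katello_systems); none of it is Katello or Candlepin code. It only illustrates why a check made before the system row is deleted leads to the 410 Gone afterwards.

```ruby
require 'set'

# Hypothetical stand-in for the 410 Gone response; not a Katello class.
class ConsumerGone < StandardError; end

# Replays the steps in a fixed order (no threads), so the outcome is deterministic.
def replay_unregister_race(uuid)
  candlepin_consumers = Set[uuid]            # consumers known to "Candlepin"
  katello_systems     = { uuid => 'host-1' } # rows in "katello_systems"
  trace = []

  # 1. Consumer::Destroy deletes the consumer from Candlepin.
  candlepin_consumers.delete(uuid)
  trace << :consumer_deleted

  # 2. The compliance event reaches the listener, which checks katello_systems.
  system_present = katello_systems.key?(uuid)
  trace << (system_present ? :reindex_scheduled : :reindex_skipped)

  # 3. Only now does System::Destroy remove the row.
  katello_systems.delete(uuid)
  trace << :system_deleted

  # 4. The listener acts on its stale check and fetches the consumer.
  begin
    if system_present
      raise ConsumerGone, "Unit #{uuid} has been deleted" unless candlepin_consumers.include?(uuid)
      trace << :reindexed
    end
  rescue ConsumerGone
    trace << :gone_410
  end

  trace
end
```

 Calling replay_unregister_race('b3012a90-41b6-4788-98a9-5b41839b6dca') returns [:consumer_deleted, :reindex_scheduled, :system_deleted, :gone_410]: the check in step 2 passes, so the re-index is not skipped even though the consumer is already gone.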



 


 Version-Release number of selected component (if applicable): 
 Sat 6.1.7 


 How reproducible: 
 100% within a few minutes 


 Steps to Reproduce: 
 1. tail -f /var/log/foreman/production.log | grep -e "in phase Finalize Actions::Katello::System::Destroy" -e "skip re-indexing of non-existent content host" -e "re-indexing content host" 

 2. Keep the Actions::Candlepin::ListenOnCandlepinEvents task open in the WebUI 

 3. On some Content Host, register and unregister it in a loop (RHEL7 is used here; adjust accordingly if using RHEL5 or 6): 

 while true; do 
   subscription-manager register --force --org="Default_Organization" --environment="Library" --username=admin --password=faYakexMm5XN543x 
   subscription-manager subscribe --pool=8aa2d415526494380152732fc8d20dd7 
   subscription-manager repos --enable rhel-7-server-rpms --enable rhel-7-server-satellite-tools-6.1-rpms 
   date 
   subscription-manager unregister 
   date 
   sleep 5 
 done 

 4. Monitor the tail -f output and the ListenOnCandlepinEvents task 


 Actual results: 
 - tail shows: 

 2016-03-23 14:17:05 [D] re-indexing content host pmoravec-rhel7.gsslab.brq.redhat.com 
 2016-03-23 14:17:05 [D]            Step f5705893-2577-41c4-9af6-9b7c10ccb646: 6     running >>     success in phase Finalize Actions::Katello::System::Destroy 

 - ListenOnCandlepinEvents task is paused/error with error: 

 Katello::Resources::Candlepin::Consumer: 410 Gone {"displayMessage":"Unit b3012a90-41b6-4788-98a9-5b41839b6dca has been deleted","requestUuid":"04dbfeda-75cb-4ffa-ad9f-9f4c818fc868","deletedId":"b3012a90-41b6-4788-98a9-5b41839b6dca"} (GET /candlepin/consumers/b3012a90-41b6-4788-98a9-5b41839b6dca) 


 Expected results: 
 - the tail output as such is fine (it is rather a symptom for developers); the ListenOnCandlepinEvents task just needs to keep running without an error 


 Additional info: 
 Seems like a lack of locking / a concurrency bug, where this code in reindex_pool_subscription_handler.rb: 

       def reindex_consumer(message) 
         if message.content['newEntity'] 
           uuid = JSON.parse(message.content['newEntity'])['consumer']['uuid'] 
           system = ::Katello::System.find_by_uuid(uuid) 
           if system.nil? 
             @logger.debug "skip re-indexing of non-existent content host #{uuid}" 
           else 
             @logger.debug "re-indexing content host #{system.name}" 
             system.update_index 
           end 
         end 
       end 

 needs to be executed as an atomic operation (not concurrently with Actions::Katello::System::Destroy).
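
 To illustrate what "atomic" would mean here, below is a minimal sketch, not a proposed patch. It assumes a hypothetical in-memory ContentHostStore in which destroy and reindex_consumer share one lock; the class, its methods and the in-process Mutex are assumptions made for illustration only. The real tasks may run in separate processes, where an in-process Mutex would not help and some cross-process lock would be needed instead.

```ruby
require 'set'

# Hypothetical in-memory model; class and method names are not Katello APIs.
class ContentHostStore
  class ConsumerGone < StandardError; end

  def initialize
    @lock = Mutex.new
    @candlepin_consumers = Set.new # stands in for Candlepin
    @katello_systems = {}          # stands in for the katello_systems table
  end

  def register(uuid, name)
    @lock.synchronize do
      @candlepin_consumers << uuid
      @katello_systems[uuid] = name
    end
  end

  # Both deletions happen inside one critical section.
  def destroy(uuid)
    @lock.synchronize do
      @candlepin_consumers.delete(uuid)
      @katello_systems.delete(uuid)
    end
  end

  # The existence check and the re-index run under the same lock,
  # so the check cannot go stale before the consumer is fetched.
  def reindex_consumer(uuid)
    @lock.synchronize do
      name = @katello_systems[uuid]
      return :skipped if name.nil?
      raise ConsumerGone, uuid unless @candlepin_consumers.include?(uuid)
      :reindexed
    end
  end
end
```

 With this arrangement a concurrent reindex_consumer either runs entirely before destroy (and re-indexes) or entirely after it (and skips); it never sees the system row without the consumer.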
