Bug #23995

Processing a virt-who report blocks RHSM certificate checks, which can lead to 503 errors

Added by Shimon Shtein about 2 years ago. Updated almost 2 years ago.

Status: Closed
Priority: High
Assignee:
Category: Subscriptions
Target version:
Difficulty:
Triaged: Yes
Bugzilla link:
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1586210

Description of problem:
Processing a virt-who report blocks one specific type of RHSM request for some time. Since these requests are fired frequently, they can occupy the whole Passenger request queue, and Passenger then starts returning 503 errors.

Once the virt-who report processing is complete, the RHSM requests are unblocked. Still, the 503 errors should not happen in the meantime.

Version-Release number of selected component (if applicable):
Sat 6.3.1

How reproducible:
100% on customer data; a generic reproducer should not be hard to develop

Steps to Reproduce:
(generic reproducer)
1. Have a few thousand systems registered, with the default certCheckInterval = 240 in rhsm.conf (the lower the value, the better for the reproducer)
2. Send a virt-who report with a mapping of several hundred systems
3. While the report is being processed, check the WebUI status or the httpd error logs

Specific reproducer that needs no real hosts, since they are mimicked by curl requests:

A) To mimic the RHSM certificate check request, a single URI GET request is sufficient:

curl -s -u admin:changeme -X GET https://$(hostname -f)/rhsm/consumers/${uuid}/certificates/serials

(set uuid to the UUIDs of various hosts / Candlepin consumer IDs, and run these requests concurrently several times)
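
One possible way to run the request above concurrently is sketched below. It is only an illustration: the function names serials_url and fire_cert_checks, the UUID file and the parallelism value are hypothetical, while the URI and the admin:changeme credentials are taken from the curl command above. It relies on xargs -P, which GNU and BSD xargs provide.

```shell
# Hypothetical helpers; only the URI and credentials come from the report.

# Print the certificate-serials URI for one consumer UUID.
serials_url() {
  local host="$1" uuid="$2"
  printf 'https://%s/rhsm/consumers/%s/certificates/serials\n' "$host" "$uuid"
}

# Send one GET per UUID listed in a file (one UUID per line),
# keeping up to $3 requests in flight (default 20).
# Prints the HTTP status, total time and URL of each request.
fire_cert_checks() {
  local host="$1" uuid_file="$2" parallel="${3:-20}"
  while IFS= read -r uuid; do
    if [ -n "$uuid" ]; then
      serials_url "$host" "$uuid"
    fi
  done < "$uuid_file" |
    xargs -P "$parallel" -n 1 curl -s -o /dev/null -u admin:changeme -X GET \
      -w '%{http_code} %{time_total}s %{url_effective}\n'
}
```

For example, fire_cert_checks "$(hostname -f)" uuids.txt 50 would send the requests for every UUID in a (hypothetical) uuids.txt file with up to 50 in flight; repeating the call in a loop while step B) runs should make the blocked requests visible as long response times.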

B) To mimic a virt-who report, have a virt-who-report.json file with HV<->VM mappings, and run:

time curl -s -u admin:changeme -X POST -H "Content-Type: application/json" -d @virt-who-report.json 'https://your.satellite/rhsm/hypervisors?owner=Owner&env=Library'

Actual results:
3. shows 503 errors in the WebUI; /var/log/httpd/error_log contains "Request queue is full. Returning an error" errors.
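
To check for this symptom while the reproducer runs, the log can be searched for the message quoted above. The helper below is a minimal sketch; the function name count_queue_full is hypothetical, and the log path and message text are the ones given in this report.

```shell
# Hypothetical helper: count "queue full" errors in an httpd error log.
# Defaults to the log path mentioned in the report.
count_queue_full() {
  local log="${1:-/var/log/httpd/error_log}"
  # -F matches the message literally; -c prints the number of matching lines.
  # grep exits non-zero when the count is 0, so the status is ignored here.
  grep -c -F 'Request queue is full. Returning an error' "$log" || true
}
```

Calling count_queue_full before and after sending the virt-who report gives a simple measure of whether the queue overflowed in between.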

Expected results:
3. WebUI accessible, no such errors in httpd logs.

Additional info:
Technical explanation of what goes wrong (to some extent):
- virt-who report processing updates the katello_subscription_facets PostgreSQL table in a lengthy transaction (*)
- as a result, Katello::Api::Rhsm::CandlepinProxiesController#serials requests are stuck at the step:
@host.subscription_facet.save!
for tens(!) of seconds, until the virt-who report processing is finished
- these requests come from the RHSM certificate check queries, i.e. the URI request /rhsm/consumers/${uuid}/certificates/serials
- these requests accumulate for a few tens of seconds, and under a higher load they can fill the whole Passenger request queue
- that in turn triggers the 503 errors
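
The accumulation described above can be estimated with simple arithmetic. The sketch below is a rough illustration only: it assumes one serials request per host per certCheckInterval (in minutes), spread evenly over time, which real traffic will not match exactly. The function name queued_requests is hypothetical.

```shell
# Rough estimate of how many serials requests arrive (and stay blocked)
# while a virt-who report holds them up for block_sec seconds.
# Assumes check-ins are spread evenly over certCheckInterval (minutes).
# Uses integer arithmetic, so the result is rounded down.
queued_requests() {
  local hosts="$1" interval_min="$2" block_sec="$3"
  echo $(( hosts * block_sec / (interval_min * 60) ))
}
```

For instance, queued_requests 5000 240 30 prints 10, while queued_requests 5000 1 30 prints 2500, which illustrates why a lower certCheckInterval makes the problem easier to reproduce. The result can be compared with the maximum request queue size configured for Passenger on the system in question.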

A specific reproducer on customer data will be provided in the next comment.

Associated revisions

Revision e0d2c879
Added by Shimon Shtein almost 2 years ago

Fixes #23995 - Updated hypervisors_update to bulk actions

History

#1 Updated by Jonathon Turel about 2 years ago

  • Legacy Backlogs Release (now unused) set to 338

#2 Updated by The Foreman Bot about 2 years ago

  • Assignee set to Shimon Shtein
  • Status changed from New to Ready For Testing
  • Pull request https://github.com/Katello/katello/pull/7472 added

#3 Updated by Justin Sherrill about 2 years ago

  • Triaged set to No
  • Target version changed from Katello 3.7.0 to Katello 3.7.1
  • Priority changed from Urgent to High

#4 Updated by Shimon Shtein almost 2 years ago

  • Status changed from Ready For Testing to Closed

#5 Updated by Jonathon Turel almost 2 years ago

  • Target version changed from Katello 3.7.1 to Katello 3.9.0

Moving back to 3.9 because a correct upstream Candlepin build is needed
