Bug #29419

closed

On unhealthy Satellite, dynflow_envelopes table might grow indefinitely

Added by Adam Ruzicka about 4 years ago. Updated about 4 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Dynflow
Target version: -
Difficulty:
Triaged: No
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1789522

Description of problem:
The dynflow_envelopes table is used internally by Dynflow for communication between worlds (which roughly map to processes or threads). In effect, the table serves as a message bus, so every row added to it should eventually be deleted. However, if the recipient of an envelope is no longer present, the envelope will stay there forever.

Having lots of stale envelopes may have a negative impact on Satellite's performance.

Version-Release number of selected component (if applicable):

How reproducible:
A bit tricky

Steps to Reproduce:
1. Trigger a bunch of tasks
2. Check contents of dynflow_envelopes table
3. systemctl kill -s 9 dynflowd
4. (optional) Go to /foreman_tasks/dynflow > status and click "load execution items count" (this makes the client world send some envelopes to the executors)
5. Check contents of dynflow_envelopes table

Actual results:
A new record appears in dynflow_envelopes and is never removed from there.

Expected results:
Ideally the record should not appear at all, but there are some edge cases where this cannot be achieved. If it does appear, we should guarantee that it will eventually be removed.

Additional info:
We should probably clean up undeliverable envelopes when running task cleanup and during world invalidation. We should also be able to reduce the overall number of envelopes being sent around.
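The cleanup proposed above could look roughly like the following sketch, again in Python with SQLite standing in for PostgreSQL. The table names are simplified stand-ins for dynflow_envelopes and dynflow_coordinator_records, and `invalidate_world` and `sweep_undeliverable` are hypothetical helpers; this is not the change made in the linked pull request. Invalidation removes a world's envelopes together with its coordinator record, and the sweep is a periodic safety net that could run alongside task cleanup.

```python
import sqlite3

# Simplified stand-ins for dynflow_coordinator_records and dynflow_envelopes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE coordinator_records (id TEXT PRIMARY KEY);
CREATE TABLE envelopes (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    receiver_id TEXT NOT NULL,
    payload TEXT NOT NULL
);
""")

def invalidate_world(conn, world_id):
    # Hypothetical: drop the world's record and everything addressed to it
    # in one transaction, so its envelopes do not outlive its registration.
    with conn:
        conn.execute("DELETE FROM coordinator_records WHERE id = ?", (world_id,))
        conn.execute("DELETE FROM envelopes WHERE receiver_id = ?", (world_id,))

def sweep_undeliverable(conn):
    # Hypothetical safety net: remove envelopes whose receiver is no longer
    # registered. Returns the number of deleted rows.
    with conn:
        cur = conn.execute(
            "DELETE FROM envelopes WHERE receiver_id NOT IN "
            "(SELECT id FROM coordinator_records)"
        )
    return cur.rowcount

# Two registered worlds plus an envelope for a world that vanished earlier.
conn.executemany("INSERT INTO coordinator_records (id) VALUES (?)", [("w1",), ("w2",)])
conn.executemany(
    "INSERT INTO envelopes (receiver_id, payload) VALUES (?, ?)",
    [("w1", "a"), ("w2", "b"), ("ghost", "c")],
)
invalidate_world(conn, "w2")        # removes w2's record and its envelope
removed = sweep_undeliverable(conn)
print(removed)  # 1 (the envelope addressed to "ghost")
```

The sweep would likely still be needed even with cleanup on invalidation, since a sender may insert an envelope for a world after that world has already been invalidated.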

Adding a check to foreman-maintain, and having sosreport/foreman-debug collect the number of rows in that table, might be a good idea too.

To see how many undeliverable envelopes there are on the system, run the following query in psql:

SELECT COUNT(*) FROM dynflow_envelopes WHERE receiver_id NOT IN (SELECT id FROM dynflow_coordinator_records);

In general it should be safe to replace "SELECT COUNT(*)" with "DELETE" to get things back in order.
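The counting query above could also back the kind of foreman-maintain check suggested earlier. The sketch below runs it in Python against a minimal SQLite stand-in for the two tables (the real tables have more columns); `check_envelopes` and its `max_undeliverable` threshold are hypothetical and not part of foreman-maintain.

```python
import sqlite3

# The query from the report, executed here against a simplified SQLite schema.
UNDELIVERABLE_COUNT_SQL = (
    "SELECT COUNT(*) FROM dynflow_envelopes "
    "WHERE receiver_id NOT IN (SELECT id FROM dynflow_coordinator_records)"
)

def check_envelopes(conn, max_undeliverable=0):
    """Hypothetical health check returning (total, undeliverable, ok)."""
    total = conn.execute("SELECT COUNT(*) FROM dynflow_envelopes").fetchone()[0]
    undeliverable = conn.execute(UNDELIVERABLE_COUNT_SQL).fetchone()[0]
    return total, undeliverable, undeliverable <= max_undeliverable

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dynflow_coordinator_records (id TEXT PRIMARY KEY);
CREATE TABLE dynflow_envelopes (id INTEGER PRIMARY KEY, receiver_id TEXT NOT NULL);
INSERT INTO dynflow_coordinator_records (id) VALUES ('live-world');
INSERT INTO dynflow_envelopes (receiver_id)
    VALUES ('live-world'), ('dead-world'), ('dead-world');
""")
print(check_envelopes(conn))  # (3, 2, False)
```

Reporting the total alongside the undeliverable count would also give sosreport/foreman-debug output enough context to tell a busy but healthy table from one that is accumulating stale rows.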

#1

Updated by Adam Ruzicka about 4 years ago

  • Subject changed from On unhealthy Satellite, dynflow_envelopes table might grow indefinitely to On unhealthy Satellite, dynflow_envelopes table might grow indefinitely
  • Category set to Dynflow
  • Status changed from New to Ready For Testing
  • Assignee set to Adam Ruzicka
  • Pull request https://github.com/Dynflow/dynflow/pull/351 added

#2

Updated by Adam Ruzicka about 4 years ago

  • Status changed from Ready For Testing to Closed