Bug #16488

ruby consumes 43GB RSS when there are lots of stuck errata apply tasks

Added by Ivan Necas about 1 year ago. Updated 10 months ago.

Status: Closed
Priority: Normal
Assigned To: -
Category: -
Target version: Foreman - Team Ivan Iteration 6
Difficulty: -
Bugzilla link: 1368103
Found in release: -
Pull request: -
Story points: -
Velocity based estimate: -

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1368103
I'm doing some ruby mem experiments and sometime around 07:59:07 (+-10s) ruby went from:

3290628 ruby

to:

2149180 ruby

(these are summed RSS values in kB from the command shown below, i.e. a drop from roughly 3.3 GB to 2.1 GB)

this is in the production.log:

[...]
2016-08-18 07:57:28 [app] [I] Completed 200 OK in 998ms (Views: 257.5ms | ActiveRecord: 730.5ms)
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating throttle_limiter...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating throttle_limiter...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating client dispatcher...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] stop listening for new events...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating client dispatcher...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] stop listening for new events...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating clock...
2016-08-18 07:59:03 [foreman-tasks/dynflow] [I] start terminating clock...
2016-08-18 07:59:43 [app] [I] Started GET "/foreman_tasks/tasks/3cee987c-7266-4532-a222-bac6b8355029/sub_tasks" for 213.211.43.129 at 2016-08-18 07:59:43 -0400
[...]

Any idea what it might be?

I use this crazy stuff to "measure" ruby RSS usage (well, not only ruby,
but ruby is in the top 10):

ps --no-headers -eo rss,comm>a; for comm in $( sed 's/^\s*[0-9]\+\s*\(.*\)$/\1/' a | sort -u ); do size=$( grep "\s$comm" a | sed 's/^\s*\([0-9]\+\)\s*.*$/\1/' | paste -sd+ - | bc ); echo "$size $comm"; done | sort -n | tail
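
For reference, the same per-command RSS summation can be written more readably; the awk variant below is only a sketch that should produce equivalent output (kB summed per command name), assuming GNU ps/awk as on the reporter's system:

# Sum RSS (kB) per command name and show the ten largest consumers.
ps --no-headers -eo rss,comm \
  | awk '{ rss[$2] += $1 } END { for (c in rss) print rss[c], c }' \
  | sort -n \
  | tail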

and here is what I have done:

1. I have clients registered through Capsules with goferd/katello-agent running
2. I have shut down the Capsules, so the clients cannot reach the Satellite
3. I have scheduled an errata update on 400 clients
4. I was sitting and watching the ruby memory usage

Around that 07:59:07, I had not touched the Satellite at all.

I'm looking into this because Pradeep noticed that at some point ruby was
consuming 43GB RSS, and we believe this is connected with some stuck
tasks (we had some problems with qpidd/qdrouterd/the clients themselves, so our
10k errata update task was a bit of a mess on the first two tries).


Related issues

Related to foreman-tasks - Feature #17175: max_memory_per_executor support Closed 11/01/2016

History

#1 Updated by Ivan Necas about 1 year ago

  • Subject changed from ruby consumes 43GB RSS when there is lots of stucked errata apply tasks to ruby consumes 43GB RSS when there is lots of stucked errata apply tasks
  • Target version set to Team Ivan Iteration 3

#2 Updated by Ivan Necas 12 months ago

  • Target version changed from Team Ivan Iteration 3 to Team Ivan Iteration 4

#3 Updated by Ivan Necas 11 months ago

  • Target version changed from Team Ivan Iteration 4 to Team Ivan Iteration 5

#4 Updated by Ivan Necas 11 months ago

#5 Updated by Ivan Necas 11 months ago

  • Blocked by deleted (Feature #17175: max_memory_per_executor support)

#6 Updated by Ivan Necas 11 months ago

#7 Updated by Ivan Necas 11 months ago

  • Target version changed from Team Ivan Iteration 5 to Team Marek Iteration 5

So far, we have not been able to find a leak in the dynflow/tasks code itself. One hypothesis we are working with is that the execution plans are too big, but we can't dig into more detail until we have more info, especially details of the tasks being run: I've asked for that in https://bugzilla.redhat.com/show_bug.cgi?id=1368103#c6.
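
If anyone wants to check the "execution plans too big" hypothesis on an affected system, one rough way would be the query sketched below. It assumes Dynflow's default Sequel persistence, a PostgreSQL database named foreman, and a dynflow_execution_plans table holding the serialized plan in a data column; adjust names if your setup differs:

# List the ten largest serialized execution plans by size (bytes).
su - postgres -c 'psql foreman -c "
  SELECT uuid, state, length(data) AS plan_bytes
  FROM dynflow_execution_plans
  ORDER BY length(data) DESC
  LIMIT 10;"'

Plans coming back in the tens of megabytes would point at the plans themselves rather than at a leak in the executor.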

What we can do to prevent unexpected growth in the future is http://projects.theforeman.org/issues/17175 (max_memory_per_executor support).
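
As an illustration of the idea behind that feature (restarting the executor before it grows too large), a crude external safeguard could look like the sketch below. The process name, service name and threshold are assumptions for the example, not the feature's actual mechanism:

# Illustrative sketch only -- not the implementation of #17175.
LIMIT_KB=$((10 * 1024 * 1024))                        # e.g. 10 GB
PID=$(pgrep -f dynflow_executor | head -n 1)          # tasks executor process (name assumed)
RSS_KB=$(ps --no-headers -o rss -p "$PID" 2>/dev/null | awk '{print $1}')
if [ -n "$RSS_KB" ] && [ "$RSS_KB" -gt "$LIMIT_KB" ]; then
  systemctl restart foreman-tasks                     # bring the executor back with a fresh heap
fi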

#8 Updated by Ivan Necas 11 months ago

  • Target version changed from Team Marek Iteration 5 to Team Ivan Iteration 5

#9 Updated by Ivan Necas 11 months ago

  • Target version changed from Team Ivan Iteration 5 to Team Ivan Iteration 6

#10 Updated by Ivan Necas 10 months ago

  • Status changed from New to Closed

Closing due to insufficient data for debugging.
