Bug #31200 (open): Puma memory leak

Added by David Goetschius over 3 years ago. Updated over 3 years ago.

Status: Need more information
Priority: Normal
Assignee: -
Category: Packaging
Target version: -
Difficulty:
Triaged: No
Fixed in Releases:
Found in Releases:

Description

Noticed a memory leak in Katello 3.16.0; Foreman 2.1.2; tfm-rubygem-puma-4.3.3-4.el7.x86_64.
The Foreman server has 16 CPUs and 64 GB RAM, with Foreman Tuning = Large.
We then changed the following Puma settings:
Environment=FOREMAN_PUMA_THREADS_MIN=8
Environment=FOREMAN_PUMA_THREADS_MAX=32
Environment=FOREMAN_PUMA_WORKERS=8
Attached: Foreman server with % memory calculation; Foreman server top command.
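For reference, a typical way to apply these settings persistently is a systemd drop-in for the foreman service; this is a minimal sketch, and the drop-in path is an assumption that may differ per install:

    # Sketch: persist the Puma tuning above in a systemd drop-in (path is an assumption)
    # /etc/systemd/system/foreman.service.d/puma-tuning.conf
    [Service]
    Environment=FOREMAN_PUMA_THREADS_MIN=8
    Environment=FOREMAN_PUMA_THREADS_MAX=32
    Environment=FOREMAN_PUMA_WORKERS=8

    # Reload units and restart the service so the new limits take effect
    sudo systemctl daemon-reload
    sudo systemctl restart foreman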


Files

MemoryLeak.png (28.5 KB) - Foreman server with % memory calculation - David Goetschius, 10/28/2020 07:47 PM
MemoryLeak2.png (226 KB) - Foreman server top command - David Goetschius, 10/28/2020 07:57 PM
MemoryLeak3.PNG (35.4 KB) - David Goetschius, 11/05/2020 06:42 PM

Related issues (1 open, 0 closed)

Related to Foreman - Feature #31274: Reports generation is memory heavy (Ready For Testing, Lukas Zapletal)
Actions #1

Updated by Lukas Zapletal over 3 years ago

  • Category changed from Compute resources to Packaging

I think the closest category would be Performance or Packaging. Eric/Ewoud should be able to comment on whether this is a known issue.

Actions #2

Updated by David Goetschius over 3 years ago

A few extra notes: on 8/25 we upgraded from 3.9 to 3.16. Puma was set to 2 workers, with threads min 0 and max 8. We had long-running tasks, so on 10/23 we updated Puma to 8 workers, with threads min 8 and max 32. That is when we started experiencing the memory leak.

Actions #3

Updated by Lukas Zapletal over 3 years ago

  • Status changed from New to Need more information

If you changed max workers to 32 on 10/24, then what you are seeing is not a memory leak. Our app requires 300-900 MB per worker; a worker starts low (around 100 MB) and then, due to the Linux copy-on-write mechanism for child processes, it grows until it reaches the top of that range.

For 32 workers you need at least 32 GB of RAM, plus some extra for other processes. I'd suggest 40 GB if that's a VM.
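To sanity-check how much each worker actually holds, something like the following reports per-worker RSS; this is a rough sketch, and the Puma process title pattern is an assumption that may differ between versions:

    # Average and total RSS of the Puma cluster workers (title pattern is an assumption)
    ps -eo rss=,cmd= | grep '[p]uma: cluster worker' \
      | awk '{sum += $1; n++} END {if (n) printf "workers=%d  avg=%.0f MB  total=%.0f MB\n", n, sum/n/1024, sum/1024}'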

Actions #4

Updated by Lukas Zapletal over 3 years ago

Oh, I see this instance has 64 GB. So when you had 8 workers you saw 40% utilization, which is about 25 GB (roughly 3 GB per worker). How many plugins do you have? We know that a clean Foreman instance takes about 300 MB and grows to around 500 MB for core, 900 MB with the Katello plugin, and so on.

In your case, to hold 32 workers at 3 GB each you would need 96 GB of RAM. I suggest you lower the number of workers.

I don't believe this is a leak per se; however, we obviously do have some memory hogs in the codebase, I'd say many. To get those fixed, please file a more concrete report.
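The back-of-envelope sizing behind those numbers, with the figures taken from this thread rather than any fixed rule, is simply workers times per-worker RSS plus headroom:

    # required RAM ≈ workers × per-worker RSS + headroom for other services
    echo $(( 8 * 3 ))    # 8 workers × ~3 GB ≈ 24 GB, close to the ~25 GB (40% of 64 GB) observed
    echo $(( 32 * 3 ))   # 32 workers × ~3 GB ≈ 96 GB, more than this 64 GB host can hold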

Actions #5

Updated by Lukas Zapletal over 3 years ago

I see in the top output up to 9 GB of RSS memory consumption per worker. That's too much, yeah. Are you able to isolate which operation makes it grow, so we can identify the code paths that need investigation?
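One way to narrow that down is to record worker RSS over time while exercising one suspect operation at a time, then correlate the jumps with what was running; this is a sketch, and the interval and log file name are arbitrary:

    # Log per-worker RSS once a minute; stop with Ctrl-C
    while true; do
      date '+%F %T'
      ps -eo pid=,rss=,cmd= | grep '[p]uma: cluster worker' \
        | awk '{printf "  pid=%s rss=%.0f MB\n", $1, $2/1024}'
      sleep 60
    done | tee -a puma-rss.log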

Actions #6

Updated by David Goetschius over 3 years ago

The Foreman add-ons and versions are as follows:

Name Version
foreman-tasks 2.0.2
foreman_ansible 5.1.1
foreman_bootdisk 17.0.2
foreman_column_view 0.4.0
foreman_discovery 16.1.0
foreman_docker 5.0.0
foreman_hooks 0.3.16
foreman_openscap 4.0.2
foreman_remote_execution 3.3.5
foreman_setup 7.0.0
foreman_templates 9.0.1
katello Katello 3.16.0

What do you recommend for isolating which operations make it grow? Please advise.

Here is some additional log information if it helps...

Actions #7

Updated by Lukas Zapletal over 3 years ago

That was useful. Most of the time spent is on facts, which is expected. We plan to improve how we store them; not all of them need to be stored in normal form.

What I see as more suspicious, however, is report generation: there were four requests to report_template generate, which in total took 30% of all time spent. Can you make sure nobody clicks/starts report generation, restart the workers, and check whether memory still spikes that hard? Reports are a new feature; there could be a memory hog there.
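To check whether report generation is being requested while you watch memory, the production log can be grepped for those requests; this is a sketch in which the log path assumes a default install and the pattern assumes the standard report_templates generate route:

    # How many report-generation requests hit the app, and the most recent ones
    grep -c 'report_templates.*generate' /var/log/foreman/production.log
    grep 'report_templates.*generate' /var/log/foreman/production.log | tail -n 5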

To put less stress on the DB, I suggest you define some facts to be filtered out. We currently store all of them, and if you have hypervisors or container hosts with many interfaces, disks, etc., they can heavily hit our fact tables - try to filter those facts out completely (Administer - Settings - Facts exclude).
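If the CLI is more convenient than the UI path above, the same setting can usually be inspected and changed with hammer; the setting name excluded_facts and the patterns below are assumptions based on recent Foreman versions, so verify them on your install first:

    # Inspect the current fact-exclusion setting (setting name is an assumption)
    hammer settings list --search 'name = excluded_facts'
    # Illustrative only: exclude high-churn virtual-interface facts (patterns are examples)
    hammer settings set --name excluded_facts --value '["macvtap*", "vnet*", "veth*"]'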

Actions #9

Updated by Lukas Zapletal over 3 years ago
