Bug #4114: Trends don't scale well and become very slow - Foreman

Actions

Copy link

Bug #4114

closed

Trends don't scale well and become very slow

Added by Ewoud Kohl van Wijngaarden over 11 years ago. Updated almost 10 years ago.

Status:

Duplicate

Priority:

Normal

Assignee:

Category:

Trends

Target version:

Difficulty:

Triaged:

Bugzilla link:

Pull request:

Fixed in Releases:

Found in Releases:

Red Hat JIRA:

Description

Loading trends for kernel versions (45 different versions, 152 hosts) for the past 3 days takes 97,45 seconds.

Unrelated to my environment here are some debug logs: https://gist.github.com/xorpaul/8456338.

Related issues 2 (0 open — 2 closed)

Actions

Copy link

Updated by Robert Birnie over 11 years ago

A lot of this is based on the number of data points as there is a loop that loops over each one to format it properly. If performance is slowing down best option is to change how often you are collecting trends data from the cronjob. Default is 30minutes, change it to hourly or every two hours would halve the time it takes to load.

If you have sampling every half hour, with 45 versions, its looping over an array of 6480 items: 3 * 24 * 2 * 45

Actions

Copy link

Updated by Dominic Cleal about 11 years ago

Related to Bug #5568: Possible memory leak in trends added

Actions

Copy link

Updated by Glen Ogilvie almost 11 years ago

We had this break badly for us. We monitored the trend uptime.
Here are some details of what we saw.

select count(*) from trend_counters;
count
----------
24613273

The database was 3GB in size.

Trying to view the trends resulted in 100% CPU load, with ruby consuming lots of memory and throwing poor performing queries at the database.

Saw lots of:
2014-08-14 23:06:31.209 NZST foreman 2014-08-14 23:00:10 NZST foreman LOG: duration: 3640.003 ms statement: SELECT 1 AS one FROM "trend_counters" WHERE ("trend_counters"."created_at" = '2014-08-14 11:00:14.286739' AND "trend_counters"."trend_id" = 1145) LIMIT 1
2014-08-14 23:06:33.401 NZST 2014-08-14 19:33:36 NZST LOG: checkpoints are occurring too frequently (7 seconds apart)
2014-08-14 23:06:33.401 NZST 2014-08-14 19:33:36 NZST HINT: Consider increasing the configuration parameter "checkpoint_segments".
2014-08-14 23:06:33.890 NZST foreman 2014-08-14 23:00:10 NZST foreman LOG: duration: 2283.587 ms statement: SELECT 1 AS one FROM "trend_counters" WHERE ("trend_counters"."created_at" = '2014-08-14 11:00:14.286739' AND "trend_counters"."trend_id" = 1146) LIMIT 1

Explaining some of the queries, they didn't seem to be using the indexes efficiently.

QUERY PLAN                                                
----------------------------------------------------------------------------------------------------------
 Limit  (cost=174.34..26198.90 rows=1 width=0)
   ->  Bitmap Heap Scan on trend_counters  (cost=174.34..26198.90 rows=1 width=0)
         Recheck Cond: (trend_id = 1233)
         Filter: (created_at = '2014-08-14 11:00:14.286739'::timestamp without time zone)
         ->  Bitmap Index Scan on index_trend_counters_on_trend_id  (cost=0.00..174.34 rows=7820 width=0)
               Index Cond: (trend_id = 1233)
(6 rows)

And when we try to remove the trend, we got:

DELETE FROM "trend_counters" WHERE "trend_counters"."trend_id" IN (8, 9, 10, 11, 1 ........ )

And explaining it,

Delete on trend_counters (cost=5945.25..207753.75 rows=271905 width=6)
-> Bitmap Heap Scan on trend_counters (cost=5945.25..207753.75 rows=271905 width=6)
Recheck Cond: (trend_id = ANY ('{8,9,10,11,19,20,21,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41}'::integer[]))
-> Bitmap Index Scan on index_trend_counters_on_trend_id (cost=0.00..5877.28 rows=271905 width=0)
Index Cond: (trend_id = ANY ('{8,9,10,11,19,20,21,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41}'::integer[]))

We also saw:
SELECT 1 AS one FROM "trend_counters" WHERE ("trend_counters"."created_at" = '201
4-08-14 11:30:18.548358' AND "trend_counters"."trend_id" = 2260) LIMIT 1

It appears the code is not using the database efficiently, and is looping over records and using the "IN" syntax.

I suggest that we re-think the way trends work. The type of data lends itself to being stored in a graphing database, such as Graphite or RRD. My recommendation would be, to change trends so that instead of writing them to Postgresql, they are sent to a statsd target.. which can do whatever it wants, including sending them to graphite. The trends link would then include graphs from graphite.

To re-create the problems, populate trends with lots of data, then try to do stuff.

Actions

Copy link

Updated by Shimon Shtein over 10 years ago

Related to Bug #7505: Trends are not aggregrated added

Actions

Copy link

Updated by Shimon Shtein over 10 years ago

When #7505 will be merged, it should solve the issue, since it reduces the number of data points in the DB.

Besides it, "uptime" is a bit tricky to monitor using trends, because trends graphs are defined as "Number of hosts with selected fact, grouped by fact value".
So it will create meaningless groups each of them will contain a single host (or more, if by coincidence more than one server recorded the same uptime).

Actions

Copy link

Updated by The Foreman Bot about 10 years ago

Status changed from New to Ready For Testing
Pull request https://github.com/theforeman/foreman/pull/2365 added
Pull request deleted ()

Actions

Copy link

Updated by Jon McKenzie about 10 years ago

Ignore the referenced pull request, that was meant for issue #5568.

Actions

Copy link

Updated by Dominic Cleal about 10 years ago

Status changed from Ready For Testing to New
Pull request added
Pull request deleted (~~https://github.com/theforeman/foreman/pull/2365~~)

Actions

Copy link

Updated by Ohad Levy almost 10 years ago

Related to deleted (Bug #7505: Trends are not aggregrated)

Actions

Copy link

#10

Updated by Ohad Levy almost 10 years ago

Is duplicate of Bug #7505: Trends are not aggregrated added

Actions

Copy link

#11

Updated by Ohad Levy almost 10 years ago

Status changed from New to Duplicate

Actions

Copy link

Also available in: Atom PDF

	Related to Foreman - Bug #5568: Possible memory leak in trends	Closed	Jon McKenzie	05/05/2014			Actions
	Is duplicate of Foreman - Bug #7505: Trends are not aggregrated	Closed	Shimon Shtein	09/17/2014			Actions

Project

General

Profile

Custom queries

Bug #4114

Trends don't scale well and become very slow

Updated by Robert Birnie over 11 years ago

Updated by Dominic Cleal about 11 years ago

Updated by Glen Ogilvie almost 11 years ago

Updated by Shimon Shtein over 10 years ago

Updated by Shimon Shtein over 10 years ago

Updated by The Foreman Bot about 10 years ago

Updated by Jon McKenzie about 10 years ago

Updated by Dominic Cleal about 10 years ago

Updated by Ohad Levy almost 10 years ago

Updated by Ohad Levy almost 10 years ago

Updated by Ohad Levy almost 10 years ago