Project

General

Profile

Bug #33367

Scoped search on /hosts with taxonomies very slow

Added by Nacho Barrientos about 2 months ago. Updated about 2 months ago.

Status:
Resolved
Priority:
Normal
Assignee:
-
Category:
Database
Target version:
-
Difficulty:
Triaged:
No
Bugzilla link:
Pull request:
Fixed in Releases:
Found in Releases:

Description

Hi,

We've moving from a rather old Foreman installation where organisations and locations were not a thing to one running 2.4.1. We've got around 39000 hosts and 6000 hostgroups in our database. All hosts belong to the same organisation and location for the time being and hence we only have one location and one organisation in the database.

A typical query our users do via the UI or the API is something like this:

/api/hosts/?search=hostgroup_fullname+~+some%2Fhostgroup&page=1

For a calling user that's non-admin and with some filters on the Host model the generated SQL is something like:

SQL (523472.7ms)  SELECT "hosts"."id" AS t0_r0, "hosts"."name" AS t0_r1, [...SNIP...]
   FROM "hosts" LEFT OUTER JOIN "hostgroups" ON "hostgroups"."id" = "hosts"."hostgroup_id" 
   AND (hostgroups.id IN (13,14,15,36,42,45,47,48,59,72,73,97,111,113,114,115,160,161,162,170,178,189,193,204,211,235,238,247,251,260,265,273,303,305,311,312,313,318,321,334,335,337,345,350,351,352,353,354,355,[...SNIP...],27966,1361,1413))
   WHERE ((("hostgroups"."title" ILIKE 'filter1/%') OR ("hostgroups"."title" = 'filter1') OR ("hostgroups"."title" ILIKE 'filter2/%') OR ("hostgroups"."title" = 'filter2'))) 
   AND "hosts"."type" = $1 
   AND "hosts"."organization_id" = $2 
   AND "hosts"."location_id" = $3 
   AND (("hostgroups"."title" ILIKE '%some/hostgroup%')) 
   ORDER BY "hosts"."name" ASC LIMIT $4 OFFSET $5

Parameters: [["type", "Host::Managed"], ["organization_id", 2], ["location_id", 1], ["LIMIT", 20], ["OFFSET", 0]] 

Bear in mind that the list of ids in the IN clause has been shrunk. This query is very slow with our data and it typically takes more than 60s.

However, if we remove the IN part which we believe comes from:

https://github.com/theforeman/foreman/blob/e2918f95f2c11bef3a287f63949387d47c771202/app/models/concerns/taxonomix.rb#L132

then the query is much faster.

Query plans:

Limit  (cost=22.54..22.54 rows=1 width=2184) (actual time=123.445..123.447 rows=0 loops=1)
   ->  Sort  (cost=22.54..22.54 rows=1 width=2184) (actual time=123.443..123.444 rows=0 loops=1)
         Sort Key: hosts.name
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.69..22.53 rows=1 width=2184) (actual time=123.383..123.384 rows=0 loops=1)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.41..6.18 rows=1 width=1676) (actual time=0.041..29.470 rows=31717 loops=1)
                     Index Cond: (((type)::text = 'Host::Managed'::text) AND (location_id = 1))
                     Filter: (organization_id = 2)
               ->  Index Scan using hostgroups_pkey on hostgroups  (cost=0.28..8.31 rows=1 width=508) (actual time=0.003..0.003 rows=0 loops=31717)
                     Index Cond: (id = hosts.hostgroup_id)
                     Filter: (((title)::text ~~* '%some/hostgroup%'::text) AND (((title)::text ~~* 'filter1/%'::text) OR ((title)::text = 'filter1'::text) OR ((title)::text ~~* 'filter2/%'::text) OR ((title)::text = 'filter2'::text)))
                     Rows Removed by Filter: 1
 Planning Time: 5.226 ms
 Execution Time: 123.597 ms
(14 rows)
 Limit  (cost=2169.24..2169.24 rows=1 width=2184) (actual time=507892.607..507892.615 rows=0 loops=1)
   ->  Sort  (cost=2169.24..2169.24 rows=1 width=2184) (actual time=507892.604..507892.606 rows=0 loops=1)
         Sort Key: hosts.name
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.69..2169.23 rows=1 width=2184) (actual time=507892.483..507892.484 rows=0 loops=1)
               Join Filter: (hosts.hostgroup_id = hostgroups.id)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.41..6.18 rows=1 width=1676) (actual time=0.052..102.077 rows=31717 loops=1)
                     Index Cond: (((type)::text = 'Host::Managed'::text) AND (location_id = 1))
                     Filter: (organization_id = 2)
               ->  Index Scan using hostgroups_pkey on hostgroups  (cost=0.28..2163.03 rows=1 width=508) (actual time=14.922..14.922 rows=0 loops=31717)
                     Index Cond: (id = ANY ('{13,14,15,36,42,45,47,48,59,72,73,97,111,113,114,115,160,161,162,170,178,189,193,204,211,235,238,247,251,260,265,273,303,305,311,312,313,318,321,334,335,337,345,350,351,352,353,354,355,[...SNIP...]27966,1361,1413}'::integer[]))
                     Filter: (((title)::text ~~* '%some/hostgroup%'::text) AND (((title)::text ~~* 'filter1/%'::text) OR ((title)::text = 'filter1'::text) OR ((title)::text ~~* 'filter2/%'::text) OR ((title)::text = 'filter2'::text)))
                     Rows Removed by Filter: 6252
 Planning Time: 15.623 ms
 Execution Time: 507892.913 ms

Unfortunately this kind of queries is quite frequent and once we allow this traffic in, Foreman rapidly clogs and all the application servers serving these requests become rather inoperative.

We could patch locally to get rid of the scope added by the taxonomies as at the moment we'll live with only one organisation and one location, however looking to the future this option sounds very scary.

History

#1 Updated by Nacho Barrientos about 2 months ago

We noticed that on the Foreman instance where we had prepared the upgrade the query was fast as Postgres was generating a different query plan with a similar dataset:

 Limit  (cost=2755.76..2755.81 rows=20 width=32) (actual time=14.131..14.133 rows=3 loops=1)
   ->  Sort  (cost=2755.76..2755.87 rows=47 width=32) (actual time=14.128..14.129 rows=3 loops=1)
         Sort Key: hosts.name
         Sort Method: quicksort  Memory: 25kB
         ->  Nested Loop  (cost=4.80..2754.50 rows=47 width=32) (actual time=1.113..14.059 rows=3 loops=1)
               ->  Index Scan using hostgroups_pkey on hostgroups  (cost=0.28..2169.89 rows=10 width=12) (actual time=1.078..13.700 rows=45 loops=1)
                     Index Cond: (id = ANY ('{13,14,15,36,42,45,47,48,59,72,73,97,111,113,114,115,160,161,162,170,178,189,193,204,211,235,238,247,251,260,265,273,303,305,311,312,313,318,321,334,335,337,345,350,351,352,353,354,355,[...SNIP...],27966,1361,1413}'::integer[]))
                     Filter: (((title)::text ~~* '%some/hostgroup%'::text) AND (((title)::text ~~* 'filter1/%'::text) OR ((title)::text = 'filter1'::text) OR ((title)::text ~~* 'filter2/%'::text) OR ((title)::text = 'filter2'::text)))
                     Rows Removed by Filter: 5996
               ->  Bitmap Heap Scan on hosts  (cost=4.52..58.32 rows=14 width=28) (actual time=0.006..0.007 rows=0 loops=45)
                     Recheck Cond: (hostgroup_id = hostgroups.id)
                     Filter: (((type)::text = 'Host::Managed'::text) AND (organization_id = 2) AND (location_id = 1))
                     Heap Blocks: exact=3
                     ->  Bitmap Index Scan on host_group_id_ix  (cost=0.00..4.52 rows=14 width=0) (actual time=0.006..0.006 rows=0 loops=45)
                           Index Cond: (hostgroup_id = hostgroups.id)
 Planning Time: 8.116 ms
 Execution Time: 14.226 ms

As you can see this approach to retrieve the data is much better.

Executing a manual VACUUM ANALYZE on hosts and hostgroups on the "bad" DB to generate DB statistics brought "back" the good query plan to this other database.

foredb=> vacuum analyze hosts;
VACUUM
foredb=> vacuum analyze hostgroups;
VACUUM
foredb=> quit

Guess the ticket can be closed, although we're scared about the fact that the default autovacuum threshold could not be good enough for our dataset and the problem might come back as the runtime increases.

#2 Updated by Tomer Brisker about 2 months ago

  • Status changed from New to Resolved
  • Category set to Database

Marking as resolved for now.
We do have some custom options for PostgreSQL auto-vacuum (as well as other options) in the installer tuning profiles (https://github.com/theforeman/foreman-installer/tree/develop/config/foreman.hiera/tuning/), if you find any other options that can help in large setups like yours, please feel free to re-open and propose what changes make sense.

Also available in: Atom PDF