Bug #22076: [improvement] Make errata list unique in repository empty_errata for psql select - Katello - Foreman

Actions

Copy link

Bug #22076

closed

[improvement] Make errata list unique in repository empty_errata for psql select

Added by Pavel Moravec over 7 years ago. Updated almost 7 years ago.

Status:

Rejected

Priority:

Normal

Assignee:

Category:

Target version:

Katello Recycle Bin

Difficulty:

Triaged:

Bugzilla link:

Pull request:

Fixed in Releases:

Found in Releases:

Red Hat JIRA:

Description

In https://github.com/Katello/katello/blob/master/app/models/katello/repository.rb#L291, errata_with_packages is a set with many duplicates - this can be visible by enabling sql debugs, publishing a CV and grepping for "katello_errata.id NOT IN" in production.log.

Does it make sense to make the set uniq before passing it as "NOT IN" option to postgres? Wont it improve performance? I mean trivial patch like:

--- repository.rb    2017-12-26 14:49:55.133400485 +0100
+++ repository.rb.new    2017-12-26 14:51:39.772839567 +0100
@@ -293,7 +293,7 @@ module Katello
         where("#{repository_errata}.repository_id" => self.id)

       if errata_with_packages.any?
-        self.errata.where("#{Katello::Erratum.table_name}.id NOT IN (?)", errata_with_packages.pluck("#{errata}.id"))
+        self.errata.where("#{Katello::Erratum.table_name}.id NOT IN (?)", errata_with_packages.pluck("#{errata}.id").uniq)
       else
         self.errata
       end

I tried this on my foreman/katello but got bit contradicting results so far - sometimes the uniq made by ruby takes overall command faster, sometimes the original behaviour is faster, times seem to vary a lot.

Actions

Copy link

Updated by Pavel Moravec over 7 years ago

It seems the current behaviour without "uniq" beats the "improved" one due to some caching done - very first request of some repo is worse than in "uniq" case, but later requests are super-fast (I guess due to ruby/rails caching the postgres response that does not happen for the "uniq" case).

TL;DR the improvement does not seem to bring performance gain, hence not worth using it.

Particular times in milliseconds e.g. on CV with RHEL7 repo:

original:
148088
262
377
174
124
129
245
155
119
110

uniq:
143707
140327
1052
20466
1208
142378
432
146043
142114
21583