Bug #22076
closed[improvement] Make errata list unique in repository empty_errata for psql select
Description
In https://github.com/Katello/katello/blob/master/app/models/katello/repository.rb#L291, errata_with_packages is a set with many duplicates - this can be visible by enabling sql debugs, publishing a CV and grepping for "katello_errata.id NOT IN" in production.log.
Does it make sense to make the set uniq before passing it as "NOT IN" option to postgres? Wont it improve performance? I mean trivial patch like:
--- repository.rb 2017-12-26 14:49:55.133400485 +0100 +++ repository.rb.new 2017-12-26 14:51:39.772839567 +0100 @@ -293,7 +293,7 @@ module Katello where("#{repository_errata}.repository_id" => self.id) if errata_with_packages.any? - self.errata.where("#{Katello::Erratum.table_name}.id NOT IN (?)", errata_with_packages.pluck("#{errata}.id")) + self.errata.where("#{Katello::Erratum.table_name}.id NOT IN (?)", errata_with_packages.pluck("#{errata}.id").uniq) else self.errata end
I tried this on my foreman/katello but got bit contradicting results so far - sometimes the uniq made by ruby takes overall command faster, sometimes the original behaviour is faster, times seem to vary a lot.
Updated by Pavel Moravec over 6 years ago
It seems the current behaviour without "uniq" beats the "improved" one due to some caching done - very first request of some repo is worse than in "uniq" case, but later requests are super-fast (I guess due to ruby/rails caching the postgres response that does not happen for the "uniq" case).
TL;DR the improvement does not seem to bring performance gain, hence not worth using it.
Particular times in milliseconds e.g. on CV with RHEL7 repo:
original:
148088
262
377
174
124
129
245
155
119
110
uniq:
143707
140327
1052
20466
1208
142378
432
146043
142114
21583
Updated by Eric Helms over 6 years ago
- Translation missing: en.field_release set to 166