very slow publishing of a content view with filters containing many errata
Description of problem:
When a Content View (CV) has a filter that includes or excludes thousands of errata, publishing the CV takes too long (2+ minutes for each repo in the CV that the filter applies to).
For example, for a CV with 10 repos with such filters, planning the task takes approx. 30 minutes (and executing it then takes just a few minutes, including CopyRpm or DistributorPublish).
That is bad for two reasons:
1) overall performance is bad (planning the task takes several times longer than executing it)
2) the user sees practically nothing for most of the task lifecycle. When checking why the publish task takes so long, the task details are empty (since the task is still in planning).
Particular code that takes so long:
that is this method:
The method is inefficient when its arguments contain thousands of errata.
Today, a workaround is to use the opposite filter, i.e. instead of "include all errata older than a month", use "exclude any newer errata" (and deal with packages outside errata). However, this workaround will become less and less applicable as the overall number of errata in a repo grows over time.
Steps to Reproduce:
1. Sync several larger repos with many errata
2. Create a CV and add all the repos to it.
3. Add a filter "include all errata older than date X.Y." where the date is just a month ago, so that most errata are included in the CV
4. Click to publish the CV
5. Check how long publishing the CV takes (and when the task leaves the planning phase, i.e. starts executing the very first step)
Actual results:
CV publish takes 30+ minutes, and most of that time is spent in planning (the very first dynflow step is kicked off only after a long time)
Expected results:
A reasonably shorter planning phase
To see that the delay is right there, add some debugging statements around the line https://github.com/Katello/katello/blob/master/app/lib/actions/katello/repository/clone_yum_content.rb#L17.
NOTE: the main issue is that 'planning' of the task takes a very long time, not running the copy itself.
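One possible shape for such debugging statements is a small timing wrapper. This is only a sketch: the `timed` helper is hypothetical and not part of Katello, and it uses a plain Ruby `Logger` writing to stderr (inside Katello one would presumably log through `Rails.logger` instead). The statement to wrap is whatever sits on the linked line; a trivial stand-in is used here.

```ruby
require 'benchmark'
require 'logger'

# Hypothetical helper (not part of Katello): runs the given block,
# logs how long it took, and returns the block's own result so the
# surrounding code behaves exactly as before.
def timed(label, logger: Logger.new($stderr))
  result = nil
  elapsed = Benchmark.realtime { result = yield }
  logger.info(format('%s took %.2f s', label, elapsed))
  result
end

# Stand-in for the suspected slow call; in the plan method one would
# wrap the real statement in the block instead.
value = timed('example step') { (1..1000).sum }
```

Wrapping the suspected statement this way leaves its return value untouched, so the plan method keeps working while the log shows how much of the planning time is spent in that call.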