Revision 28 (Justin Sherrill, 06/26/2018 02:48 AM) → Revision 29/40 (Justin Sherrill, 06/26/2018 01:21 PM)
h1. PulpV3GapAnalysis
h1. Content Tab
h2. Content -> Red Hat Repositories
Katello knows the content URLs from candlepin, matches on the CDN, presents them to the user, the user selects them
* Katello creates a Repo tracking this in Pulp with client certificates and CA certificate
* Katello specifies custom options from the 'Custom Repo Creation Page' but these use cases are covered in that section
Katello deletes a Repository
h2. Content -> Products
h3. Content -> Products -> New Product (used for things like CentOS, SLES, etc)
All data here is stored only in Katello since this is a Product not a Repository and Pulp doesn't have a concept of a Product
Sync Plans will *not* be handled inside of Pulp
h3. Content -> Products -> {product_name} -> Repositories
The user selects a type and content-specific fields are shown.
h4. Debian:
h5. Sync Options
* Upstream URL (str)
* Releases (csv list)
* Components (csv list)
* Architectures (csv list)
* Verify SSL (boolean)
* Upstream username (str)
* Upstream password (str)
* Ignore Global http Proxy (bool)
h5. Publish Options
* Publish via HTTP (bool) <----------------------------- PROBLEM AREA
h4. Docker
h5. Sync Options
* Upstream URL (str)
* Upstream Repository Name (str)
* Verify SSL (bool)
* Upstream username (str)
* Upstream password (str)
* Ignore Global http Proxy (bool)
h4. File
h5. Sync Options
* Upstream URL (str)
* Verify SSL (boolean)
* Upstream username (str)
* Upstream password (str)
* Ignore Global http Proxy (bool)
h5. Publish Options
* Publish via HTTP (bool) <----------------------------- PROBLEM AREA
h4. OSTree
h5. Sync Options
* Upstream URL (str)
* Upstream Sync Policy (choice): Latest Only, All History, Custom Depth (with a number specified) <--- in Pulp2 also specified on distributor
* Verify SSL (boolean)
* Upstream username (str)
* Upstream password (str)
* Ignore Global http Proxy (bool)
h4. Puppet
h5. Sync Options
* Upstream URL (str)
* Verify SSL (boolean)
* Upstream username (str)
* Upstream password (str)
* Mirror on Sync (boolean)
* Ignore Global http Proxy (bool)
h5. Publish Options
* Publish via HTTP (bool) <----------------------------- PROBLEM AREA
h4. Yum
h5. General Fields <------ not used by Pulp
* Restrict to Architecture (choice)
* GPG Key (str)
h5. Sync Settings
* Upstream URL (str)
* Ignorable Content (multiselect): RPM, DRPM, SRPM, Errata, Distribution
* Verify SSL (boolean)
* Upstream username (str)
* Upstream password (str)
* Download Policy (choice): (On Demand, Background, Immediate) <---- Background does not have a strong use case
* Mirror on Sync (bool)
* Ignore Global http Proxy (bool)
* SSL CA Cert (str)
* SSL Client Cert (str)
* SSL Client Key (str)
h5. Publish Settings
* Checksum: (choice) Default, sha256, sha1 <----- for all repodata including primary.xml
h3. Content -> Products -> {product_name} -> Repositories -> {repository_name}
This displays a created repository.
Katello allows the user to upload a package
* Receives the data from the user, sends it to Pulp
* Relies on Pulp to fully parse the metadata and create the unit <------- REQUIREMENT: must have Pulp determine all metadata
* Associates the unit with the repository
Katello reads a content summary on this page
h5. Content -> Products -> {product_name} -> Repositories -> {repository_name} -> Select Action -> Sync Now
Katello tells the remote associated with the repository to sync
h5. Content -> Products -> {product_name} -> Repositories -> {repository_name} -> Select Action -> Advanced Sync
Katello can perform an 'Advanced Sync':
Optimized Sync - Normal sync, presented
Complete Sync - force-full on sync and force-full on publish <--------------------- GAP because we don't have force-full
Validate Content Sync - performs a checksum validation on all packages
* True Purpose: Validate existing downloaded content and redownload any file(s) that are missing or corrupt. <-------- GAP
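The intent of 'Validate Content Sync' can be sketched as below. This is only an illustration under stated assumptions, not how Pulp implements it: the function names and the @expected@ mapping (file path to the sha256 recorded at sync time) are hypothetical. The returned paths are the files that would need to be redownloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex sha256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_files(expected: dict[str, str]) -> list[str]:
    """Return paths that are missing or whose digest does not match.

    `expected` maps a file path to its recorded sha256
    (a hypothetical structure used only for this sketch).
    """
    bad = []
    for name, checksum in expected.items():
        path = Path(name)
        if not path.is_file() or sha256_of(path) != checksum:
            bad.append(name)
    return bad
```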
h5. Content -> Products -> {product_name} -> Repositories -> {repository_name} -> Select Action -> Republish Repository Metadata
Republishes the metadata.
* Katello would create a new Publication and update the Distribution
h5. Content -> Products -> {product_name} -> Repositories -> {repository_name} -> Select Action -> Delete a Repository
Deletes a repository
h3. Content -> Products -> {product_name} -> Repositories
This is the index view of all repositories
Repositories in Katello can have the same name, but Pulp enforces a unique name on repositories globally <--------- GAP
Katello takes a Product ID which resolves to a set of repos. Katello fetches this set of repos. For each repo we need to fetch:
* name (str)
* type (str), e.g. 'yum'
* sync status, e.g. 'Not synced, Pending, Error' <------------------------- GAP this would require a second call to load the data per Remote
* Content Summary, e.g. 2 packages, 5 errata, etc. Similarly for other types.
Katello can trigger a sync of one or more Repositories at once.
* Trigger the sync on one or more Remotes as independent calls
Katello can trigger a delete of one or more Repositories at once.
* Trigger the delete call to Pulp as independent calls
Search/Filtering of the list of Repositories, for Repository attributes
* content_type: the type of content
* content_view_id: the id of the content View <-------- not in Pulp anywhere currently
* ignore_global_proxy <--------- GAP area, not currently in Pulp, but probably should be
* name
* product
* redhat <---------- Anything added from Red Hat "Products" page in Katello gets Red Hat.
Search/Filtering of the list of Repositories, for content units
* distribution_arch:
* distribution_bootable <----------- if Katello detects a vmlinuz and an initrd, it knows the distribution is bootable. Detected at the end of every sync.
* distribution_family
* distribution_uuid
* distribution_variant
* distribution_version
*NOTE: Must not have to make a call for each item in a list page. Must be able to make one call.*
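To illustrate the one-call requirement, the sketch below joins repositories with their most recent sync task in memory. It assumes, hypothetically, that both lists arrive from one bulk request each; the dictionary keys (@id@, @repo_id@, @finished@, @state@) are invented for the example and are not Pulp API fields.

```python
from dataclasses import dataclass


@dataclass
class RepoRow:
    name: str
    content_type: str
    sync_status: str


def build_rows(repos: list[dict], sync_tasks: list[dict]) -> list[RepoRow]:
    """Attach the latest sync state to each repository without per-repo calls."""
    latest: dict[int, dict] = {}
    for task in sync_tasks:
        current = latest.get(task["repo_id"])
        # Keep only the most recently finished task per repository.
        if current is None or task["finished"] > current["finished"]:
            latest[task["repo_id"]] = task
    rows = []
    for repo in repos:
        task = latest.get(repo["id"])
        status = task["state"] if task else "Not synced"
        rows.append(RepoRow(repo["name"], repo["type"], status))
    return rows
```

The join is keyed on an id rather than the name because Katello repository names are not unique.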
h3. Content -> Products -> {product_name} -> Repositories -> {repository_name} -> Packages
Lists packages in a repository (the latest repository version)
Removing packages from the repository
* Can remove n packages from the repository
* Republish, Redistribute the repository
h2. Content -> Content Credentials
h3. Content -> Content Credentials -> GPG Keys
GPG keys can be created and stored by Katello
Pulp3 recommendation is to use pulp_file to hold the GPG keys hosted for clients to receive
h3. Content -> Content Credentials -> SSL Certificate (GAP. This whole section is a GAP b/c Pulp doesn't "host" SSL certs, you have to manually install them on the filesystem first)
Stores SSL certificates for use by Pulp at sync time as CA cert, client cert, or client key
* name
* value
Supports updating them
Supports deleting them
Supports searching them (name, organization_id)
SSL Certs are per-product, so Katello needs some way to restrict the set of available SSL certs for the current "product"
h2. Content -> Sync Plans
Sync plans will not be handled by Pulp 3, Katello/Foreman will handle scheduling.
h2. Content -> Sync Status
Show the most-recent sync status from dynflow data. That data is populated by task status results from Pulp, which needs to contain at a minimum:
* start time
* create time
* end time
* state
* progress reports
* fatal errors
* non-fatal errors
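The minimum task data listed above could be modelled roughly as follows. The class and field names are assumptions made for this sketch and do not reflect Pulp's actual task serializer.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class SyncTaskStatus:
    state: str
    create_time: datetime
    start_time: Optional[datetime] = None
    end_time: Optional[datetime] = None
    progress_reports: list = field(default_factory=list)
    fatal_errors: list = field(default_factory=list)
    non_fatal_errors: list = field(default_factory=list)

    def duration_seconds(self) -> Optional[float]:
        """Run time in seconds, or None if the task has not finished."""
        if self.start_time is None or self.end_time is None:
            return None
        return (self.end_time - self.start_time).total_seconds()
```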
h2. Content -> Lifecycle Environments
Creates a lifecycle environment
* Does *not* involve Pulp
h3. Content -> Lifecycle Environments -> {name} -> Details
Each lifecycle environment has a 'Registry Name Pattern'. <------- GAP (specific to Docker only)
* Likely going to be on the Distributor
* Katello would use the template to produce a concrete value to set on the Distributor
* Important to ensure that two Distributions don't both receive the same concrete values
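A rough sketch of turning a name pattern into concrete values while rejecting collisions is shown below. It uses Python's @string.Template@ placeholder syntax purely for illustration; the real pattern syntax and the variables @org@, @env@ and @repo@ are assumptions.

```python
from collections import Counter
from string import Template


def render_registry_names(pattern: str, repos: list[dict]) -> dict[str, str]:
    """Render the pattern for each repo and refuse duplicate results."""
    template = Template(pattern)
    # Extra keys in each repo dict (such as "id") are ignored by substitute().
    names = {repo["id"]: template.substitute(repo) for repo in repos}
    counts = Counter(names.values())
    clashes = sorted(name for name, count in counts.items() if count > 1)
    if clashes:
        raise ValueError(f"duplicate registry names: {clashes}")
    return names
```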
h3. Content -> Lifecycle Environments -> {name} -> Content Views
Filterable by:
* composite
* label
* name
* organization_id
h3. Content -> Lifecycle Environments -> {name} -> Yum Repositories
Content will come from CV section on Yum Repositories
h3. Content -> Lifecycle Environments -> {name} -> Errata
Content will come from CV section on Errata
h3. Content -> Lifecycle Environments -> {name} -> Packages
Content will come from CV section on Packages
h3. Content -> Lifecycle Environments -> {name} -> Puppet Modules
Content will come from CV section on Puppet Modules
h3. Content -> Lifecycle Environments -> {name} -> Container Image Tags
Content will come from CV section on Container Image Tags
h3. Content -> Lifecycle Environments -> {name} -> OSTree Branches
Content will come from CV section on OSTree Branches
h2. Content -> Content Views
h3. Content -> Content Views -> {name} -> Yum Repositories
List/Remove/Add one or more repositories to the Content View
* Does *not* involve Pulp
h3. Content -> Content Views -> {name} -> Yum Filters
Katello filters combine together (whitelist/blacklist/etc), and can be heavily modified by users to ultimately produce a set of packages. <------GAP: Katello would have to store huge lists of packages/errata to maintain this design.
h5. Package Filters
* Select RPMs using include or exclude filters to be included in or excluded from the content view.
* package name. Also supports wildcard. - an attribute of the metadata
* architecture. An attribute of the metadata
* version, lt, gt, range, etc. An attribute of the metadata
Checkbox with 'include all RPMs with no errata'. Solves a practical issue whereby packages that received no errata are not included in a content view when the user applies a filter that only includes packages referenced as errata.
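The include/exclude behaviour can be sketched as below. This is a simplified assumption about how rules combine (any include rule selects a package, any exclude rule then drops it); version comparisons and the 'no errata' checkbox are left out, and the rule keys are hypothetical.

```python
from fnmatch import fnmatchcase


def matches(package: dict, rule: dict) -> bool:
    """True if the package satisfies every attribute given in the rule."""
    if "name" in rule and not fnmatchcase(package["name"], rule["name"]):
        return False
    if "arch" in rule and package["arch"] != rule["arch"]:
        return False
    return True


def apply_filters(packages: list[dict], includes: list[dict], excludes: list[dict]) -> list[dict]:
    """Keep packages matched by any include rule, then drop excluded ones.

    With no include rules, every package starts out selected.
    """
    selected = [
        pkg for pkg in packages
        if not includes or any(matches(pkg, rule) for rule in includes)
    ]
    return [
        pkg for pkg in selected
        if not any(matches(pkg, rule) for rule in excludes)
    ]
```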
h5. Package Group Filter
Select package groups to include or remove rpms
* name - an attribute of the metadata
* product - the katello stored attribute
* repository - the repo containing that unit
* description - an attribute of the metadata
h5. Errata by ID Filter <------ GAP: must be able to ask Pulp filter info and exclude
Filters to produce a list and then you can select from the list.
filterable on errata attributes
* type (multiselect) i.e. security, enhancement, bugfix
* date either or choice: i.e. updated on, Issued on w/ start/end date
* bug
* cve
* id
* issued
* package
* package_name
* reboot_suggested
* severity
* title
* type
* updated
h5. Errata by Date
Filters to produce a list. You *cannot* select from the list.
filterable on errata attributes
* type (multiselect) i.e. security, enhancement, bugfix
* date either or choice: i.e. updated on, Issued on w/ start/end date
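As an illustration of the date filter, the following sketch selects errata by type and by an optional start/end range on either date. The dictionary layout and the @date_field@ parameter are assumptions for the example.

```python
from datetime import date


def errata_by_date(errata, types, date_field, start=None, end=None):
    """Return ids of errata of the given types dated within [start, end].

    `date_field` is either "issued" or "updated"; a bound of None is open.
    """
    selected = []
    for erratum in errata:
        if erratum["type"] not in types:
            continue
        when = erratum[date_field]
        if start is not None and when < start:
            continue
        if end is not None and when > end:
            continue
        selected.append(erratum["id"])
    return selected
```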
h3. Content -> Content Views -> {name} -> Apt Repositories
No filtering. Add/remove Debian repositories from the content view.
h3. Content -> Content Views -> {name} -> File Repositories
No filtering. Add/remove file repositories from the content view.
h3. Content -> Content Views -> {name} -> Puppet Modules
Each module can only be included once. Can't have 2+ versions of the same module in one content view.
Attributes:
* Name
* Author
* Version
h3. Content -> Content Views -> {name} -> Container Images
Filtering is 'tag' based and used to produce a concrete set of image names.
h3. Content -> Content Views -> {name} -> OSTree Content
No filtering. Add/remove ostree repositories from the content view.
h3. Content -> Content Views -> {name} -> History
Not related to Pulp.
h3. Content -> Content Views -> {name} -> Tasks
Not related to Pulp.
h2. Content -> Content Views -> {name} -> Publish
The concrete content set from all filters is computed and those units are associated with the CV repositories.
Those content view repositories are then published via the Distributions that host Library.
h5. Promotion
Other "promotion" events will cause existing Publications to be exposed via existing/new Distributions associated with the lifecycle environment.
When delivering content to a capsule, the "Force Yum Metadata Regeneration" option is used to cause Katello to inspect the published times of the repo on the main satellite server and the capsule. If nothing changed, the capsule's repo is not "resynced".
h5. Regenerate Repository Metadata
Causes Pulp to force-full publish.
h5. Incremental Update
Take an existing Content View and add/remove packages and errata w/ dependency resolution from the content set. Say a V1 exists, this would create a V1.1.
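The versioning side of an incremental update might look like the sketch below: the new minor version starts as a copy of the latest one and then has units added or removed. Dependency resolution is deliberately omitted, and the @versions@ mapping of @(major, minor)@ to unit sets is a hypothetical structure.

```python
def incremental_update(versions: dict, major: int, add=(), remove=()):
    """Derive the next minor version of `major` from its latest minor.

    Returns the (major, minor) key of the newly created version.
    """
    latest_minor = max(minor for (maj, minor) in versions if maj == major)
    units = set(versions[(major, latest_minor)])
    units |= set(add)
    units -= set(remove)
    new_key = (major, latest_minor + 1)
    versions[new_key] = frozenset(units)
    return new_key
```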
h5. Remove
Un-distributes and potentially deletes the publications for one or more repos backing the content views.
h2. Content -> Deb Packages
List:
* Name
* Version
* Architecture
Filter options:
* architecture
* checksum
* filename
* name
* version
h5. Details tab
Details of the Debian package
h5. Repositories tab
List the Debian repositories containing the package
Filtering by:
* Lifecycle Environment
* Organization
h2. Content -> Container Image Tags
List:
* name
* available schema versions
* product name
* repository name
h3. Content -> Container Image Tags -> {name}
Display info about a tag
Displays Container Image Management.
Manifest type
checksum
h5. Lifecycle Environments
For each LE:
* Environment: environment name
* Content View Version: the CV and version
* Published At: the link the user can fetch the image from
h2. Content -> Errata
List Errata
* Errata ID
* Title
* Type
* Content Host Counts
* Updated, e.g. 1/27/12
Filtering booleans:
* applicable: An errata applies to a host, but it is not installable because all packages are not available via repos in its content views + lifecycle environment. This is computed against 'Library', which is the entire set of Errata in the system.
* installable: An errata applies to a host and all packages are available via its content views + lifecycle environments. This is filtered by the repos that the host is actually subscribed to.
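The difference between the two booleans can be expressed as a subset test, sketched here under the assumption that applicability has already been computed against Library. The argument shapes are hypothetical.

```python
def installable_errata(applicable: dict[str, set[str]], available: set[str]) -> set[str]:
    """Return ids of applicable errata whose packages are all available.

    `applicable` maps an erratum id to the package identifiers it ships;
    `available` holds the packages in the repos the host is subscribed to.
    """
    return {
        erratum_id
        for erratum_id, packages in applicable.items()
        if packages <= available
    }
```

Errata that are applicable but missing from this result are the ones that apply to the host yet cannot be installed from its current content views and lifecycle environment.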
Host filtering behavior:
* errata are only shown if they are applicable/installable to a host that I have access to via the Katello permission system
Filter attributes:
* bug
* cve
* id
* issued
* package
* package name
* reboot_suggested
* severity
* type, e.g. enhancement, security
* title
* updated
h3. Content -> Errata -> {errata_name}
h5. Details
Shows details of the erratum
h5. Content Hosts
List
* name
* OS
* environment
* content view
h5. Repositories
List
* name
* Product
* Last Sync
h3. Content -> Errata -> Select Content Hosts
Will publish an incremental update if the necessary packages are not already available, e.g. version 2.1 built from the repository version backing version 2
Also an option to trigger an update on the host after the publish is complete.
In Pulp 2 terms: copy repo, copy errata with recursive=true
Shows the errata and packages that were installed.
h2. Content -> Files
Filter by:
* checksum
* name
* repository
h3. Content -> Files -> {file_name}
h5. Details
Displays
* path
* checksum
h5. Library Repositories
List:
* name
* product
* last Sync, e.g. N/A, Not Synced, Success about 23 hours ago
Search: uses the same search syntax that all repository lists use
h5. Content Views
List:
* name
* environment
* version
Search: uses the same syntax as the content view version search, i.e.
* content_view_id
* repository
* version
h2. Content -> OSTree Branches
List:
* name
* version
Search:
* commit
* created
* name
* repository
* uuid
* version
h3. Content -> OSTree Branches -> {branch_name}
h5. Details
List:
* version
* commit
* date
h5. Repositories
List:
* Name
* Product
* Last Sync
Search: uses the same search syntax that all repository lists use
h2. Content -> Packages
Filtering booleans:
* applicable: A package applies to a host, but it is not upgradable because the newer version is not available via repos in its content views + lifecycle environment. This is computed against 'Library', which is the entire set of Packages in the system. A user only sees packages that are applicable to hosts the user has permission to read.
* upgradable: An older version of this package is installed on this host and a newer package is available via that host's content views + lifecycle environments. This is filtered by the repos that the host is actually subscribed to. A user only sees packages that are upgradable to hosts the user has permission to read.
List:
* rpm
* summary
* content host counts, i.e. X applicable, Y upgradable
Filter:
* arch
* checksum
* epoch
* filename
* name
* release
* sourcerpm
* version
h3. Content -> Packages -> {package_name}
h5. Details
Shows:
* installed on, i.e. host installed count
* Applicable to, i.e. host count
* Upgradeable for, i.e. host count
other package attributes
h5. Files
Lists the actual files
h5. Dependencies
List Requires packages
List Provides packages
h5. Repositories
List:
* name
* product
* last sync
Search: uses the same search syntax that all repository lists use
h2. Content -> Puppet Modules
List:
* author
* name
* summary
* version
Filter:
* author
* name
* summary
* version
h3. Content -> Puppet Modules -> {module_name}
h5. Details
Display:
* author
* version
* source
* Project Page, e.g. link
* license
* description
* summary
h5. Library Repositories
List:
* name
* product
* last Sync, e.g. N/A, Not Synced, Success about 23 hours ago
Search: uses the same search syntax that all repository lists use
h5. Content Views
List:
* name
* environment
* version
Search: uses the same syntax as the content view version search, i.e.
* content_view_id
* repository
* version
h1. Hosts -> Content Hosts
Applicability Info:
* counts of security errata
* counts of bugfix errata
* counts of enhancement errata
* count of package updates
Search by:
* applicable errata
* applicable rpms
* errata_status (up to date (green), non-security updates available (yellow), security updates available (red))
* installable errata
* installed package
* installed package name
* upgradeable rpms
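The three @errata_status@ values map onto the applicability counts as in this small sketch. The function name and the colour strings as return values are assumptions for illustration.

```python
def errata_status(security: int, bugfix: int, enhancement: int) -> str:
    """Map installable errata counts to the status colour shown for a host."""
    if security > 0:
        return "red"  # security updates available
    if bugfix > 0 or enhancement > 0:
        return "yellow"  # non-security updates available
    return "green"  # up to date
```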
h3. Hosts -> Content Hosts -> {host_name}
Displays Installable Errata data: counts for security, bugfix, enhancement
h1. Non UI things
* Upgrades
* Repository, errata, and rpm APIs
* the API endpoint where clients upload their enabled repos
* the API endpoint where clients upload their package profiles
* the API endpoint where clients register
* the API endpoint where clients unregister
* speed throttling and other global settings? <----------------------------- GAP for Pulp3. NOTE: one-time options would be really useful (in this case proxy options are important).
h5. Errata mailer. i.e., errata that are applicable and available to hosts the user has access to
* triggered when new errata are available in a content view.
* triggered when library receives new content via a sync
h5. smart proxy page/details
* storage system report <------------------------ GAP: desire to show storage used/available for Pulp's filesystem areas
* repo content unit summary counts (listen on the capsules)
* server status
h1. Terminology
Candlepin Manifest - Defines Products, Subscriptions, and Content Sets
Product - A collection of repositories. A repository can only belong to one product
Repository Set - Has a name, Label, and URL of the form: /content/rhel/server/7/$RELVER/$BASEARCH/os/
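Expanding a Repository Set URL into a concrete repository path can be sketched with Python's @string.Template@, which happens to accept the @$RELVER@ and @$BASEARCH@ placeholders as written. The function name and the example values are assumptions.

```python
from string import Template


def expand_content_url(url: str, relver: str, basearch: str) -> str:
    """Substitute $RELVER and $BASEARCH in a Repository Set URL."""
    return Template(url).substitute(RELVER=relver, BASEARCH=basearch)
```

For example, @expand_content_url("/content/rhel/server/7/$RELVER/$BASEARCH/os/", "7Server", "x86_64")@ yields @/content/rhel/server/7/7Server/x86_64/os/@.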
h1. Core Problem Statements
h3. Labeling filtering issues - P4
Repositories, remotes, publishers, distributions, and all content types cannot be filtered as a group that belong together. An example of a group is an Organization, a Product, Content Views or a Lifecycle Environment in Katello.
A Repository belongs to a Product.
A Product belongs to an Organization.
Repositories can be added to Content Views.
A published Content View version belongs to a Lifecycle Environment.
h3. Filtering Issues - P4
Katello provides content views (CV). Think of a CV as a concrete set of packages selected using filters. These filters are applied to a repository that syncs remote content, e.g. RHEL7. The filters are whitelists and blacklists and are "stacked" on top of each other. The problem is that Katello will have to store a huge number of references and then hand them back to Pulp to have the correct units associated with the content view, which puts a lot of data-handling burden on Katello.
Repositories in Katello can have the same name, but Pulp enforces a unique name on repositories globally
h3. General Issues
Pulp does not have a way to validate all packages on disk against their checksums. Even though Pulp may always write data correctly, bit rot is still a problem. - P2-YUM
When listing a repository, the last sync status is not shown. It can be obtained from the last remote that was synced, but that still requires a second call. Making N calls to Pulp to list info about N repositories/remotes is a deal breaker. - P4
Pulp requires the user to physically place the SSL certs on a filesystem Pulp can read from. This prevents all-API usage of SSL configurations and prohibits Pulp users who are admins from using that feature. HELP
Pulp doesn't support one-time options (such as sync with this http proxy). P1
Status API does not show available space on filesystem. P3
Katello needs to create a new repository version that is the same as an existing repository version so that they can then add/remove units from it. This is for the Katello "incremental content view update" feature. This is already planned as story: https://pulp.plan.io/issues/3360 P2-YUM/PUPPET
Concrete Recommendation for serving content over SSL and non-SSL - P1
SSL Protected Content. P2-YUM
Support Header-based auth (or recommended auth mechanism) - P1
Ability to specify a page size - HELP
Ability to use page numbers to move around in a list - P4
SELinux policy - P1
Upgrade procedure - P1
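The two pagination items in the list above (a caller-specified page size and navigation by page number) amount to the following slicing logic. This is a generic sketch with hypothetical names, not a description of Pulp's REST API.

```python
def paginate(items: list, page: int, page_size: int) -> list:
    """Return the 1-indexed `page` of `items`, `page_size` entries per page."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    start = (page - 1) * page_size
    return items[start:start + page_size]
```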
h1. Plugin Problem Statement Roundup
h3. RPM
pulp_rpm does not support a force-full option for sync. P2-YUM
h3. Docker
Each lifecycle environment has a 'Registry Name Pattern'. HELP
h1. Known problems we are deferring
Applicability is being discussed via meetings with pulp_rpm mini team and Katello beginning on 6/21.