Feature #16938

Katello services to be configured to restart on failure

Added by Stephen Benjamin over 5 years ago. Updated almost 4 years ago.



Cloned from
Description of problem:
Configure Sat6 essential services to restart automatically on failure.

Sat6 relies on several services that are essential to the product's functionality. If such a service fails for whatever reason (say, a segfault), the affected functionality stays unavailable until an administrator intervenes. That often happens only at the end of a long sequence: some service failed -> some functionality doesn't work -> customer is not notified / doesn't check logs -> after some time, they realize the functionality does not work -> they raise a support case with Red Hat -> it takes time for us to identify the cause -> service restarted.

As a result, both the functionality downtime and the amount of Red Hat support intervention are ridiculously high.

(A Sat6 health-check script would alleviate this pain to some extent. But even with that, this request would still be valid: technically, a health-check script is just a different form of logs and does not restart the failed service itself.)

On a technical level:
- Not sure if this is applicable to RHEL6, where manual changes to each and every init script would be needed. I am OK with doing this for RHEL7 only, updating just the systemd config.
- Ideally, systemd should be configured to restart any failed/killed/.. service several times in a row and then give up, or optionally to retry with some nontrivial delay between the attempts.
- Essential/critical services: basically the services covered by "katello-service status".
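
For illustration only, the systemd side of this could look like the drop-in below. This is a minimal sketch, not the configuration that was actually shipped: the file name restart.conf, the choice of qdrouterd as the example unit, and all numeric values are assumptions. On RHEL7's systemd the start-limit options belong in the [Service] section under the names shown; newer systemd releases prefer StartLimitIntervalSec= in [Unit].

```ini
# /etc/systemd/system/qdrouterd.service.d/restart.conf
# (hypothetical drop-in path; one such file per essential service)
[Service]
# Restart after a non-zero exit, a crash signal (e.g. SIGSEGV) or a timeout.
# Note: on-failure treats SIGTERM as a clean exit, so a plain "kill" would
# not trigger a restart; Restart=always would cover that case too.
Restart=on-failure
# Nontrivial delay between restart attempts (assumed value).
RestartSec=30
# Give up once 5 starts have happened within 10 minutes (assumed values).
StartLimitInterval=10min
StartLimitBurst=5
```

After adding or changing a drop-in, "systemctl daemon-reload" is needed for systemd to pick it up.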

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Mimic a service failure by killing it (an example: kill qdrouterd)
2. Wait some time to allow Sat to self-heal
3. Try the failed functionality (an example: install some errata that relies on qdrouterd)

Actual results:
Step 3 fails regardless of the delay in step 2.

Expected results:
Step 3 succeeds after some time without any intervention.

Additional info:


#1 Updated by Justin Sherrill over 5 years ago

  • Subject changed from Katello services to be configured to restart on failure to Katello services to be configured to restart on failure
  • Legacy Backlogs Release (now unused) set to 114
