Feature #16938
closed
Katello services to be configured to restart on failure
Description
Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1310111
Description of problem:
Request:
configure Sat6 essential services to automatically restart on failure.
Reasoning:
Sat6 relies on several services that are essential to various functionality of the product. If such a service fails for whatever reason (say, a segfault), the functionality stays disabled until an administrator intervenes. That often happens only at the end of this sequence: some service failed -> some functionality doesn't work -> customer is not notified / doesn't check logs -> after some time, they realize the functionality does not work -> they raise a support case with Red Hat -> it takes time for us to identify the cause -> service restarted.
The resulting functionality downtime and Red Hat support effort are ridiculously high.
(A Sat6 health-check script would alleviate this pain to some extent. But even with that, the request would still be valid. Technically, a health-check script is just a different form of logs; it doesn't restart the failed service itself.)
On technical level:
- not sure if this is applicable to RHEL6, where manual changes to each and every init script would have to be made. I am OK with doing this for RHEL7 only and updating just the systemd config
- ideally, systemd should be configured to restart any failed/killed/etc. service several times in a row and then give up, or optionally to retry the restart with some nontrivial delay between the attempts
- essential/critical services: basically to cover "katello-service status" services
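The restart policy requested above (retry a few times with a delay, then give up) could be expressed as a systemd drop-in per service. The following is a minimal sketch, not the change that was or would be shipped: the helper names, the drop-in file name `restart.conf`, and the numeric values are all assumptions chosen for illustration. It uses the `[Service]` placement of `StartLimitInterval=` and `StartLimitBurst=` as found in RHEL7-era systemd; newer systemd releases document these settings in `[Unit]`, with the interval named `StartLimitIntervalSec=`.

```python
from pathlib import PurePosixPath


def render_restart_dropin(restart_sec=30, burst=5, interval_sec=600):
    """Return the text of a systemd drop-in that restarts a failed service.

    All default values are illustrative assumptions, not recommendations.
    """
    return "\n".join([
        "[Service]",
        # Restart after an unclean exit code, an unclean signal, or a timeout.
        "Restart=on-failure",
        # Delay between the failure and the next start attempt, in seconds.
        f"RestartSec={restart_sec}",
        # Give up once `burst` starts have happened within `interval_sec`.
        f"StartLimitInterval={interval_sec}",
        f"StartLimitBurst={burst}",
        "",
    ])


def dropin_path(service, name="restart.conf"):
    """Return the conventional drop-in location for a unit (file name is hypothetical)."""
    return PurePosixPath("/etc/systemd/system") / f"{service}.service.d" / name


# Example: print what would be written for qdrouterd (nothing is written to disk).
print(dropin_path("qdrouterd"))
print(render_restart_dropin())
```

After writing such a file, `systemctl daemon-reload` would be needed for systemd to pick it up. Note that `Restart=on-failure` treats termination by SIGTERM as a clean exit, so a plain `kill` without a signal argument would not trigger a restart; `Restart=always` is the setting that also covers that case.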
Version-Release number of selected component (if applicable):
6.1.6
How reproducible:
100%
Steps to Reproduce:
1. Mimic a service failure by killing it (an example: kill qdrouterd)
2. Wait some time to allow Sat to recover on its own
3. Try the failed functionality (an example: install some errata that relies on qdrouterd)
Actual results:
Step 3 fails regardless of the delay in step 2.
Expected results:
Step 3 succeeds after some time without any intervention.
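The "wait some time" step in the reproduction could be automated by polling the service state instead of guessing a delay. This is a rough sketch under assumptions: the function names, timeout, and polling interval are hypothetical, and `systemctl is-active --quiet` is used only as one plausible way to ask systemd whether a unit is running.

```python
import subprocess
import time


def is_active(service):
    """Return True if systemd reports the unit as active (exit status 0)."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", service])
    return result.returncode == 0


def wait_until(check, timeout=300.0, interval=5.0,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True or `timeout` seconds have passed.

    Returns True on success and False if the deadline passed first.
    `sleep` and `clock` are injectable so the loop can be tested quickly.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Example (requires a systemd host; shown as a comment so nothing runs on import):
#   recovered = wait_until(lambda: is_active("qdrouterd"))
```

With automatic restarts configured, `wait_until` would be expected to return True some time after the service is killed; without them it returns False once the timeout passes, which matches the actual result reported here.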
Additional info:
Updated by Justin Sherrill about 8 years ago
- Subject changed from Katello services to be configured to restart on failure to Katello services to be configured to restart on failure
- Release set to 114
Updated by Ewoud Kohl van Wijngaarden over 1 year ago
- Status changed from New to Rejected
- Triaged set to No