Bug #19053
closedProxy continues to write to deleted file after log rotation
Description
After logrotate
runs both foreman-proxy
and squid
continue to write to the rotated log, which is a deleted file handle.
$ sudo lsof /var/log | grep deleted squid 1357 root 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) squid 1359 squid 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) ruby 1792 foreman-proxy 7w REG 253,6 21154 1048669 /var/log/foreman-proxy/proxy.log-20170325 (deleted)
This happens both on the Katello host and any Capsules.
I have tried manually running /bin/systemctl kill --signal=SIGUSR1 foreman-proxy >/dev/null 2>&1 || true
(which is the postrotate
script) but the file is not released and new file is not written to.
CentOS 7.3
Katello 3.2.2
Foreman 1.13.4
Squid 3.5.20
Logrotate 3.8.6
The only modification to /etc/logrotate.conf
from the RPM-provided file is that compress
has been uncommented.
Updated by Justin Sherrill almost 8 years ago
- Project changed from Katello to Smart Proxy
- Category set to Packaging
- Priority changed from Urgent to High
The squid logrotate file appears to come from the squid rpm, so you may want to file a bug against centos/RHEL for that.
Moving this to smart-proxy packaging to handle the smart proxy logging.
Updated by Jason Nance almost 8 years ago
Regarding the Squid LogRotate, this seems to be specific to the way Foreman is deploying it as I have non-Foreman Squid instances running on the same OS version in the same hosted environment which do not suffer from this issue.
Updated by Anonymous almost 8 years ago
Could you check the system log to see if there any logrotate-related errors in it? Out of curiosity, is /var/lib filesystem mounted as RO? If so, please see https://bugzilla.redhat.com/show_bug.cgi?id=1272236.
Updated by Jason Nance almost 8 years ago
I don't see any helpful logs. Here's what I can find:
Apr 03 03:06:01 sl1mmgplsat0001.ipa.gen.zone anacron[114112]: Job `cron.daily' started Apr 03 03:06:02 sl1mmgplsat0001.ipa.gen.zone run-parts(/etc/cron.daily)[116790]: starting logrotate Apr 03 03:06:05 sl1mmgplsat0001.ipa.gen.zone logrotate[116798]: ALERT exited abnormally with [1] Apr 03 03:06:05 sl1mmgplsat0001.ipa.gen.zone run-parts(/etc/cron.daily)[116800]: finished logrotate Apr 03 03:06:05 sl1mmgplsat0001.ipa.gen.zone run-parts(/etc/cron.daily)[116802]: starting man-db.cron Apr 03 03:06:21 sl1mmgplsat0001.ipa.gen.zone run-parts(/etc/cron.daily)[116866]: finished man-db.cron Apr 03 03:06:21 sl1mmgplsat0001.ipa.gen.zone anacron[114112]: Job `cron.daily' terminated (mailing output) Apr 03 03:06:23 sl1mmgplsat0001.ipa.gen.zone anacron[114112]: Normal exit (1 job run)
/var/lib
isn't a read-only filesystem (it is just a directory in /var
, which is also not read-only):
/dev/mapper/vg_root-lv_var on /var type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
# df -h Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg_root-lv_root 4.0G 2.4G 1.6G 60% / devtmpfs 12G 0 12G 0% /dev tmpfs 12G 12K 12G 1% /dev/shm tmpfs 12G 137M 12G 2% /run tmpfs 12G 0 12G 0% /sys/fs/cgroup /dev/sda1 487M 156M 302M 35% /boot /dev/mapper/vg_root-lv_var 75G 35G 41G 47% /var /dev/mapper/vg_root-lv_opt 1014M 476M 539M 47% /opt /dev/mapper/vg_root-lv_tmp 1014M 135M 880M 14% /tmp /dev/mapper/vg_root-lv_home 1014M 34M 981M 4% /home /dev/mapper/vg_root-lv_varlog 5.0G 2.4G 2.7G 48% /var/log /dev/mapper/vg_root-lv_varlibmongodb 75G 34G 42G 45% /var/lib/mongodb /dev/mapper/vg_root-lv_varlogaudit 1014M 64M 951M 7% /var/log/audit
drwxr-xr-x. 46 root root 4096 Mar 24 09:59 /var/lib
Running the postrotate
in /etc/logrotate.d/foreman-proxy
manually doesn't release the open file handler, either:
# lsof /var/log/ | grep delete squid 1357 root 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) squid 1359 squid 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) ruby 1792 foreman-proxy 7w REG 253,6 33954 1048669 /var/log/foreman-proxy/proxy.log-20170325 (deleted) # ps auxww | grep 1792 foreman+ 1792 0.0 0.1 724000 43288 ? Sl Mar24 1:44 ruby /usr/share/foreman-proxy/bin/smart-proxy # systemctl kill --signal=SIGUSR1 foreman-proxy # lsof /var/log/ | grep delete squid 1357 root 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) squid 1359 squid 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) ruby 1792 foreman-proxy 7w REG 253,6 33954 1048669 /var/log/foreman-proxy/proxy.log-20170325 (deleted)
I noticed you change the priority on this report. Are you unable to reproduce this? 'Cause I would think that a bug which requires you to restart the application to release a deleted file handle would be high priority...
Updated by Anonymous over 7 years ago
Log rotation works as expected on my machine (fedora 25, logrotate 3.10.0).
Smart-proxy re-opens its log file on the first access to the log after it received a SIGUSR1. Try accessing smart-proxy (curl https://localhost:8443/version should do the trick) and check if it has switched to the new log file. If you are running logrotate manually, make sure you aren't using it in dry-run mode (i.e "-d" switch).
Updated by Jason Nance over 7 years ago
I'm not running logrotate
manually, I was just running the postscript (sending the SIGUSR1
) manually to take logrotate
out of the picture for troubleshooting. Sending the SIGUSR1 isn't working for me.
I have also tried sending the SIGUSR1
using kill
instead of systemctl
and foreman-proxy
isn't releasing the log file:
[root@sl1mmgplsat0001 ~]# lsof /var/log/ | grep delete squid 1357 root 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) squid 1359 squid 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) ruby 1792 foreman-proxy 7w REG 253,6 33954 1048669 /var/log/foreman-proxy/proxy.log-20170325 (deleted) [root@sl1mmgplsat0001 ~]# kill -s SIGUSR1 1792 [root@sl1mmgplsat0001 ~]# lsof /var/log/ | grep delete squid 1357 root 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) squid 1359 squid 5u REG 253,6 18393 2117707 /var/log/squid/cache.log-20170327 (deleted) ruby 1792 foreman-proxy 7w REG 253,6 33954 1048669 /var/log/foreman-proxy/proxy.log-20170325 (deleted)
What other info can I provide to help track this down?
Updated by Anonymous over 7 years ago
Sending the SIGUSR1 isn't working for me
Did you read the explanation of how smart-proxy handles log rotation? Did you try accessing it after sending SIGUSR1 to it?
Updated by Jason Nance over 7 years ago
Yes, I saw. My Smart Proxy is fairly actively used and as you can see from the output that file has been opened since around March 24. I'm 100% certain that the Smart Proxy has been accessed multiple times in the past 10 days.
Outside of the /version URL (which 404s for me, BTW), what else should trigger this (hosts checking in, content sync, content view changes)?
I tried browsing to Infrastructure -> Smart Proxies and then clicking around on the Smart Proxy tabs and that seemed to have worked for the Smart Proxy on my Katello server (where I just ran the kill
), but doing the same for my Capsule (without manually running the kill
) didn't work. Is it possible that there is a time limit on this or something?
Updated by Anonymous over 7 years ago
Outside of the /version URL (which 404s for me, BTW), what else should trigger this
/version should always be accessible, via http, https, or both. If it's not available then smart-proxy isn't up. Any request to smart proxy should be sufficient, esp. if you have debug-level logging enabled (I'm not sure if listing proxies in Foreman is sufficient though). I'm not sure if any of content-related operations would suffice either, as they are handled by pulp with little involvement from proxy. To make sure, try getting a response from /version or /features urls. Depending on the configuration, you may need to use correct ip address/fqdn/protocol or combination of thereof.
Is it possible that there is a time limit on this or something?
Not sure what you mean by this. When SIGUSR1 is caught, an internal flag signalling the need to reopen the log file is raised. It is reset when (and only then) the log file has been reopened.
Updated by Jason Nance over 7 years ago
Dmitri Dolguikh wrote:
Outside of the /version URL (which 404s for me, BTW), what else should trigger this
/version should always be accessible, via http, https, or both. If it's not available then smart-proxy isn't up.
On my Katello host:
# wget --no-check-certificate https://localhost:8443/version --2017-04-04 10:36:01-- https://localhost:8443/version Resolving localhost (localhost)... ::1, 127.0.0.1 Connecting to localhost (localhost)|::1|:8443... connected. WARNING: certificate common name ‘sl1mmgplsat0001.ipa.gen.zone’ doesn't match requested host name ‘localhost’. HTTP request sent, awaiting response... 404 Not Found 2017-04-04 10:36:01 ERROR 404: Not Found.
On my Capsule:
# wget --no-check-certificate https://localhost:8443/version --2017-04-04 10:38:01-- https://localhost:8443/version Resolving localhost (localhost)... ::1, 127.0.0.1 Connecting to localhost (localhost)|::1|:8443... connected. WARNING: no certificate subject alternative name matches requested host name ‘localhost’. HTTP request sent, awaiting response... 404 Not Found 2017-04-04 10:38:01 ERROR 404: Not Found.
I also tried replacing "localhost" with the FQDN of the server and tried substituting "/version" with "/features". :-\
Is it possible that there is a time limit on this or something?
Not sure what you mean by this. When SIGUSR1 is caught, an internal flag signalling the need to reopen the log file is raised. It is reset when (and only then) the log file has been reopened.
What I meant is if that re-open had to be triggered in a certain time period else the flag gets cleared. It seems that this flag either isn't getting set or isn't getting triggered.
Updated by Anonymous over 7 years ago
re: 404's -- check smart-proxy configuration (in /etc/foreman-proxy/settings.yml), it will have ip address(es) and ports on which proxy is listening.
What I meant is if that re-open had to be triggered in a certain time period else the flag gets cleared. It seems that this flag either isn't getting set or isn't getting triggered.
No, there's no time limit or anything of the sort. The mechanics of catching the signal is trivial too (and the issue has never been reported before), so I'm very sceptical this is a bug. Based on description of how the capsule is being used, I don't think you are hitting smart-proxy itself (pulp is a separate process), and it's quite possible that the log file simply hasn't been reopened. Once you get a response from /version or /features urls, please check if the log file has been reopened.
Updated by Jason Nance over 7 years ago
Okay, the correct URL is:
https://localhost:9090/version
However, this does not trigger the rotation:
# wget -O- --no-check-certificate https://localhost:9090/version --2017-04-06 16:42:23-- https://localhost:9090/version Resolving localhost (localhost)... ::1, 127.0.0.1 Connecting to localhost (localhost)|::1|:9090... failed: Connection refused. Connecting to localhost (localhost)|127.0.0.1|:9090... connected. WARNING: no certificate subject alternative name matches requested host name ‘localhost’. HTTP request sent, awaiting response... 200 OK Length: 185 [application/json] Saving to: ‘STDOUT’ 0% [ ] 0 --.-K/s {"version":"1.13.4","modules":{"pulp":"1.3.0","openscap":"0.6.0","dynflow":"0.1.5","ssh":"0.1.4","tftp":"1.13.4","puppetca":"1.13.4","puppet":"1.13.4","realm":"1100%[======================================>] 185 --.-K/s in 0.04s 2017-04-06 16:42:23 (4.60 KB/s) - written to stdout [185/185] # lsof /var/log/ | grep delete ruby 1792 foreman-proxy 7w REG 253,6 12434 1050864 /var/log/foreman-proxy/proxy.log-20170405 (deleted)
Updated by Jason Nance over 7 years ago
Jason Nance wrote:
However, this does not trigger the rotation:
I meant to say "the re-open". I understand that this isn't actually rotating the logs. The logrotate script is doing that.
Updated by Anonymous over 7 years ago
I'm not sure what's going on there. I still can't reproduce this, and I don't even see ruby vm holding log files open (lsof shows nothing). I would suggest asking a question about this in #theforeman irc channel, perhaps other users has seen this issue before?
Updated by Jason Nance over 7 years ago
I'll ask in IRC but I previously posted on the email list without a response.
Do you have an EL 7 box with Katello laying around to check this? Between production and lab I have 3 Katello hosts and 1 Capsule and all 4 instances are experiencing this. Of course, they are all on CentOS 7 and have the same plugins enabled. Is it possible that it is related to the Ruby version or a plugin?
Updated by Anonymous over 7 years ago
Do you have an EL 7 box with Katello laying around to check this?
Not at the moment. I'm not sure what the issue there is; I couldn't replicate this with different versions of ruby (2.0.0 and 2.3.1). If you are seeing this with Katello too (which runs in Rails in a separate from smart-proxy VM), then it would suggest that the issue is outside of smart-proxy. You mentioned that you are seeing the same problem with squid, which would suggest that the issue isn't with ruby VM, but elsewhere (logrotate perhaps?).
Updated by Anonymous over 7 years ago
Try sending SIGUSR1 to smart-proxy via kill (as opposed to through systemctl)? See if that works any better?
Updated by Jason Nance over 7 years ago
It's better in a way...
[root@sl2mmgplsat0001 ~]# lsof /var/log | grep deleted ruby 121892 foreman-proxy 7w REG 253,5 1043 1049919 /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20170407 (deleted) ruby 122001 foreman-proxy 7w REG 253,5 13744 1049879 /var/log/foreman-proxy/proxy.log-20170407 (deleted) [root@sl2mmgplsat0001 ~]# /bin/systemctl kill --signal=SIGUSR1 foreman-proxy [root@sl2mmgplsat0001 ~]# lsof /var/log | grep deleted ruby 121892 foreman-proxy 7w REG 253,5 1043 1049919 /var/log/foreman-proxy/smart_proxy_dynflow_core.log-20170407 (deleted)
The old proxy.log
was released, but not the smart_proxy_dynflow_core.log
(which I don't recall seeing before). Is there a different way to trigger the reopen of smart_proxy_dynflow_core.log
?
I did, however, finally find this in the audit.log
from early this morning:
type=USER_AVC msg=audit(1491468305.905:16288): pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='avc: denied { stop } for auid=0 uid=0 gid=0 path="/usr/lib/systemd/system/foreman-proxy.service" cmdline="/bin/systemctl kill --signal=SIGUSR1 foreman-proxy" scontext=system_u:system_r:logrotate_t:s0-s0:c0.c1023 tcontext=system_u:object_r:systemd_unit_file_t:s0 tclass=service exe="/usr/lib/systemd/systemd" sauid=0 hostname=? addr=? terminal=?'
It looks like SELinux is preventing logrotate
from sending the kill
. For some reason that wasn't in the previous audit.log
(or I missed it).
So assuming there is a different way to trigger the reopen of smart_proxy_dynflow_core.log
this looks to be an SELinux issue. If there isn't a different way to trigger the reopen of smart_proxy_dynflow_core.log
it looks like I may have two separate issues.
Updated by Anonymous over 7 years ago
I took a quick look at dynflow_core, but not sure if/how they handle log rotation. The easiest thing would be to ask this question in #theforeman.
Updated by Anonymous over 7 years ago
- Tracker changed from Bug to Support
- Category changed from Packaging to Core
I'm going to change this to support; Please file file separate bug reports with Foreman SELinux and/or Smart-Proxy Dynflow module (if you think the issues above are bugs).
Updated by Jason Nance over 7 years ago
- Copied to Bug #19223: AVC denied from logrotate sending signal to foreman-proxy unit added
Updated by Jason Nance over 7 years ago
- Copied to Bug #19224: smart_proxy_dynflow_core.log not reopened during logrotate added
Updated by Lukas Zapletal over 7 years ago
- Tracker changed from Support to Bug
- Status changed from Resolved to Assigned
- Assignee set to Lukas Zapletal
So the problem here is the standard practice is to send SIGHUP to daemons to release log files after rotating, base RHEL SELinux policy includes rules to allow logrotate to send this via systemctl reload command. Our proxy only supports SIGUSR1 which is not standard and therefore the only way supporting this is by doing changes in our SELinux policy. This includes creating two types, domain, macro, file contexts - this is too complicated. It is much better to follow what is expected and change our service to respond to reload properly, then we can change our logrotate script as well.
Therefore I am opening this and making the change in proxy to properly reopen logs when "reload" init command is sent.
Updated by The Foreman Bot over 7 years ago
- Status changed from Assigned to Ready For Testing
- Pull request https://github.com/theforeman/smart-proxy/pull/544 added
Updated by Lukas Zapletal over 7 years ago
- Pull request https://github.com/theforeman/foreman-packaging/pull/1814 added
Two PRs are associated with this one.
Updated by Lukas Zapletal over 7 years ago
- Project changed from Smart Proxy to SELinux
- Category deleted (
Core) - Status changed from Ready For Testing to New
- Pull request deleted (
https://github.com/theforeman/foreman-packaging/pull/1814, https://github.com/theforeman/smart-proxy/pull/544)
Unfortunately, proxy folks insist on using USR1 signal, therefore the only way is to modify our SELinux policy and introduce new domain.
https://github.com/theforeman/smart-proxy/pull/544
https://github.com/theforeman/foreman-packaging/pull/1814
Updated by Lukas Zapletal about 7 years ago
- Project changed from SELinux to Smart Proxy
- Category set to Packaging
I misread the PR, folks are not against SIGHUP, they did not like "reload" systemd action. Moving to smart-proxy and creating different proposal.
Updated by The Foreman Bot about 7 years ago
- Status changed from New to Ready For Testing
- Pull request https://github.com/theforeman/smart-proxy/pull/556 added
Updated by Lukas Zapletal about 7 years ago
- Project changed from Smart Proxy to SELinux
- Category deleted (
Packaging) - Status changed from Ready For Testing to New
- Pull request deleted (
https://github.com/theforeman/smart-proxy/pull/556)
Ok, sorry for the confusion. I've confirmed with SELinux RH team that this is bug in RHEL policy and it will be fixed, I filed a bug: https://bugzilla.redhat.com/show_bug.cgi?id=1527522
@Jason can you please try this workaround, compile a custom policy with this rule:
allow logrotate_t systemd_unit_file_t:service stop;
That should fix it. I am unable to reproduce on all my RHEL 7.4 systems, but I will prepare a PR that will add this rule into our policy until this is fixed in RHEL 7.6+.
Updated by The Foreman Bot about 7 years ago
- Status changed from New to Ready For Testing
- Pull request https://github.com/theforeman/foreman-selinux/pull/75 added
Updated by Radosław Piliszek about 7 years ago
@Lukas:
I believe this is a too broad change. It would be better to introduce a new type for foreman-proxy service.
7.4 is affected.
Updated by Lukas Zapletal about 7 years ago
I believe this is a too broad change. It would be better to introduce a new type for foreman-proxy service.
RH SELinux team confirmed me they are adding this rule into base policy in RHEL 7.5 so you will eventually get it. It must be stop because that's how SELinux hooks are implemented in systemd, a bit clunky, yeah.
Updated by Radosław Piliszek about 7 years ago
I see. Thanks for clarifying that. My hope is just that it will be implemented in such a way that it will not allow evil logrotate script to stop e.g. firewalld which would open all ports.
Updated by Lukas Zapletal about 7 years ago
Agree, this should be properly solved on the systemd/selinux level and not in all individual applications :-)
Updated by Radosław Piliszek almost 7 years ago
Not exactly. My point is that it would be better if each application decided for itself whether it wants (needs) to be stopped/killed by logrotate via systemd.
Anyway:
There is already a rule:
allow logrotate_t domain : process signal ;
which essentially allows logrotate to send signals other than SIGKILL, SIGSTOP or SIGCHLD to virtually any other process (since all domain types should have domain attribute). So, well, adding systemd stopping would not be that bad (in fact only adding SIGKILL, SIGSTOP and SIGCHLD signals for systemd-managed services). :-)
Updated by Anonymous over 5 years ago
- Status changed from Ready For Testing to Closed
Applied in changeset f9d17e7d6f1ac1066b924f89490da12a400a4860.