This is probably due to insufficient knowledge of Linux in general, and isn’t specifically about Rocky Linux, but I’ll ask anyway.
I have a new Rocky 9.1 system that is about to become our main mail server (using iRedMail). I’m running it in for a couples of weeks to be sure it’s secure and running as expected.
There is something happening daily at 5AM exactly and also something (different?) at 5:10AM that is causing spikes in memory and swap usage (which jumps from 0 to 60% in a few seconds). It only lasts about 20 minutes, the system stays up and nothing dies, but I’d like to understand it before moving a production load onto it.
My problem is that I’ve examined all the cron tables and every log I can think of, and I just can’t find what’s causing it. I have three different system monitor suites availiable: the hosting provider’s graphs, the Cockpit metrics (with detailed history enabled), and the (amazingly detailed) Netdata cloud metrics. They all show the spike in memory and swap usage, and I get emailed alerts, but none show which app or system process is executing at 5AM.
Similarly, none of the logs I look at show anything special at all at that time. Although iRedmail sets up a dozen or more cron jobs, they all get run much earlier in the night (11PM thru 3AM or so).
I have vague suspicions about clamav rolling over their scan rules. It works by setting up a whole new instance of the scanner, validating the rules, and only then switching live scanning from the old instance to the new, so there are two whole scanners running during the switchover. But my understanding is that this happens irregularly, not once a day at a fixed time.
I am almost tempted to get up at 5AM and watch top tomorrow, but there must be a better way. Is there a log or log option I can enable to see what starts at 5AM?
There might be a better way to do this, rather than getting up at 5AM to watch top, why don’t you create a cron
that runs top in batch mode? You can do something like this:
top -b -n 1 > top.txt
the -b
is for batch mode and the -n
is for the number of iterations you want to run.
You could run it every 5 minutes starting at 5 AM to say 6 AM:
*/5 5-6 * * * /usr/bin/top -b -n 1 >> /home/usr/top.txt
Using the >>
will concatenate the top runs into a single file, so you will have to take a look down through the results. You can then review the file when you get up and around. Each run of top will have the header, so it will be easy to see each individual one:
Tasks: 450 total, 1 running, 449 sleeping, 0 stopped, 0 zombie
%Cpu(s): 8.7 us, 1.3 sy, 0.0 ni, 89.9 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 31948.2 total, 2276.0 free, 7787.8 used, 21884.4 buff/cache
MiB Swap: 2048.0 total, 1849.2 free, 198.8 used. 22397.8 avail Mem
Someone else may have a better, less old-school way of finding out what’s going on.
Thanks,
Steve
cron is relict of the past. This days systemd is used…
systemctl list-timers
For finding out what is using swap go through /proc filesystem (you will need to write a script for that)
cat /proc/<pid>/status
Or the easy way:
sudo dnf install smem
smem
Okay, both the above suggestions are informative and helpful. Thank you!
systemctl list-timers
does indeed show me several previously unknown scheduled tasks, but still nothing near 5AM. But good to know.
In the end I’ve settled on setting something up that will run
smem -u >> swapuse.log
every minute from 4:58 thru 5:20. That summarizes swap use per user and, as iRedMail runs everything under different user accounts, should give me what I need.
Thanks again, Paul
For the benefit of anyone finding this in a search later, the approach works well. Here’s the tiny shell script I used:
# sleep until 4:58AM (9:58UTC) tomorrow
sleep $((($(date -f - +%s- <<<09:58$' tomorrow\nnow')0)%86400))
# append time, smem output, and a separator to log (every 10 secs for 25 mins)
for i in {1..150}; do
date >> swapuse.log
smem -uw >> swapuse.log
echo "--------------------------------------------------" >> swapuse.log
sleep 9
done
Result:
Wed Mar 8 09:58:00 AM UTC 2023
User Count Swap USS PSS RSS
chrony 1 0 1008 1079 3580
dbus 2 0 1980 2143 7356
dovecot 2 0 2188 2534 10684
clamupdate 1 604 3088 4293 13344
pcp 3 532 5572 6123 18600
postfix 4 0 4924 7456 35272
polkitd 1 0 7828 8620 16144
paul 4 364 4108 8623 26992
vmail 5 0 7836 9931 40012
mlmmj 6 2196 3172 15699 96220
nginx 7 6532 7004 18253 106048
iredadmin 6 1660 3264 20333 120856
iredapd 1 3816 27884 28052 30532
mysql 1 41728 89004 89349 95108
netdata 5 0 186296 188293 196892
root 34 26876 242112 268358 444704
amavis 6 106948 1425376 1553728 1993992
--------------------------------------------------
Wed Mar 8 09:58:10 AM UTC 2023
User Count Swap USS PSS RSS
chrony 1 0 1008 1076 3580
dbus 2 0 1980 2139 7356
… and so on.
However, proving the perverseness of the universe, the unexpected swap spike occurred at 03:55 this morning (whereas it had been reliably at 5:00 every day for the last week) and thus there is nothing of interest in the numbers. So I will have to start logging this more continuously and putting the lot into Excel to get a better handle on it.
One final note: it’s now obvious that monitoring the swap usage by user is unhelpful because of course what goes into swap is not the thing that’s causing memory pressure. Instead everything inactive gets swapped out. But fortunately smem shows data on which users are not currently swapped out too.