System Log Showing Processes Creating Swap Usage

This is probably due to insufficient knowledge of Linux in general, and isn’t specifically about Rocky Linux, but I’ll ask anyway.

I have a new Rocky 9.1 system that is about to become our main mail server (using iRedMail). I’m running it in for a couples of weeks to be sure it’s secure and running as expected.

There is something happening daily at 5AM exactly and also something (different?) at 5:10AM that is causing spikes in memory and swap usage (which jumps from 0 to 60% in a few seconds). It only lasts about 20 minutes, the system stays up and nothing dies, but I’d like to understand it before moving a production load onto it.

My problem is that I’ve examined all the cron tables and every log I can think of, and I just can’t find what’s causing it. I have three different system monitor suites availiable: the hosting provider’s graphs, the Cockpit metrics (with detailed history enabled), and the (amazingly detailed) Netdata cloud metrics. They all show the spike in memory and swap usage, and I get emailed alerts, but none show which app or system process is executing at 5AM.

Similarly, none of the logs I look at show anything special at all at that time. Although iRedmail sets up a dozen or more cron jobs, they all get run much earlier in the night (11PM thru 3AM or so).

I have vague suspicions about clamav rolling over their scan rules. It works by setting up a whole new instance of the scanner, validating the rules, and only then switching live scanning from the old instance to the new, so there are two whole scanners running during the switchover. But my understanding is that this happens irregularly, not once a day at a fixed time.

I am almost tempted to get up at 5AM and watch top tomorrow, but there must be a better way. Is there a log or log option I can enable to see what starts at 5AM?

There might be a better way to do this, rather than getting up at 5AM to watch top, why don’t you create a cron that runs top in batch mode? You can do something like this:

top -b -n 1 > top.txt

the -b is for batch mode and the -n is for the number of iterations you want to run.

You could run it every 5 minutes starting at 5 AM to say 6 AM:

*/5 5-6 * * * /usr/bin/top -b -n 1 >> /home/usr/top.txt

Using the >> will concatenate the top runs into a single file, so you will have to take a look down through the results. You can then review the file when you get up and around. Each run of top will have the header, so it will be easy to see each individual one:

Tasks: 450 total,   1 running, 449 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.7 us,  1.3 sy,  0.0 ni, 89.9 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem :  31948.2 total,   2276.0 free,   7787.8 used,  21884.4 buff/cache
MiB Swap:   2048.0 total,   1849.2 free,    198.8 used.  22397.8 avail Mem 

Someone else may have a better, less old-school way of finding out what’s going on.

Thanks,
Steve

cron is relict of the past. This days systemd is used…

systemctl list-timers

For finding out what is using swap go through /proc filesystem (you will need to write a script for that)

cat /proc/<pid>/status

Or the easy way:

sudo dnf install smem
smem

Okay, both the above suggestions are informative and helpful. Thank you!

systemctl list-timers
does indeed show me several previously unknown scheduled tasks, but still nothing near 5AM. But good to know.

In the end I’ve settled on setting something up that will run

smem -u >> swapuse.log
every minute from 4:58 thru 5:20. That summarizes swap use per user and, as iRedMail runs everything under different user accounts, should give me what I need.

Thanks again, Paul

For the benefit of anyone finding this in a search later, the approach works well. Here’s the tiny shell script I used:

# sleep until 4:58AM (9:58UTC) tomorrow
sleep $((($(date -f - +%s- <<<09:58$' tomorrow\nnow')0)%86400))

# append time, smem output, and a separator to log (every 10 secs for 25 mins)
for i in {1..150}; do
  date >> swapuse.log
  smem -uw >> swapuse.log
  echo "--------------------------------------------------" >> swapuse.log
  sleep 9
done

Result:

Wed Mar  8 09:58:00 AM UTC 2023
User     Count     Swap      USS      PSS      RSS
chrony       1        0     1008     1079     3580
dbus         2        0     1980     2143     7356
dovecot      2        0     2188     2534    10684
clamupdate     1      604     3088     4293    13344
pcp          3      532     5572     6123    18600
postfix      4        0     4924     7456    35272
polkitd      1        0     7828     8620    16144
paul         4      364     4108     8623    26992
vmail        5        0     7836     9931    40012
mlmmj        6     2196     3172    15699    96220
nginx        7     6532     7004    18253   106048
iredadmin     6     1660     3264    20333   120856
iredapd      1     3816    27884    28052    30532
mysql        1    41728    89004    89349    95108
netdata      5        0   186296   188293   196892
root        34    26876   242112   268358   444704
amavis       6   106948  1425376  1553728  1993992
--------------------------------------------------
Wed Mar  8 09:58:10 AM UTC 2023
User     Count     Swap      USS      PSS      RSS
chrony       1        0     1008     1076     3580
dbus         2        0     1980     2139     7356

… and so on.

However, proving the perverseness of the universe, the unexpected swap spike occurred at 03:55 this morning (whereas it had been reliably at 5:00 every day for the last week) and thus there is nothing of interest in the numbers. So I will have to start logging this more continuously and putting the lot into Excel to get a better handle on it.

One final note: it’s now obvious that monitoring the swap usage by user is unhelpful because of course what goes into swap is not the thing that’s causing memory pressure. Instead everything inactive gets swapped out. But fortunately smem shows data on which users are not currently swapped out too.