Possible memory leak in `dnf makecache` (Rocky 9.4 / dnf 4.14.0-9 / libdnf 0.69.0-8)

mauno · September 19, 2024, 7:40am

Hi everyone,

For the past two months or so, some of my Rocky Linux 9.4 systems have occasionally crashed, frozen, or had processes OOM-killed during the night.

The logs show that `dnf makecache` started seconds before the crash (triggered via the systemd `dnf-makecache.timer`) but never reached the point where the completion message `Metadata cache created.` is logged.
/var/log/messages

Sep 19 06:33:02 hostname systemd[1]: Starting dnf makecache...
Sep 19 06:33:02 hostname dnf[62376]: Amazon Corretto                                  41 kB/s | 2.9 kB     00:00
Sep 19 06:33:02 hostname dnf[62376]: Rocky Linux 9 - Base                            1.4 MB/s | 4.1 kB     00:00
Sep 19 06:33:02 hostname dnf[62376]: Rocky Linux 9 - Base                             44 MB/s | 2.3 MB     00:00
Sep 19 06:33:02 hostname dnf[62376]: Rocky Linux 9 - AppStream                       1.4 MB/s | 4.5 kB     00:00
Sep 19 06:33:03 hostname dnf[62376]: Rocky Linux 9 - AppStream                        33 MB/s | 8.0 MB     00:00
Sep 19 06:33:04 hostname dnf[62376]: Rocky Linux 9 - Extras                          1.0 MB/s | 2.9 kB     00:00
Sep 19 06:33:04 hostname dnf[62376]: Rocky Linux 9 - HighAvailability                1.7 MB/s | 4.0 kB     00:00
Sep 19 06:33:04 hostname dnf[62376]: Extra Packages for Enterprise Linux 9 - x86_64  1.4 MB/s | 4.3 kB     00:00
Sep 19 06:33:04 hostname dnf[62376]: Extra Packages for Enterprise Linux 9 - x86_64   52 MB/s |  23 MB     00:00

/var/log/dnf.log

2024-09-19T06:33:04+0200 DEBUG reviving: failed for 'localmirror-epel', mismatched repomd.
2024-09-19T06:33:04+0200 DEBUG repo: downloading from remote: localmirror-epel
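
The pattern described above (a `Starting dnf makecache...` line with no later `Metadata cache created.`) can be searched for across a whole log file. The following is only a minimal sketch: the syslog line layout is assumed from the excerpt above, and the function name `find_unfinished_runs` is made up for illustration.

```python
import re

# Assumed syslog layout, taken from the /var/log/messages excerpt above:
# "<Mon> <day> <time> <host> <unit>[<pid>]: <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s[\d:]+)\s\S+\s(\w[\w.-]*)\[(\d+)\]:\s(.*)$")

START_MSG = "Starting dnf makecache..."
DONE_MSG = "Metadata cache created."


def find_unfinished_runs(lines):
    """Return the start timestamps of makecache runs that never logged completion."""
    unfinished = []
    open_run = None  # start timestamp of the run currently in progress, if any
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue
        timestamp, unit, _pid, message = match.groups()
        if unit == "systemd" and message.startswith(START_MSG):
            if open_run is not None:
                # A new run started before the previous one reported completion.
                unfinished.append(open_run)
            open_run = timestamp
        elif unit == "dnf" and DONE_MSG in message:
            open_run = None
    if open_run is not None:
        # The last run in the input has no completion message (it may still be running).
        unfinished.append(open_run)
    return unfinished


if __name__ == "__main__":
    with open("/var/log/messages", encoding="utf-8", errors="replace") as log:
        for started in find_unfinished_runs(log):
            print("makecache run without completion message, started:", started)
```

A run reported here is not proof of an OOM crash; it only marks the timestamps worth comparing against reboots and OOM-killer messages.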

So far I know that:

- This only affects Rocky Linux 9.4 systems.
- None of the affected systems has much free memory (roughly 500-800 MB), but all of them ran fine for the months and years before (some since a few weeks after the initial release of Rocky 9.0).

This leads me to the conclusion that:

- either there is a memory leak during the `dnf makecache` operation, or
- something has significantly changed about the way `dnf makecache` operates.

Have any of you observed similar issues in recent months?
I’ve also found “RL9 dnf OOMs with EPEL but only on one machine! - #6 by iwalker”, which looks like it could be the same issue.

Update: I’ve checked dnf.rpm.log. Judging by the dates, this could be related to the following dnf updates from May:

2024-05-13T23:17:35+0200 SUBDEBUG Upgrade: libdnf-0.69.0-8.el9.x86_64
2024-05-13T23:18:10+0200 SUBDEBUG Upgrade: python3-libdnf-0.69.0-8.el9.x86_64
2024-05-13T23:18:11+0200 SUBDEBUG Upgrade: dnf-data-4.14.0-9.el9.noarch
2024-05-13T23:18:11+0200 SUBDEBUG Upgrade: python3-dnf-4.14.0-9.el9.noarch
2024-05-13T23:18:12+0200 SUBDEBUG Upgrade: dnf-4.14.0-9.el9.noarch
2024-05-13T23:18:12+0200 SUBDEBUG Upgrade: python3-dnf-plugins-core-4.3.0-13.el9.noarch
2024-05-13T23:18:12+0200 SUBDEBUG Upgrade: dnf-plugins-core-4.3.0-13.el9.noarch

sweh · September 19, 2024, 11:51am

It appears to be related to the size of the repo. The topic “Dnf-makecache.timer is a major liability” is also related and includes a link to the Fedora upstream, where there’s a lot of discussion.

mauno · September 19, 2024, 3:33pm

Thanks @sweh for providing the link to the related issue & discussion.

500+ MB of RAM for a package manager is excessive and would mean that even a minimal headless server would probably require more than 2 GB of RAM in order to run stably.

I’ve been working with the “Enterprise Linux” ecosystem since 2012 and have always appreciated that stability issues like this were sorted out beforehand and never made it into a release (at least, I never noticed one).

I’ll look into disabling the dnf-makecache.timer or adding swapfiles for now and follow the discussion to see where it leads…
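
To judge whether a given host is at risk before choosing between those two workarounds, its memory headroom can be read from `/proc/meminfo`. This is a rough sketch only: the 500 MB default is the ballpark figure mentioned in this thread, not a measured requirement, and the function names are made up for illustration.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def headroom_report(meminfo_text, needed_mb=500):
    """Compare available RAM with an assumed peak usage of dnf makecache.

    needed_mb is an assumption taken from the figures quoted in this thread.
    """
    info = parse_meminfo(meminfo_text)
    available_mb = info.get("MemAvailable", 0) // 1024
    swap_mb = info.get("SwapTotal", 0) // 1024
    return {
        "available_mb": available_mb,
        "swap_mb": swap_mb,
        "enough_ram": available_mb >= needed_mb,
        "has_swap": swap_mb > 0,
    }


if __name__ == "__main__":
    with open("/proc/meminfo", encoding="ascii") as meminfo:
        print(headroom_report(meminfo.read()))
```

A host reporting neither enough available RAM nor any swap would be a candidate for one of the workarounds.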

Docile9713 · November 14, 2024, 9:21am

Linux newbie here.

I’ve been running Rocky Linux 9.4 on AWS for a few days, and it has crashed almost every day.
Looking at the messages log, I noticed that even though dnf runs every 1.5 hours or so, the system only crashes when the download size from the EPEL-x86_64 repo is 23 MB.

 dnf[5427]: Extra Packages for Enterprise Linux 9 - x86_64   12 MB/s |  23 MB     00:01

When it’s only 17 kB or 7.6 kB, for example, it doesn’t crash.
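
One way to check this correlation over a longer log is to pull the download size out of each dnf progress line and list only the large ones. The sketch below makes several assumptions: the line layout is inferred from the log excerpts in this thread, units are treated as decimal multiples (approximate at best), and the 10 MB threshold and the function names are arbitrary choices for illustration.

```python
import re

# Assumed layout of a dnf progress line as it appears in /var/log/messages:
# "dnf[<pid>]: <repo name>   <rate> <unit>/s | <size> <unit>   <mm:ss>"
PROGRESS_RE = re.compile(
    r"dnf\[\d+\]:\s+(?P<repo>.+?)\s+[\d.]+\s+\w+/s\s+\|\s+"
    r"(?P<size>[\d.]+)\s*(?P<unit>[kMG]?B)\s+[\d:]+\s*$"
)

# Assumption: decimal multiples; good enough to tell kB from MB downloads.
UNIT_BYTES = {"B": 1, "kB": 1000, "MB": 1000**2, "GB": 1000**3}


def download_size_bytes(line):
    """Return (repo, approximate size in bytes) for a progress line, else None."""
    match = PROGRESS_RE.search(line)
    if not match:
        return None
    size = float(match.group("size")) * UNIT_BYTES[match.group("unit")]
    return match.group("repo"), round(size)


def large_downloads(lines, threshold_bytes=10 * 1000**2):
    """Return (repo, size) pairs for downloads at or above the threshold."""
    hits = []
    for line in lines:
        parsed = download_size_bytes(line)
        if parsed and parsed[1] >= threshold_bytes:
            hits.append(parsed)
    return hits


if __name__ == "__main__":
    with open("/var/log/messages", encoding="utf-8", errors="replace") as log:
        for repo, size in large_downloads(log):
            print(f"{repo}: about {size / 1000**2:.1f} MB")
```

Comparing the timestamps of the large downloads with the crash times would show whether the full EPEL metadata download is the common factor.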

According to the CPU usage record in AWS, when the system crashes, CPU usage stays at about 50% on my 2-vCPU system, which, if I understand correctly, means that one vCPU was running at 100% non-stop until I restarted the system.

I’m not sure what information is required for further troubleshooting, but I am happy to provide some other info if needed.

system · January 13, 2025, 9:22am

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Tag	Replies	Views	Date
Dnf-makecache.timer is a major liability	Rocky Linux Help & Support	rocky-linux-9	6	1043	October 13, 2024
RL9 dnf OOMs with EPEL but only on one machine!	Rocky Linux Help & Support	rocky-linux-9	7	534	September 7, 2024
Problem with dnf-makecache after 8.6	Rocky Linux Help & Support	-	2	1584	August 25, 2023
DNF makecache error - Rocky 9	Rocky Linux Help & Support	-	3	1357	August 25, 2023
Dnf makecache & update	Rocky Linux Help & Support	rocky-linux-8	4	2512	March 5, 2024