NFS mounts hang forever

The “performance enhancing” TuneD profiles do drop swappiness to 10 (among other things), as shown by grep -r vm.swap /usr/lib/tuned and grep -r include /usr/lib/tuned.
See “Chapter 1. Getting started with TuneD” in the Red Hat Product Documentation.
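
If you prefer a script over the two grep calls, here is a minimal sketch of the same lookup. It assumes profiles live in tuned.conf files somewhere under /usr/lib/tuned and that the setting is written as a plain vm.swappiness=<value> line; the function name find_swappiness is my own, not part of TuneD.

```python
import re
from pathlib import Path

# Matches lines such as "vm.swappiness=10" or "vm.swappiness = 10".
SWAPPINESS_RE = re.compile(r"^\s*vm\.swappiness\s*=\s*(\S+)")


def find_swappiness(root):
    """Return {profile directory name: value} for each tuned.conf under root
    that sets vm.swappiness directly (hypothetical helper, not a TuneD API)."""
    found = {}
    for conf in sorted(Path(root).rglob("tuned.conf")):
        for line in conf.read_text(errors="replace").splitlines():
            match = SWAPPINESS_RE.match(line)
            if match:
                found[conf.parent.name] = match.group(1)
    return found


if __name__ == "__main__":
    # Assumed default location; adjust if your distribution differs.
    for profile, value in find_swappiness("/usr/lib/tuned").items():
        print(f"{profile}: vm.swappiness={value}")
```

Note that this only lists profiles that set the value themselves. Profiles that inherit it through an include= line will not show up, which is why the second grep is still useful.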

The slab_cache part reported by munin is just the Slab value from /proc/meminfo. The kernel uses slab to cache various internal objects, including the dentry and inode caches, which avoid having to retrieve this metadata from disk on every access. These usually correspond to directories and files being opened on the server (including indirectly, e.g. via NFS client access). Slab has reclaimable and unreclaimable parts (SReclaimable and SUnreclaim in /proc/meminfo). Reclaimable parts can be evicted from the cache and the memory reused for another allocation.
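
To see how slab splits on a given box, something like the following works as a rough sketch. It relies only on the Slab, SReclaimable and SUnreclaim fields of /proc/meminfo; the helper names parse_meminfo and slab_breakdown are made up for this example.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.
    Values are in kB for every field that carries a kB suffix."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_breakdown(fields):
    """Return (slab_kb, reclaimable_kb, unreclaimable_kb, unreclaimable share of Slab)."""
    slab = fields["Slab"]
    reclaimable = fields["SReclaimable"]
    unreclaimable = fields["SUnreclaim"]
    share = unreclaimable / slab if slab else 0.0
    return slab, reclaimable, unreclaimable, share


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        slab, recl, unrecl, share = slab_breakdown(parse_meminfo(meminfo.read_text()))
        print(f"Slab {slab} kB = {recl} kB reclaimable + {unrecl} kB unreclaimable "
              f"({share:.0%} unreclaimable)")
```

A large reclaimable part is usually just cache doing its job; the unreclaimable part is the one worth tracking over time.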

The cache part is the page cache. Most disk IO is buffered and goes through it:

  • on read, data is read from disk and stored in this cache
  • on write, data is written to the page cache and written back to disk opportunistically, or on request via e.g. sync/fsync

These pages stay around in case the same data is needed again (e.g. an application that frequently reads the same files can mostly avoid disk IO).
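
The write side of this can be sketched in a few lines. This is only an illustration of buffered write followed by fsync, using standard os calls; write_durably and read_back are names I picked for the example.

```python
import os
import tempfile


def write_durably(path, data):
    """Write data to path, then block until the kernel has flushed it to storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write returns once the bytes are in the page cache (pages are
            # now "dirty"); it may accept fewer bytes than requested.
            written += os.write(fd, data[written:])
        # Without this call the kernel writes the dirty pages back on its own schedule.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


def read_back(path):
    """Read the file again; right after a write this is normally served from the page cache."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.bin")
        print(write_durably(target, b"hello page cache"), "bytes written and synced")
        print(read_back(target))
```

The pages stay cached after fsync; fsync only makes them clean, it does not evict them.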

AFAIK, on a very large and busy file server, seeing high slab_cache and cache entries is relatively normal.

What is problematic is:

  • if the cache has a low hit rate, it causes page thrashing: the kernel’s memory reclaim spends a lot of time evicting these single-use pages to make room for new ones. AFAIK, this is where a swappiness adjustment can help, by doing this reclaim early rather than late, when you need the memory. File-based full backups that do not use direct I/O exacerbate this (they touch files that are otherwise mostly untouched, and those pages are not needed again once the files are backed up).
  • if large portions of memory are taken by unreclaimable slab (this was reported in the related forum post about AMD EPYC that I mentioned). You can observe this via grep ^SUnreclaim /proc/meminfo.
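
On the backup point: direct I/O is one way to keep single-use pages out of the cache. A different, simpler technique that a backup-style reader can use is to tell the kernel afterwards that it no longer needs the pages, via posix_fadvise with POSIX_FADV_DONTNEED. Here is a rough sketch of that alternative (not direct I/O, and not what any particular backup tool does); read_without_caching and the 1 MiB chunk size are my own choices.

```python
import os

CHUNK = 1024 * 1024  # arbitrary read size for this sketch


def read_without_caching(path, consume):
    """Stream a file to consume(chunk), then hint that its cached pages can be dropped.
    Returns the number of bytes read."""
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            chunk = os.read(fd, CHUNK)
            if not chunk:
                break
            consume(chunk)
            total += len(chunk)
        if hasattr(os, "posix_fadvise"):  # not available on every platform
            # Offset 0 with length 0 covers the whole file. This is advisory:
            # the kernel may ignore it.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)
    return total
```

One caveat: the hint applies to the file, not to the process, so it can also drop pages that another application had warm for the same file.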

As iwalker mentioned, 6.1+ kernels have improved in this area (via MGLRU). But I doubt any of that gets backported to the RHEL-adjacent kernel version that Rocky builds from.
