My main Proxmox node has 64 GB of RAM, a small ZFS mirror for boot and configs, and a larger ZFS pool of spinning rust for bulk storage. VMs live on local NVMe via LVM-thin. Straightforward enough. But I kept getting warnings from my monitoring that “VM memory ballooning is active” on hosts that should have had plenty of headroom.

The symptom

Free memory as seen from the host:

free -h
#                total        used        free      shared  buff/cache   available
# Mem:            62Gi        56Gi       1.1Gi       312Mi       5.2Gi       4.8Gi
# Swap:           8.0Gi       341Mi       7.7Gi

Looks cramped. But used in Linux's free is a liar's number when ZFS is in the picture: the ARC lives in kernel memory, so it is counted as used rather than as buff/cache, even though most of it is reclaimable cache. The tool that tells the truth is arc_summary:

arc_summary | head -30
# ARC size (current):                                    98.1 %   39.7 GiB
# Target size (adaptive):                               100.0 %   40.5 GiB
# Min size (hard limit):                                  6.2 %    2.5 GiB
# Max size (high water):                                100.0 %   40.5 GiB

40 GB of ARC. That is roughly two thirds of the machine's memory, on a host that runs six VMs. It was not actually starving anyone, but it was close enough that pressure stalls showed up under VM memory spikes, because ARC reclamation is not instant.
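On paper the headroom was mostly reclaimable, which is why nothing actually died. A back-of-envelope check that reconciles the two views, treating everything in the ARC above its floor as fair game (optimistic, but close enough):

# MemAvailable plus (ARC size - ARC floor), in GiB
awk 'NR == FNR && /^MemAvailable/ { avail = $2 * 1024 }
     $1 == "size"  { sz = $3 }
     $1 == "c_min" { floor = $3 }
     END { printf "%.1f GiB\n", (avail + sz - floor) / 2^30 }' \
    /proc/meminfo /proc/spl/kstat/zfs/arcstats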

Why ZFS does this

ZFS ARC defaults to half of system memory minus 1 GB. On 64 GB that’s 31 GB max. So why was I at 40? Two reasons:

  1. I had set zfs_arc_max explicitly in /etc/modprobe.d/zfs.conf years ago, pinning it at 40 GB because I wanted fast scrubs, and then forgot about it; the check below would have caught this.
  2. ZFS ARC does not shrink aggressively when other processes request memory. It shrinks, but on a slower timescale than a spiky VM start.
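Both halves are easy to audit. The live value and the boot-time intent live in different places and can disagree, so check both:

# what the kernel is using right now (0 means the built-in default)
cat /sys/module/zfs/parameters/zfs_arc_max
# what will be loaded on the next boot
grep zfs_arc /etc/modprobe.d/zfs.conf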

The tuning

My goal was:

  • Give the ARC enough room to be useful for hot datasets.
  • Keep at least 20 GB guaranteed for VMs plus host overhead.
  • Make ARC give back memory faster when pressure exists.

Three knobs in /etc/modprobe.d/zfs.conf:

options zfs zfs_arc_max=17179869184
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_sys_free=4294967296

That’s 16 GB max, 4 GB min, and 4 GB of free system headroom to defend. The zfs_arc_sys_free one is the important and less-known knob: it tells ZFS to start evicting from the ARC when free system memory drops below this threshold. The default of zero falls back to an internal heuristic, the larger of 512 KiB and 1/64 of RAM, which is about 1 GB on this box and reacts far too late for a spiky VM start.
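Those eleven-digit values are just powers of two; one way to avoid fat-fingering them is to let the shell do the arithmetic. Note that with root on ZFS, as on this box, the module loads from the initramfs, so the options only apply at boot after a refresh:

cat > /etc/modprobe.d/zfs.conf <<EOF
options zfs zfs_arc_max=$((16 * 1024**3))
options zfs zfs_arc_min=$((4 * 1024**3))
options zfs zfs_arc_sys_free=$((4 * 1024**3))
EOF
update-initramfs -u   # so the options are present when zfs loads at boot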

I applied:

echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
echo 4294967296  > /sys/module/zfs/parameters/zfs_arc_sys_free

The runtime files don’t persist across reboot but take effect immediately. The modprobe config makes it permanent. You want both.
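Worth confirming the module actually took the values before trusting the change:

grep . /sys/module/zfs/parameters/zfs_arc_{max,min,sys_free}
awk '$1 == "size" { print "ARC size:", $3 }' /proc/spl/kstat/zfs/arcstats
# size decays toward the new cap rather than snapping to it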

Cold cache bite

Right after shrinking the ARC I saw arc_summary drop to 16 GB quickly, but I also saw a small performance dip. My Grafana panels for iostat showed the backing disks working harder for an hour or two until the hot set re-warmed into the smaller ARC. That is expected. arcstat 1 was instructive during that window:

arcstat 1 60
# time   read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  arcsz    c
# 13:44  4237  2451     58  2411   58    40   40     0    0    16G  16G

A 58% miss rate for a minute, then back to single digits once the hot data fit again. If this had been a production load, I would have phased the resize in more gently, shrinking by 4 GB at a time and waiting for the miss rate to settle between steps; a sketch of that follows.
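A minimal sketch of that phased approach, assuming zfs_arc_max is already set to a non-zero value and substituting a fixed settle time for actually watching arcstat:

#!/bin/bash
# Hypothetical phased shrink: lower zfs_arc_max 4 GiB at a time.
param=/sys/module/zfs/parameters/zfs_arc_max
target=$((16 * 1024**3))    # final cap
step=$((4 * 1024**3))       # per-phase decrement

current=$(cat "$param")
while (( current > target )); do
    current=$(( current - step > target ? current - step : target ))
    echo "$current" > "$param"
    echo "zfs_arc_max lowered to $current; letting the miss rate settle"
    sleep 1800              # crude stand-in for watching arcstat miss%
done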

The lesson I keep re-learning

ZFS is happy to use all available memory. This is fine on a ZFS-dedicated storage box. It is less fine on a hypervisor where that memory has other customers. The fix is not “turn off the cache”; it is “tell ZFS your constraints and it will cooperate”.

I also set up a simple arc-pressure alert:

awk -F'[= ]' '{ print $1, $NF }' /proc/pressure/memory
# some 178742991
# full 45112887

Those totals are cumulative stall time in microseconds. If full grows by more than a few percent of a 10-minute window on this host, I look at ARC first.
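The real alert lives in my monitoring stack, but a standalone version of the same idea fits in cron; the 2% threshold, state file, and arc-pressure tag here are illustrative choices, not anything standard:

#!/bin/bash
# Run from cron every 10 minutes. Alert if cumulative "full" stall grew
# by more than 2% of the window (2% of 600 s = 12,000,000 us).
state=/var/tmp/psi-full.prev
now=$(awk -F'[= ]' '/^full/ { print $NF }' /proc/pressure/memory)
prev=$(cat "$state" 2>/dev/null || echo "$now")
echo "$now" > "$state"

if (( now - prev > 12000000 )); then
    logger -t arc-pressure "full memory stall $((now - prev))us in last window; check ARC"
fi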

Other knobs I did not touch

A lot of online advice says to set zfs_arc_meta_limit_percent or to play with ZFS prefetch. I did not. Defaults on recent OpenZFS are sane, and I do not have the benchmarks to claim I can do better. The only reason I touched the max was that I had an explicit wrong value sitting there from years ago.

Also, I tried ZFS compression and L2ARC a long time ago. Compression is free; leave it on. L2ARC on a homelab is almost never worth the flash wear.
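Checking what compression actually buys is one command; substitute your pool name for tank:

zfs get compression,compressratio tank
# compressratio only covers data written since compression was enabled,
# so it understates the gain on pools that turned it on late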

Reflection

Tuning ZFS on Proxmox is a specific skill. Most of the online guides assume storage appliances where ARC can eat the world. On a hypervisor you want ZFS to know its place. Three lines in modprobe.d fixed months of vague intermittent VM memory complaints for me.

Related: see my post on sizing a UPS for a quiet homelab rack for the other “I was overpaying for headroom I did not need” homelab lesson.