Tuning ZFS ARC on my Proxmox box
My main Proxmox node has 64 GB of RAM, a small ZFS mirror for boot and configs, and a larger ZFS pool of spinning rust for bulk storage. VMs live on local NVMe via LVM-thin. Straightforward enough. But I kept getting warnings from my monitoring that “VM memory ballooning is active” on hosts that should have had plenty of headroom.
The symptom
Free memory as seen from the host:
free -h
# total used free shared buff/cache available
# Mem: 62Gi 56Gi 1.1Gi 312Mi 5.2Gi 4.8Gi
# Swap: 8.0Gi 341Mi 7.7Gi
Looks cramped. But the used column in Linux free is a liar’s number when ZFS is in the picture, because ARC is counted as used, not as cache. The tool that tells the truth is arc_summary:
arc_summary | head -30
# ARC size (current): 98.1 % 39.7 GiB
# Target size (adaptive): 100.0 % 40.5 GiB
# Min size (hard limit): 6.2 % 2.5 GiB
# Max size (high water): 100.0 % 40.5 GiB
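The same number is also available raw in /proc/spl/kstat/zfs/arcstats if you want it in a script. A minimal sketch, piping in a sample arcstats line (the byte value is illustrative, chosen to match the ~39.7 GiB above) so it runs anywhere; on a live host, point awk at the file itself:

```shell
# arcstats 'size' lines look like: "size  4  42651385856"
# sample line piped in; on a ZFS host read /proc/spl/kstat/zfs/arcstats instead
echo "size 4 42651385856" | awk '{ printf "%.1f GiB\n", $3 / 2^30 }'
# -> 39.7 GiB
```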
40 GB of ARC. That is two thirds of the machine’s memory, on a host that runs 6 VMs. It was not actually starving anyone, but it was close enough that pressure stalls showed up under VM memory spikes, and ARC reclamation is not instant.
Why ZFS does this
ZFS ARC defaults to half of system memory minus 1 GB. On 64 GB that’s 31 GB max. So why was I at 40? Two reasons:
- I had set zfs_arc_max explicitly in /etc/modprobe.d/zfs.conf years ago, to 40 GB, because I wanted fast scrubs. Then I forgot about it.
- ZFS ARC does not shrink aggressively when other processes request memory. It shrinks, but on a slower timescale than a spiky VM start.
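Worth checking on your own box before tuning anything: compare what modprobe.d says against what the loaded module is actually using. A sketch with fallbacks so it does something sensible on hosts without ZFS:

```shell
# what does modprobe.d configure, and what is the module actually using?
grep -rs zfs_arc /etc/modprobe.d/ || echo "no zfs_arc settings in modprobe.d"
cat /sys/module/zfs/parameters/zfs_arc_max 2>/dev/null || echo "zfs module not loaded"
```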
The tuning
My goal was:
- Give the ARC enough room to be useful for hot datasets.
- Keep at least 20 GB guaranteed for VMs plus host overhead.
- Make ARC give back memory faster when pressure exists.
Three knobs in /etc/modprobe.d/zfs.conf:
options zfs zfs_arc_max=17179869184
options zfs zfs_arc_min=4294967296
options zfs zfs_arc_sys_free=4294967296
That’s 16 GB max, 4 GB min, and keep 4 GB of system free headroom. The zfs_arc_sys_free one is the important and less-known knob: it tells ZFS to start evicting ARC when free system memory drops below this threshold. Default is zero, which means ZFS only reacts to direct reclaim pressure.
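The modprobe options want bytes, and the big numbers are easy to fat-finger. Shell arithmetic gets you there without a calculator:

```shell
# GiB -> bytes for modprobe.d
echo $((16 * 1024 * 1024 * 1024))   # zfs_arc_max  -> 17179869184
echo $((4 * 1024 * 1024 * 1024))    # zfs_arc_min / zfs_arc_sys_free -> 4294967296
```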
I applied:
echo 17179869184 > /sys/module/zfs/parameters/zfs_arc_max
echo 4294967296 > /sys/module/zfs/parameters/zfs_arc_sys_free
The runtime files don’t persist across reboot but take effect immediately. The modprobe config makes it permanent. You want both.
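One caveat for Proxmox installs that boot from ZFS: the zfs module gets loaded from the initramfs, so the modprobe.d values only take effect at boot after regenerating it. A sketch for a Debian/Proxmox system (the second command only applies if proxmox-boot-tool manages your ESPs):

```shell
# modprobe.d is baked into the initramfs on root-on-ZFS installs;
# regenerate it so the new limits apply at the next boot
update-initramfs -u -k all
# if proxmox-boot-tool manages the boot partitions, sync them too
proxmox-boot-tool refresh
```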
Cold cache bite
Right after shrinking the ARC I saw arc_summary drop to 16 GB quickly, but I also saw a small performance dip. My Grafana iostat panels showed the backing disks working harder for an hour or two until the hot set re-warmed into the smaller ARC. That is expected. arcstat 1 was instructive during that window:
arcstat 1 60
# time read miss miss% dmis dm% pmis pm% mmis mm% arcsz c
# 13:44 4237 2451 58 2411 58 40 40 0 0 16G 16G
58% miss rate for a minute, then back to single digits once the hot data fit. If I had a production load I would have phased the resize in more gently by shrinking by 4 GB at a time, waiting for miss rate to settle.
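The miss% column is nothing mysterious: misses over total reads. Recomputing it from the sample line above, with integer rounding since shell arithmetic has no floats:

```shell
# miss% = misses / reads, using the numbers from the arcstat sample
reads=4237; misses=2451
echo $(( (100 * misses + reads / 2) / reads ))   # integer rounding -> 58
```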
The lesson I keep re-learning
ZFS is happy to use all available memory. This is fine on a ZFS-dedicated storage box. It is less fine on a hypervisor where that memory has other customers. The fix is not “turn off the cache”; it is “tell ZFS your constraints and it will cooperate”.
I also set up a simple arc-pressure alert:
awk '{ sub("total=", "", $5); print $1, $5 }' /proc/pressure/memory
# some 178742991
# full 45112887
If the full stall time grows by more than a few percent of wall-clock time over a 10-minute window on this host, I look at ARC first.
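The total counters in /proc/pressure/memory are cumulative stall time in microseconds, so the alert is really a rate: delta over window length. A sketch of that arithmetic with two hypothetical samples (t0 and t1 are made-up values, not from my host):

```shell
# two hypothetical 'full total' readings taken 600 s (10 min) apart
t0=45112887; t1=49312887
# percentage of wall-clock time fully stalled on memory over that window
awk -v t0="$t0" -v t1="$t1" 'BEGIN { printf "%.2f%%\n", 100 * (t1 - t0) / (600 * 1e6) }'
# -> 0.70%
```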
Other knobs I did not touch
A lot of online advice says to set zfs_arc_meta_limit_percent or play with ZFS prefetch. I did not. Defaults on recent OpenZFS are sane, and I do not have the benchmarks to claim I can do better. The only reason I touched the max was because I had an explicit wrong value from years ago.
Also, I tried ZFS compression and L2ARC a long time ago. Compression is free; leave it on. L2ARC on a homelab is almost never worth the flash wear.
Reflection
Tuning ZFS on Proxmox is a specific skill. Most of the online guides assume storage appliances where ARC can eat the world. On a hypervisor you want ZFS to know its place. Three lines in modprobe.d fixed months of vague intermittent VM memory complaints for me.
Related: see my post on sizing a UPS for a quiet homelab rack for the other “I was overpaying for headroom I did not need” homelab lesson.