I had two NUCs and wanted HA-ish VM management, so I built a two-node Proxmox cluster. If you have looked at the Proxmox documentation, you know the answer is “don’t”. I did anyway. Here is what happens when you do, and why I eventually caved and added a third node.

The quorum problem

Proxmox clustering is built on corosync, which needs a majority of votes to make decisions. With two nodes, a majority is two. So if either node goes down, the survivor cannot act: no VM starts, no migrations, no config changes through the UI. The node that is up is alive but hobbled.

You can fake it with pvecm expected 1:

pvecm expected 1
# Set expected votes to 1

This tells the cluster that a single vote is enough. Great, except it is a manual action. If one node hangs or reboots at 2 AM, nothing runs this for you; you have to SSH in and type the command yourself, which is the opposite of what you wanted from a cluster.

The qdevice hack

The Proxmox docs mention a quorum device (qdevice) as a workaround for small clusters. You add a third vote from a non-Proxmox machine, usually a Raspberry Pi. The Pi runs corosync-qnetd; the Proxmox nodes run corosync-qdevice.
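
The setup itself is short. A sketch of what it looks like, assuming a Debian-based Pi at a hypothetical 192.168.1.50 with root SSH access from the nodes:

# On the Pi
apt install corosync-qnetd

# On every Proxmox node
apt install corosync-qdevice

# On one Proxmox node: register the qdevice with the cluster
pvecm qdevice setup 192.168.1.50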

I set this up. It worked… sort of. The quorum was stable. But I learned some things:

  • The qdevice holds no cluster state. It just supplies a tie-breaking vote. If the Pi dies, you are back to two-node quorum problems, which is arguably worse than if you had planned for two nodes from the start.
  • The qdevice has to run a specific corosync version that matches the cluster. When I upgraded Proxmox to a new major version, I had to update the Pi too.
  • Firewall rules between the Pi and the cluster have to be exactly right, or the qdevice gets marked “not heard from” intermittently.
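
At least the firewall surface is small: qnetd listens on a single TCP port, 5403 by default. A minimal rule on the Pi, assuming the nodes live on a hypothetical 192.168.1.0/24:

# On the Pi: allow the Proxmox nodes to reach corosync-qnetd
iptables -A INPUT -p tcp --dport 5403 -s 192.168.1.0/24 -j ACCEPT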

I ran this setup for about nine months. It worked 95% of the time.

The fun failure modes

Three specific incidents:

Incident 1: split brain-ish

One node developed an intermittent network glitch. The qdevice and the healthy node agreed on reality, but the glitching node kept leaving and rejoining the cluster every few minutes. Each rejoin triggered a configuration sync, and at one point the rejoining node tried to take over a VM that was already running on the healthy node. The VM briefly ran twice. Storage corruption followed, and the restore from backup took 40 minutes.

Lesson: corosync’s multicast is finicky; unicast is more reliable at this scale. I switched to unicast transport, but the damage was done.
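
For what it’s worth, on corosync 2.x the switch is one line in the totem section of /etc/pve/corosync.conf; corosync 3.x with knet, which current Proxmox ships, is unicast out of the box:

# /etc/pve/corosync.conf (excerpt); bump config_version when editing
totem {
  transport: udpu
  # other totem settings unchanged
}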

Incident 2: the Pi SD card

The Pi running the qdevice booted from an SD card, which died after six months of heavy journald writes. The cluster lost the qdevice’s vote, which put me right back in two-node territory: the next time a node went down, the survivor was one vote out of an expected three and lost quorum. I could not start a VM until I set expected=1 manually.

Lesson: give the qdevice reliable storage, not an SD card. I moved to a USB SSD.
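
If an SSD is not an option, at least stop journald from grinding the card down. A sketch of the relevant knobs in /etc/systemd/journald.conf on the Pi:

# /etc/systemd/journald.conf: keep logs in RAM and cap their size
[Journal]
Storage=volatile
RuntimeMaxUse=32M
# Apply with: systemctl restart systemd-journald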

Incident 3: version skew after upgrade

I upgraded to Proxmox 8.x. The qdevice on the Pi was still on the older corosync version. The cluster complained about the version mismatch but kept running. Then at some point it stopped accepting the qdevice’s votes entirely. It took me two hours to realize the upgrade was the cause.

Lesson: treat the qdevice host as a first-class member of your Proxmox config management. If it is not managed the same way as your nodes, it will drift.
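
A cheap drift check I should have been running all along, assuming Debian packaging on both ends:

# On each Proxmox node: corosync and qdevice versions
corosync -v
dpkg -s corosync-qdevice | grep Version

# On the Pi: qnetd version, and whether the cluster is actually connected
dpkg -s corosync-qnetd | grep Version
corosync-qnetd-tool -l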

Adding a third node

Eventually I bought a third NUC. The cluster now has three real nodes. The qdevice is gone. Quorum is 2 out of 3 votes, which means any single failure is a non-event. HA works as designed. Live migration works. pvecm status is boring, which is what you want.

pvecm status
# Cluster information
# -------------------
# Name:             home
# Config Version:   12
# Transport:        knet
# Secure auth:      on
# Quorum information
# ------------------
# Date:             Sun Feb  9 16:12:34 2025
# Quorum provider:  corosync_votequorum
# Nodes:            3
# Node ID:          0x00000001
# Ring ID:          1.a6
# Quorate:          Yes
# ...

Three nodes with local storage running Proxmox is the minimum for “a real homelab cluster”. Anything less is a science project.

If you can only afford two nodes

If three nodes is not in the budget, I would run them as two standalone hosts and not cluster at all. You can still do cold migrations manually by shutting down a VM, scp’ing the disk image, and registering it on the other host. It is slower but avoids the class of quorum bugs entirely.
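
A sketch of that manual move, assuming a hypothetical VMID 100 on local qcow2 storage at the default paths:

# On the source host: stop the VM, copy disk and config to the peer
qm shutdown 100
ssh other-host "mkdir -p /var/lib/vz/images/100"
scp /var/lib/vz/images/100/vm-100-disk-0.qcow2 other-host:/var/lib/vz/images/100/
scp /etc/pve/qemu-server/100.conf other-host:/etc/pve/qemu-server/
# On the target host: the copied config registers the VM, so just start it
qm start 100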

Or keep each host independent and lean on backup-and-restore: a replication script sends daily snapshots to the other host, and if one host dies you spin up the replicas on the other. Recovery time is minutes instead of seconds, but nothing can silently go wrong in the middle of the night.
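
A minimal version of that replication, again with a hypothetical VMID 100 and a /var/lib/vz/replica directory that already exists on the peer:

# /etc/cron.d/vm-replica: nightly vzdump snapshot, shipped to the other host
30 2 * * * root vzdump 100 --mode snapshot --compress zstd --dumpdir /var/lib/vz/dump && rsync -a /var/lib/vz/dump/ other-host:/var/lib/vz/replica/
# Recovery on the peer: qmrestore the newest archive as VMID 100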

Why I clustered in the first place

Mostly for live migration. I wanted to apply host updates without taking VMs down. This works well at three nodes. At two nodes with the qdevice it also worked most of the time, but I spent enough weekend debugging time that I regret it.
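
The command itself is the easy part. With hypothetical names, VMID 100 and target node pve2:

# Drain a VM off the host you are about to update
qm migrate 100 pve2 --online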

Reflection

Proxmox as a standalone hypervisor is excellent. Proxmox as a two-node cluster is a trap that the documentation warns you about and that I blundered into anyway. Go three nodes if you want cluster features. Go one node if you do not. Do not go two.

Related: see my post on tuning ZFS ARC on my Proxmox box for the other Proxmox-at-home lesson I wish I had known earlier.