PVE Cluster Quorum Recovery: A Field Manual

When a Proxmox VE cluster loses quorum, /etc/pve flips to read-only. You can't start, stop, or migrate VMs. You can't edit configs. The cluster filesystem is a quorum gate, and without majority you don't get to write to it.

Most quorum-loss incidents look the same: one or more nodes are unreachable, corosync is unhappy, and the survivor sits there with pvecm status reporting Quorate: No. The recovery path depends on whether the missing nodes are coming back, whether you can still form a majority, and whether the underlying problem is the network or the nodes themselves.

This is the workflow I run when it happens.

Confirm quorum is actually lost

pvecm status

Key fields:

Quorate: Yes/No
Total votes vs Expected votes
Highest expected (the cluster-configured baseline)

If Total votes is below floor(Expected votes / 2) + 1, you're not quorate. With 3 expected votes you need 2; with 4 you need 3.
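The threshold arithmetic is easy to get wrong by one under pressure, so here is a minimal sketch of it as shell functions. The names quorum_threshold and is_quorate are mine, not part of any PVE tooling.

```shell
# Votes needed for quorum: floor(expected / 2) + 1.
# Shell integer division already rounds down.
quorum_threshold() {
    expected=$1
    echo $(( expected / 2 + 1 ))
}

# Succeeds (exit 0) when the total votes reach the threshold.
is_quorate() {
    total=$1
    expected=$2
    [ "$total" -ge "$(quorum_threshold "$expected")" ]
}

# Usage:
#   quorum_threshold 3      # votes needed in a 3-vote cluster
#   is_quorate 1 3 || echo "not quorate"
```

Plug in the Total votes and Expected votes values from pvecm status to sanity-check what the cluster is telling you.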

Cross-check membership:

corosync-quorumtool -s
corosync-cfgtool -s

Each ring should show Active. If it shows FAULTY, that's a network problem, not a node problem — fix the link before anything else.

Logs:

journalctl -u corosync -u pve-cluster -n 200 --no-pager

Look for link: 0 is down, Sync members, or New configuration with N nodes.
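To cut the noise, a small filter along these lines can help. It is only a sketch: the patterns are the three strings listed above, and the function name corosync_events is my own. Exact log wording can vary between corosync versions, so adjust the patterns to what your journal actually prints.

```shell
# Keep only the membership and link events from journal output on stdin.
# Patterns are taken from the strings named in this section.
corosync_events() {
    grep -E 'link: [0-9]+ is down|Sync members|New configuration with [0-9]+ nodes'
}

# Usage:
#   journalctl -u corosync -u pve-cluster -n 200 --no-pager | corosync_events
```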

Scenario 1: One node down in a healthy 3+ node cluster

The cluster is still quorate. No recovery action needed on the survivors — they keep running. When the dead node comes back, corosync re-syncs automatically.

If pmxcfs is out of sync after the node reboots:

systemctl restart corosync pve-cluster

If that doesn't catch up, force a clean restart of just pmxcfs:

systemctl stop pve-cluster
systemctl start pve-cluster

⚠️ Never run pmxcfs -l (local mode) on more than one node simultaneously while reconnecting. They'll diverge and you'll spend longer reconciling than you saved. Local mode is for inspection only, never for sustained operation.

Scenario 2: Lost majority, single survivor

Two of three nodes are gone, you're on the last one, and you need critical VMs running now. Force the survivor to consider itself quorate:

pvecm expected 1

/etc/pve becomes writable, VMs can start, you can edit configs.

⚠️ This is a temporary survival measure, not a fix. The moment the dead nodes come back online with the original expected_votes, you have a split-brain risk. As soon as the cluster is healthy again:

pvecm expected 3   # or whatever your real cluster size is

⚠️ Never run pvecm expected 1 simultaneously on two separated nodes during a network partition. Both will consider themselves authoritative, both will accept writes, and you'll have two divergent versions of /etc/pve to merge by hand. Pick one side to be authoritative and shut corosync down on the other until the network is fixed.
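If you want a guard against forcing quorum on a node that doesn't need it, something like the following could work. It is a sketch under assumptions: the function names and the PVECM override variable are hypothetical conveniences of mine (the override exists so the logic can be dry-run), and it only checks the Quorate field. It cannot tell you whether another partition is doing the same thing.

```shell
# Read `pvecm status` text on stdin and print the Quorate value (Yes/No).
quorate_field() {
    awk -F': *' '/^Quorate:/ { gsub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Run `pvecm expected 1` only when this node reports Quorate: No.
# PVECM is a hypothetical override for testing; it defaults to pvecm.
force_single_node_quorum() {
    pvecm_cmd=${PVECM:-pvecm}
    state=$("$pvecm_cmd" status | quorate_field)
    if [ "$state" = "No" ]; then
        "$pvecm_cmd" expected 1
    else
        echo "refusing: Quorate is '${state}', not 'No'" >&2
        return 1
    fi
}

# Usage:
#   force_single_node_quorum
```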

Scenario 3: Network partition (split brain)

Two halves of the cluster can each see themselves but not each other. Whichever side has majority retains quorum. The minority side goes read-only.

Recovery:

Identify the partition by running corosync-cfgtool -s on each side. Compare the ring addresses each side is bound to.
Fix the underlying network issue — usually a switch, a firewall, or an MTU mismatch on the corosync ring.
After re-merge, run pvecm status on every node and confirm they all agree on Quorate: Yes and the same node list.
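The last step can be scripted roughly like this. Treat it as a sketch: node_view and all_same are names I made up, the hostnames in the usage line are placeholders, and it assumes pvecm status prints Nodes: and Quorate: lines. It compares the node count and quorate flag only, so still eyeball the membership list itself.

```shell
# Collapse the Nodes and Quorate lines of `pvecm status` (stdin) into one line.
node_view() {
    grep -E '^(Quorate|Nodes):' | tr -s ' \n' ' '
}

# Succeeds when every line on stdin is identical, i.e. all nodes agree.
all_same() {
    awk 'BEGIN { bad = 0 }
         NR == 1 { first = $0 }
         $0 != first { bad = 1 }
         END { exit bad }'
}

# Usage (pve1..pve3 are placeholder hostnames):
#   for n in pve1 pve2 pve3; do
#       ssh "root@$n" pvecm status | node_view; echo
#   done | all_same && echo "all nodes agree"
```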

Watch corosync re-form the membership:

journalctl -u corosync -f

⚠️ Use a dedicated network for corosync if you can. A separate VLAN over a 1Gbit cross-connect is plenty: corosync is latency-sensitive, not bandwidth-hungry, and a noisy backup job on the management network is enough to cause spurious partitions. A redundant second link, which corosync 3 with knet (PVE 6 and later) supports, is even better.

Scenario 4: Permanently dead node

If a node is gone for good (hardware loss, decommissioning), remove it from the cluster cleanly. From a surviving, quorate node:

pvecm delnode <nodename>

Clean up its leftover state. /etc/pve is shared across the cluster by pmxcfs, so run this once, on any quorate node:

rm -rf /etc/pve/nodes/<nodename>

Update expected_votes in /etc/pve/corosync.conf if needed:

quorum {
  provider: corosync_votequorum
  expected_votes: 2    # was 3
}

⚠️ Bump the config_version field at the top of corosync.conf whenever you edit it, or corosync won't pick up the change:

totem {
  ...
  config_version: 7    # was 6
}

Then reload:

systemctl reload corosync
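Forgetting the version bump is the usual mistake here, so a helper like this can take it off your plate. It is a sketch, and bump_config_version is a hypothetical name. It prints the edited file to stdout rather than editing in place, so you can review the result before it goes live.

```shell
# Print the file given as $1 with its config_version incremented by one.
# Everything else, including indentation and trailing comments, is left alone.
bump_config_version() {
    awk '
        /^[ \t]*config_version:[ \t]*[0-9]+/ {
            match($0, /[0-9]+/)
            n = substr($0, RSTART, RLENGTH) + 1
            $0 = substr($0, 1, RSTART - 1) n substr($0, RSTART + RLENGTH)
        }
        { print }
    ' "$1"
}

# Usage: write to a scratch copy, review it, then move it into place.
#   bump_config_version /etc/pve/corosync.conf > /etc/pve/corosync.conf.new
#   diff /etc/pve/corosync.conf /etc/pve/corosync.conf.new
#   mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
```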

⚠️ If the dead node ever comes back online with its old config, it will think it's still part of the cluster and corosync will reject it noisily. Reinstall it from scratch before reusing the hardware.

Scenario 5: Rebuild from corrupted corosync state

When /etc/corosync/corosync.conf is out of sync between nodes, or authkey differences cause Authentication failed errors:

On the authoritative node (latest config, most recent state), confirm the source of truth:

cat /etc/pve/corosync.conf

On a misbehaving node, stop services:

systemctl stop pve-cluster corosync

Copy the correct config and authkey from the authoritative node:

scp authoritative:/etc/corosync/corosync.conf /etc/corosync/
scp authoritative:/etc/corosync/authkey       /etc/corosync/

Restart:

systemctl start corosync pve-cluster

/etc/pve/corosync.conf is the cluster-wide source of truth; /etc/corosync/corosync.conf is the local cache.
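A quick way to check whether a node's local cache has drifted from the cluster copy is a byte-for-byte compare. This is a minimal sketch; the function name conf_in_sync is mine, and the default paths are the two files named above.

```shell
# Succeeds when the cluster-wide config and the local cache are identical.
# Both paths can be overridden, which is handy for comparing fetched copies.
conf_in_sync() {
    cmp -s "${1:-/etc/pve/corosync.conf}" "${2:-/etc/corosync/corosync.conf}"
}

# Usage:
#   conf_in_sync || echo "local corosync.conf differs from /etc/pve copy"
```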

⚠️ /etc/pve/priv/authkey.key and /etc/corosync/authkey are different files with different purposes. Wiping /etc/pve/priv/authkey.key breaks the web UI and node-to-node API operations. Wiping /etc/corosync/authkey breaks corosync membership. Don't touch either casually — and never assume they're interchangeable.

The dist-upgrade trap

Upgrading PVE major versions (7 → 8, 8 → 9) triggers conffile prompts:

Configuration file '/etc/network/interfaces'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it?

⚠️ Always answer "keep the local version" for /etc/network/interfaces, /etc/corosync/corosync.conf, and /etc/hosts. Replacing any of these with the package default on a clustered node mid-upgrade is the fastest way to lose quorum and lock yourself out of a remote host simultaneously.

To avoid the question entirely:

DEBIAN_FRONTEND=noninteractive \
  apt-get -o Dpkg::Options::="--force-confold" \
  -o Dpkg::Options::="--force-confdef" \
  dist-upgrade -y
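Even with --force-confold, I'd want copies of those three files somewhere safe before starting. A rough sketch of that follows; the function name and the /root/pre-upgrade destination are placeholders of mine, not anything PVE provides.

```shell
# Copy each listed file into the directory $1, preserving mode and timestamps.
# Missing files are reported on stderr and skipped.
backup_conffiles() {
    dest=$1
    shift
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            cp -p "$f" "$dest/" || return 1
        else
            echo "skipping missing file: $f" >&2
        fi
    done
}

# Usage (destination directory is a placeholder):
#   backup_conffiles /root/pre-upgrade \
#       /etc/network/interfaces /etc/corosync/corosync.conf /etc/hosts
```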

The `pvecm add` trap

Re-adding a previously-removed node to a cluster overwrites several local files on the joining node, including /etc/pve/storage.cfg. If the joining node had local-only storage definitions (a directory storage that doesn't exist on other nodes, an NFS mount unique to that host), they vanish.

Before re-adding:

cp /etc/pve/storage.cfg /root/storage.cfg.bak

After pvecm add:

diff /etc/pve/storage.cfg /root/storage.cfg.bak

Manually merge missing entries via the web UI — storage edits propagate to all nodes via pmxcfs.
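When the diff is long, it can be easier to list just the storage definitions that disappeared. The sketch below assumes the usual storage.cfg layout, where each definition starts with an unindented "type: id" line; missing_storages is a hypothetical name of mine.

```shell
# Print the "type: id" header of every storage defined in the backup ($1)
# but absent from the current file ($2).
missing_storages() {
    backup=$1
    current=$2
    awk '
        FILENAME == ARGV[1] { if ($0 ~ /^[a-z0-9]+: /) seen[$0] = 1; next }
        /^[a-z0-9]+: / && !($0 in seen) { print }
    ' "$current" "$backup"
}

# Usage:
#   missing_storages /root/storage.cfg.bak /etc/pve/storage.cfg
```

It only compares section headers, so a storage that kept its name but changed its options still needs the full diff.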

Quick reference

Symptom	Action
`Quorate: No`, majority gone	`pvecm expected N` where N = surviving votes
Single node, need VMs running NOW	`pvecm expected 1` (temporary, then fix)
Node permanently lost	`pvecm delnode <name>` from a quorate node
Network partition	Fix the link, then watch `journalctl -u corosync`
`Authentication failed` in corosync	Sync `/etc/corosync/authkey` from authoritative
`/etc/pve` is read-only	You're not quorate. `pvecm status` to diagnose
Web UI broken after authkey work	You wiped `/etc/pve/priv/authkey.key`. Restore it.

Quorum loss is rarely the actual problem — it's a symptom of a network failure, a node failure, or a botched upgrade. Fix the underlying cause first, then restore quorum. Forcing pvecm expected 1 to silence the alarm without understanding why it fired is how you turn a recoverable incident into a split-brain mess that takes a day to untangle.
