Proxmox VE 7.4 → 9.x Cluster Node Upgrade
Table of Contents
- Pre-flight Checks
- Phase 1 - Backup the VM to /backup
- Phase 2 - Session Safety
- Phase 3 - Upgrade 7.4 to 8.x
- Phase 4 - Reboot and Verify on 8.x
- Phase 5 - Upgrade 8.x to 9.x
- Phase 6 - Reboot and Verify on 9.x
- Phase 7 - Post-Upgrade Cleanup
- Rollback Plan
- Gotchas
Pre-flight Checks
# Confirm version baseline
pveversion
# Expected: pve-manager/7.4-x
# Confirm cluster membership and quorum
pvecm status
# Confirm node is healthy
systemctl --failed
journalctl -p err -b --no-pager | tail -50
# Confirm /backup is writable and has space
df -h /backup
mount | grep /backup
touch /backup/.write-test && rm /backup/.write-test
# List VMs on this node
qm list
⚠️ Do not proceed if `pvecm status` shows the cluster as not quorate, or if `systemctl --failed` shows critical services down.
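The quorum rule above can be turned into a hard gate in a wrapper script. The sketch below is one possible way to do that, assuming `pvecm status` prints a `Quorate: Yes` line when the cluster has quorum. The function name `is_quorate` is made up for this example.

```shell
# is_quorate: hypothetical helper. Reads `pvecm status` output on stdin and
# succeeds only if it contains a "Quorate: Yes" line.
is_quorate() {
    grep -Eq '^Quorate:[[:space:]]+Yes[[:space:]]*$'
}

# Usage on a real node:
#   pvecm status | is_quorate || { echo "Cluster not quorate, stop here" >&2; exit 1; }
```

Reading from stdin keeps the check easy to try against saved output before trusting it on a live node.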
Phase 1 - Backup the VM to /backup
1.0 Run this from any other cluster node (NODE is the name of the node being upgraded)
pvecm delnode NODE
1.1 Get the node out of the cluster
systemctl stop pve-cluster && systemctl stop corosync
pmxcfs -l
rm -rf /etc/corosync/authkey /etc/corosync/corosync.conf /etc/pve/corosync.conf /etc/corosync/uidgid.d/
killall pmxcfs
sleep 2
systemctl start pve-cluster
Mask corosync
systemctl stop corosync
systemctl disable corosync
systemctl mask corosync
Confirm VMs still running
qm list
pvecm status # should error with "no cluster info"
Clean leftover state in /etc/pve
for n in $(ls /etc/pve/nodes/ | grep -v "^$(hostname)$"); do
    rm -rf "/etc/pve/nodes/$n"
done
Wipe stale auth state
> /etc/pve/priv/known_hosts
> /etc/pve/priv/authorized_keys
rm -f /etc/pve/priv/authorized_keys.tmp.*
rm -f /etc/pve/priv/known_hosts.[a-zA-Z]*
Trim storage.cfg: keep only this node's local storages
nano /etc/pve/storage.cfg
# Keep: local, <node>, any local-backup
# Drop: every entry pinned to a different node
# Drop the `nodes <name>` constraint from kept entries
1.3 Mount /backup or add the folder
mkdir -p /backup/$(hostname)-snapshots
1.4 Add /backup as a Proxmox storage target
# Add /backup as a directory-type storage, restricted to this node, accepts vzdump backups
pvesm add dir local-backup \
--path /backup \
--content backup,iso,snippets \
--nodes $(hostname) \
--shared 0
# Verify
pvesm status | grep local-backup
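Before running the full backup it may be worth checking free space in a scriptable way rather than reading `df -h` by eye. This is a minimal sketch. The helper name `has_free_kb` and the 200 GiB figure in the usage comment are assumptions, not values from this runbook.

```shell
# has_free_kb DIR MIN_KB: hypothetical helper. Succeeds if the filesystem
# holding DIR has at least MIN_KB kilobytes available.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail" -ge "$2" ]
}

# Usage (the threshold is only an example; size it to your VM disks):
#   has_free_kb /backup $((200 * 1024 * 1024)) || echo "Not enough space on /backup" >&2
```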
1.5 Full vzdump backup to /backup
# Snapshot mode = no downtime, zstd = fast compression
for vmid in $(qm list | awk 'NR>1 {print $1}'); do
    echo "=== Backing up VMID $vmid ==="
    vzdump "$vmid" \
        --mode snapshot \
        --compress zstd \
        --storage local-backup \
        --notes-template "Pre PVE 7.4→9.x upgrade {{node}} {{vmid}}"
done
# Verify a backup file exists for every VM (the two counts below should match)
ls -lh /backup/dump/vzdump-qemu-*.vma.zst
qm list | awk 'NR>1 {print $1}' | wc -l
ls /backup/dump/vzdump-qemu-*.vma.zst | wc -l
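Matching counts do not prove that every VMID has its own archive (two archives of one VM would hide a missing one). A stricter per-VM check could look like the sketch below. The helper name `missing_backups` is hypothetical, and it assumes the `vzdump-qemu-<vmid>-*.vma.zst` naming used in the commands above.

```shell
# missing_backups DUMP_DIR: hypothetical helper. Reads VMIDs on stdin (one per
# line) and prints every VMID with no vzdump-qemu-<vmid>-*.vma.zst in DUMP_DIR.
missing_backups() {
    while read -r vmid; do
        [ -n "$vmid" ] || continue
        found=0
        for f in "$1"/vzdump-qemu-"$vmid"-*.vma.zst; do
            [ -e "$f" ] && found=1
        done
        [ "$found" -eq 1 ] || echo "$vmid"
    done
}

# Usage: any output means a VM has no archive yet
#   qm list | awk 'NR>1 {print $1}' | missing_backups /backup/dump
```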
1.6 Backup cluster and node configuration
mkdir -p /backup/host-config/$(hostname)-$(date +%F)
BACKUP_DIR=/backup/host-config/$(hostname)-$(date +%F)
# Cluster-aware config (in /etc/pve, FUSE-mounted from cluster DB)
tar czf $BACKUP_DIR/etc-pve.tgz /etc/pve 2>/dev/null
# Node-local critical configs
tar czf $BACKUP_DIR/etc-node.tgz \
/etc/network/interfaces \
/etc/hosts \
/etc/hostname \
/etc/resolv.conf \
/etc/corosync \
/etc/ssh \
/etc/apt \
/etc/fstab \
/etc/default/grub \
/etc/lvm 2>/dev/null
# Package list (for diffing later if something breaks)
dpkg -l > $BACKUP_DIR/dpkg-list-pre74.txt
apt-mark showhold > $BACKUP_DIR/apt-holds.txt
ls -lh $BACKUP_DIR
1.7 Sanity-check the backup is restorable
# Test the compressed archives for integrity without extracting them
ls -lh /backup/dump/
zstd -t /backup/dump/*.zst && echo "Archive integrity OK"
# Confirm Proxmox can see the backup in its UI listing
pvesm list local-backup
Phase 2 - Session Safety
2.1 Open screen
screen
⚠️ Mandatory. A dropped SSH session during dist-upgrade leaves dpkg in a broken state.
2.2 Quiesce scheduled jobs
# Stop scheduled backups, replication, etc. for the duration
systemctl stop pvescheduler 2>/dev/null || systemctl stop pve-daily-update.timer
systemctl stop cron
Restart these after Phase 6 verification passes.
Phase 3 - Upgrade 7.4 to 8.x
3.0 Add Proxmox repos if not there
grep -rq "pve-no-subscription" /etc/apt/sources.list.d/ /etc/apt/sources.list 2>/dev/null || {
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list
cat > /etc/apt/sources.list <<'EOF'
deb http://deb.debian.org/debian bullseye main contrib
deb http://deb.debian.org/debian bullseye-updates main contrib
deb http://security.debian.org/debian-security bullseye-security main contrib
EOF
echo "deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription" \
> /etc/apt/sources.list.d/pve-no-subscription.list
}
3.1 Patch 7.4 to absolute latest
apt update
apt dist-upgrade
# Review prompts, do NOT use -y
3.2 Run the official 7-to-8 checker
pve7to8 --full
⚠️ Do not proceed if any check returns FAIL. Common fixes:
- Stale corosync config: pull from another working node
- Old GPG keys: re-add Proxmox repo key with `signed-by`
- Deprecated storage configs: edit `/etc/pve/storage.cfg`
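The "no FAIL" rule can be checked mechanically. The sketch below assumes the checker marks each failing check with a line starting with `FAIL:`. That prefix and the helper name `count_fail_lines` are assumptions, and the pattern would need adjusting if the output carries color codes when piped.

```shell
# count_fail_lines: hypothetical helper. Reads checker output on stdin and
# prints how many lines start with "FAIL:" (assumed output format).
count_fail_lines() {
    # grep -c exits non-zero when the count is 0, so mask that exit status
    grep -c '^FAIL:' || true
}

# Usage:
#   fails=$(pve7to8 --full 2>&1 | count_fail_lines)
#   [ "$fails" -eq 0 ] || echo "$fails failing checks, do not continue" >&2
```

This only complements reading the report. WARN items still deserve a manual look.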
3.3 Swap repositories: Bullseye → Bookworm
# Debian base repos (this pass also turns bullseye-security into bookworm-security)
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list
# Legacy "<suite>/updates" security naming: after the pass above it reads bookworm/updates
sed -i 's|bookworm/updates|bookworm-security|g' /etc/apt/sources.list
# Proxmox no-subscription repo (use enterprise line if you have a sub)
cat > /etc/apt/sources.list.d/pve-install-repo.list <<'EOF'
deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription
EOF
# Remove enterprise stub if no subscription
[ ! -f /etc/apt/sources.list.d/pve-enterprise.list ] || \
rm /etc/apt/sources.list.d/pve-enterprise.list
# Refresh Proxmox release GPG key under bookworm scheme
wget -qO /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg \
https://enterprise.proxmox.com/debian/proxmox-release-bookworm.gpg
# Verify the key file's SHA-512 checksum
sha512sum /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
# Expected: 7da6fe34168adc6e479327ba517796d4702fa2f8b4f0a9833f5ea6e6b48f6507a6da403a274fe201595edc86a84463d50383d07f64bdde2e3658108db7d6dc87
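Because the suite rename is a blind text substitution, it can help to rehearse it on a scratch copy and keep a backup of the original. A minimal sketch follows. The helper name `swap_suite` and the `/tmp` path are made up for illustration, and `sed -i.bak` assumes GNU sed.

```shell
# swap_suite OLD NEW FILE: hypothetical helper. Rewrites every occurrence of
# the OLD suite name to NEW in FILE, keeping the original as FILE.bak.
swap_suite() {
    sed -i.bak "s/$1/$2/g" "$3"
}

# Dry run on a scratch copy instead of the live file:
#   cp /etc/apt/sources.list /tmp/sources.list.test
#   swap_suite bullseye bookworm /tmp/sources.list.test
#   diff /etc/apt/sources.list /tmp/sources.list.test
```

The diff shows that one pass already covers the `-updates` and `-security` suites, since both contain the old suite name.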
3.4 Update package index
apt update
⚠️ Fix any "NO_PUBKEY" or signature errors here, before dist-upgrade.
3.5 Run the 7→8 dist-upgrade
# apt will ask whether it may restart services during the upgrade; either answer is fine
apt dist-upgrade
Conf file prompts to expect:
| File | Recommended action |
|---|---|
| `/etc/ssh/sshd_config` | Keep local (N) to preserve your hardening |
| `/etc/lvm/lvm.conf` | Keep local (N) |
| `/etc/default/grub` | Keep local (N) |
| `/etc/issue`, `/etc/issue.net` | Take maintainer's |
| `/etc/corosync/*` | Keep local (N); never overwrite cluster config |
| Anything in `/etc/pve/*` | Won't prompt, cluster-managed |
| `/etc/apt/sources.list.d/pve-enterprise.list` | Take maintainer's |
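Whichever answer is given at a prompt, dpkg keeps the other version next to the conffile, so the choices can be reviewed after the upgrade. The sketch below lists those leftovers. The helper name `list_conffile_leftovers` is hypothetical.

```shell
# list_conffile_leftovers DIR: hypothetical helper. Prints files dpkg left
# beside conffiles: *.dpkg-dist (the maintainer's version, saved when you kept
# yours), *.dpkg-old (your version, saved when you took the maintainer's) and
# *.dpkg-new (a new version that was not put in place).
list_conffile_leftovers() {
    find "$1" \( -name '*.dpkg-dist' -o -name '*.dpkg-old' -o -name '*.dpkg-new' \) -type f | sort
}

# Usage, then diff each hit against the live file:
#   list_conffile_leftovers /etc
#   diff /etc/ssh/sshd_config /etc/ssh/sshd_config.dpkg-dist
```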
3.6 Verify userspace upgrade succeeded
pveversion
# Expected: pve-manager/8.x
apt list --upgradable
# Should be empty or near-empty
systemctl --failed
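The version check can also be scripted so that a later phase refuses to start on the wrong release. This is a sketch under the assumption that `pveversion` prints a line of the form `pve-manager/<version>/...`, as in the expected output above. The helper name `pve_major` is made up.

```shell
# pve_major: hypothetical helper. Reads `pveversion` output on stdin and prints
# the major version number, e.g. "8" for a line starting "pve-manager/8.".
pve_major() {
    sed -n 's|^pve-manager/\([0-9][0-9]*\)[.-].*|\1|p'
}

# Usage:
#   [ "$(pveversion | pve_major)" = "8" ] || echo "Node is not on 8.x yet" >&2
```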
Phase 4 - Reboot and Verify on 8.x
4.1 Pre-reboot snapshot of running services
qm list > /backup/host-config/vms-pre-reboot-8x.txt
pvecm status > /backup/host-config/cluster-pre-reboot-8x.txt
4.2 Shut down the VM cleanly
for vmid in $(qm list | awk '$3=="running" {print $1}'); do
    echo "Shutting down $vmid..."
    qm shutdown "$vmid" --timeout 180 &
done
# Wait for the background shutdowns to finish, then verify
wait
qm list
# Force-stop any stragglers (rare, but covers stuck guests)
for vmid in $(qm list | awk '$3=="running" {print $1}'); do
echo "Force-stopping stuck VM $vmid..."
qm stop $vmid
done
⚠️ Don't `stop`, use `shutdown`: it sends an ACPI signal and lets the guest flush filesystems.
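The "wait and verify" step in 4.2 can be made explicit with a polling loop that runs before any force-stop. The sketch below is one way to do it. The helper names `running_vmids` and `wait_for_shutdown` are hypothetical, and it assumes the STATUS value is the third column of `qm list`, as the loops above already do.

```shell
# running_vmids: reads `qm list` output on stdin and prints the VMIDs whose
# STATUS column (third field) is "running".
running_vmids() {
    awk 'NR>1 && $3=="running" {print $1}'
}

# wait_for_shutdown TIMEOUT [POLL]: hypothetical helper. Polls `qm list` every
# POLL seconds (default 5) until no guest is running; fails after TIMEOUT seconds.
wait_for_shutdown() {
    deadline=$(( $(date +%s) + $1 ))
    poll=${2:-5}
    while [ -n "$(qm list | running_vmids)" ]; do
        [ "$(date +%s)" -lt "$deadline" ] || return 1
        sleep "$poll"
    done
    return 0
}

# Usage: only fall through to `qm stop` if guests are still up after 3 minutes
#   wait_for_shutdown 180 || echo "Some guests did not shut down in time" >&2
```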
4.3 Reboot
systemctl reboot
4.4 Post-reboot verification on 8.x
# === Pre-flight: check for missing ISO references ===
# PVE 8+ refuses to start VMs with missing media, so fix BEFORE bulk start
echo "=== Checking for VMs with missing ISO references ==="
for conf in /etc/pve/qemu-server/*.conf; do
    vmid=$(basename "$conf" .conf)
    # Check each ISO reference in this config
    grep -E "iso/" "$conf" | while read -r line; do
        ref=$(echo "$line" | grep -oE "[A-Za-z0-9_-]+:iso/[^,]+")
        [ -z "$ref" ] && continue
        path=$(pvesm path "$ref" 2>/dev/null)
        if [ -z "$path" ] || [ ! -f "$path" ]; then
            slot=$(echo "$line" | cut -d: -f1)
            echo "VM $vmid slot $slot references missing: $ref"
            echo "  Fix: qm set $vmid --$slot none,media=cdrom"
        fi
    done
done
echo ""
echo "Review and run the suggested 'qm set' commands before starting VMs"
pveversion -v # confirm pve-manager 8.x; kernel 6.8 on current 8.x (6.2 or 6.5 on older point releases)
uname -r # confirm new kernel loaded
systemctl --failed
journalctl -p err -b --no-pager | tail -50
# === Start all VMs that were running pre-reboot if they weren't set to Auto Boot ===
for vmid in $(awk '$3=="running" {print $1}' /backup/host-config/vms-pre-reboot-8x.txt); do
    echo "Starting $vmid..."
    qm start "$vmid"
    sleep 5 # stagger to avoid I/O storm on shared storage
done
qm list
4.5 Pre-start audit: empty CD-ROMs
⚠️ PVE 9 rejects `none,media=cdrom` references with a `host_cdrom requires a file name` error.
Find affected VMs and remove the CD-ROM device:
for conf in /etc/pve/qemu-server/*.conf; do
vmid=$(basename $conf .conf)
grep -E "^(ide|sata|scsi)[0-9]+: none,media=cdrom" $conf | while read line; do
slot=$(echo "$line" | cut -d: -f1)
echo "VM $vmid: removing empty CD-ROM at $slot"
qm set $vmid --delete $slot
done
done
Or fix individually via GUI: VM → Hardware → CD/DVD Drive → Remove.
Phase 5 - Upgrade 8.x to 9.x
5.1 Patch 8.x to the absolute latest first, if it still needs an update
rm -f /etc/apt/sources.list.d/pve-enterprise.list
apt update
apt dist-upgrade
pveversion
# Expected: pve-manager/8.4-x or higher
5.2 Run the 8-to-9 checker
pve8to9 --full
⚠️ Same rule: zero FAIL items before continuing.
5.3 If the checker reports mixed repositories or a deprecated sysctl config, do the following
sed -i 's/bullseye/bookworm/g' /etc/apt/sources.list.d/pve-no-subscription.list
grep -vE '^\s*#|^\s*$' /etc/sysctl.conf > /etc/sysctl.d/99-local.conf
cat > /etc/sysctl.conf <<'EOF'
# Settings moved to /etc/sysctl.d/99-local.conf
# This file is kept empty per Debian convention.
EOF
sysctl --system
⚠️ Same rule: zero FAIL items before continuing.
5.4 Take a fresh vzdump before the second jump
VMID=$(qm list | awk 'NR>1 {print $1}')
vzdump $VMID \
    --mode snapshot \
    --compress zstd \
    --storage local-backup \
    --notes-template "Pre PVE 8.x→9.x upgrade {{node}} {{vmid}}"
5.5 Swap repositories: Bookworm → Trixie
# Debian base and security suites (bookworm-security becomes trixie-security in the same pass)
sed -i 's/bookworm/trixie/g' /etc/apt/sources.list
# Proxmox repo for PVE 9
cat > /etc/apt/sources.list.d/pve-install-repo.list <<'EOF'
deb http://download.proxmox.com/debian/pve trixie pve-no-subscription
EOF
# If using Ceph, switch its repo too; check current Ceph version first
# ceph -v
# Then update /etc/apt/sources.list.d/ceph.list to the matching trixie line
# Add trixie GPG key
wget -qO /etc/apt/trusted.gpg.d/proxmox-release-trixie.gpg \
https://enterprise.proxmox.com/debian/proxmox-release-trixie.gpg
rm -f /etc/apt/sources.list.d/pve-enterprise.list
rm -f /etc/apt/sources.list.d/pve-enterprise.sources
rm -rf /etc/apt/sources.list.d/pve-no-subscription.list
rm -f /etc/apt/trusted.gpg.d/proxmox-release-bookworm.gpg
⚠️ Verify the trixie key fingerprint against the official Proxmox docs at the time of upgrade, because keys rotate.
5.6 Update package index
apt update
5.7 Run the 8β9 dist-upgrade
apt dist-upgrade
Same conf-file rules as Phase 3.5. Keep local for anything you've customized.
5.8 Verify userspace upgrade
pveversion
# Expected: pve-manager/9.x
systemctl --failed
Phase 6 - Reboot and Verify on 9.x
6.0 Capture full /etc/pve state
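BACKUP_DIR=/backup/host-config/$(hostname)-$(date +%F)
mkdir -p $BACKUP_DIR
# Separate filename so the Phase 1 archive is not overwritten if both run on the same day
tar czf $BACKUP_DIR/etc-pve-pre-9x-reboot.tgz /etc/pve 2>/dev/null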
6.1 Shut down VM cleanly
for vmid in $(qm list | awk '$3=="running" {print $1}'); do
    echo "Shutting down $vmid..."
    qm shutdown "$vmid" --timeout 180 &
done
# Wait for the background shutdowns to finish, then verify
wait
qm list
# Force-stop any stragglers (rare, but covers stuck guests)
for vmid in $(qm list | awk '$3=="running" {print $1}'); do
echo "Force-stopping stuck VM $vmid..."
qm stop $vmid
done
6.2 Reboot
systemctl reboot
# Once the node is back up, take a fresh inventory of all VMs
qm list > /root/vms-to-restart.txt
6.3 Join the cluster
# === Step 1: Confirm VMs are stopped (Phase 6.1 + reboot did this) ===
qm list
# All should be "stopped" or "paused", not "running"
# === Step 2: Backup EVERYTHING that pvecm add will overwrite ===
mkdir -p /root/pve-pre-join-backup
BACKUP=/root/pve-pre-join-backup
# VM and CT configs (the "host already contains virtual guests" blocker)
mkdir -p $BACKUP/qemu-server $BACKUP/lxc
mv /etc/pve/qemu-server/*.conf $BACKUP/qemu-server/ 2>/dev/null
mv /etc/pve/lxc/*.conf $BACKUP/lxc/ 2>/dev/null
# ⚠️ storage.cfg WILL be overwritten, so back it up
cp /etc/pve/storage.cfg $BACKUP/storage.cfg
# Other things pvecm add may replace (per the dev's warning)
cp -r /etc/pve/firewall $BACKUP/firewall 2>/dev/null
cp /etc/pve/jobs.cfg $BACKUP/jobs.cfg 2>/dev/null
cp /etc/pve/user.cfg $BACKUP/user.cfg 2>/dev/null
cp /etc/pve/datacenter.cfg $BACKUP/datacenter.cfg 2>/dev/null
cp -r /etc/pve/sdn $BACKUP/sdn 2>/dev/null
# Verify pvecm add precondition met
ls /etc/pve/qemu-server/ # MUST be empty
ls /etc/pve/lxc/ # MUST be empty
# === Step 3: Unmask corosync (was masked in Phase 1.1) ===
systemctl unmask corosync
systemctl status corosync # inactive, condition unmet: this is expected
# === Step 4: Join Cluster ===
THIS_IP=$(ip -4 -br addr show vmbr0 | awk '{print $3}' | cut -d/ -f1)
echo "Joining as IP: $THIS_IP"
# CLUSTERIP must hold the IP of an existing cluster member, e.g. CLUSTERIP=<ip-of-cluster-node>
pvecm add "$CLUSTERIP" \
    --link0 "$THIS_IP" \
    --use_ssh
# Wait for cluster sync
sleep 10
pvecm status
pvecm nodes
# === Step 5: Restore VM configs (now into cluster-replicated pmxcfs) ===
mv $BACKUP/qemu-server/*.conf /etc/pve/qemu-server/ 2>/dev/null
mv $BACKUP/lxc/*.conf /etc/pve/lxc/ 2>/dev/null
# Verify replication to CLUSTERIP (each member sees this node's guest configs under nodes/<this-node>/)
sleep 5
ssh root@$CLUSTERIP "ls /etc/pve/nodes/$(hostname)/qemu-server/"
# Should show all your restored VMIDs
# === Step 6: Restore storage.cfg entries (MERGE, do not replace) ===
# You need to ADD: this node's local-* and storage entries
# View what Cluster has after join
cat /etc/pve/storage.cfg
# View what you backed up
cat $BACKUP/storage.cfg
# Manually merge β append your node-specific storage entries
nano /etc/pve/storage.cfg
# Add stanzas like:
#   dir: STORAGENAME
#       path /backup
#       content images,rootdir,vztmpl,iso,backup,snippets
#       nodes NODENAME
#       prune-backups keep-all=1
#       shared 0
#
#   dir: local-backup
#       path /backup
#       content backup,iso,snippets
#       nodes NODENAME
#       shared 0
pvesm status # verify all storages active
# === Step 7: Restore firewall/jobs/etc if you had them ===
# Only restore what's NOT cluster-conflicting. For example:
# - Firewall IPSets/rules: usually merge cleanly
# - jobs.cfg (vzdump schedules): merge node-specific entries only
# - user.cfg: ⚠️ DO NOT replace cluster's, it has cluster's users now
diff $BACKUP/jobs.cfg /etc/pve/jobs.cfg 2>/dev/null
# If you had vzdump schedules, manually re-add them to /etc/pve/jobs.cfg
# === Done: proceed to Phase 6.4 to start VMs ===
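For the manual merge in Step 6, it can help to pull out only the stanzas of the backed-up storage.cfg that are pinned to this node, then paste them into the cluster's file. The sketch below assumes the usual layout: a `type: name` header line followed by indented properties, with a `nodes` property holding a comma-separated list. The helper name `node_stanzas` is hypothetical. Entries without a `nodes` line are skipped on purpose and still need a manual look.

```shell
# node_stanzas NODE: hypothetical helper. Reads a storage.cfg on stdin and
# prints only the stanzas whose "nodes" property lists NODE.
node_stanzas() {
    awk -v node="$1" '
        function emit() { if (keep) printf "%s", buf; buf = ""; keep = 0 }
        /^[a-z]+: / { emit() }            # a "type: name" line starts a new stanza
        { buf = buf $0 "\n" }             # collect the current stanza line by line
        $1 == "nodes" {                   # check the comma-separated node list
            n = split($2, list, ",")
            for (i = 1; i <= n; i++) if (list[i] == node) keep = 1
        }
        END { emit() }
    '
}

# Usage: review the output, then append it to /etc/pve/storage.cfg by hand
#   node_stanzas "$(hostname)" < /root/pve-pre-join-backup/storage.cfg
```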
6.4 Post-reboot verification on 9.x
pveversion -v
uname -r # PVE 9.0 ships a 6.14-based kernel; later point releases may be newer
systemctl --failed
journalctl -p err -b --no-pager | tail -100
# === Start every VM in the inventory taken in 6.2 (skip any you intentionally keep stopped) ===
for vmid in $(awk 'NR>1 {print $1}' /root/vms-to-restart.txt); do
    qm start "$vmid"
    sleep 5
done
qm list
6.5 Re-enable scheduled jobs
systemctl start pvescheduler 2>/dev/null || systemctl start pve-daily-update.timer
systemctl start cron
Phase 7 - Post-Upgrade Cleanup
7.1 Remove old kernels
# List installed kernels
dpkg -l 'pve-kernel-*' 'proxmox-kernel-*' | awk '/^ii/ {print $2}'
# Use Proxmox's kernel tooling
proxmox-boot-tool kernel list
apt autoremove --purge
7.2 Drop the pre-upgrade snapshot (only after VM is confirmed healthy for several days)
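# Only applies if you took a `qm snapshot` named pre_upgrade_YYYYMMDD before starting;
# the vzdump archives under /backup are separate files and are not touched here.
VMID=$(qm list | awk 'NR>1 {print $1}')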
qm listsnapshot $VMID
qm delsnapshot $VMID pre_upgrade_YYYYMMDD
7.3 Final state verification
pveversion -v
pve8to9 --full # should report nothing actionable
qm list
df -h /backup
Rollback Plan
Scenario A: Upgrade fails mid-dist-upgrade, VM still running
# Try to recover dpkg first
dpkg --configure -a
apt -f install
apt dist-upgrade
If unrecoverable, shut down the VM, restore from vzdump on a working PVE node.
Scenario B: Reboot fails, node won't come up
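# From Proxmox boot menu, select previous kernel
# Or boot from Debian rescue ISO and:
mount /dev/<root> /mnt
# Bind the virtual filesystems so dpkg and apt work inside the chroot
for d in dev proc sys; do mount --bind /$d /mnt/$d; done
chroot /mnt
# Re-run: dpkg --configure -a, apt -f install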
Scenario C: Node boots but cluster join is broken
# Restore cluster config from /backup
tar xzf /backup/host-config/<dated>/etc-node.tgz -C /
systemctl restart corosync pve-cluster
pvecm status
Scenario D: VM disk corrupted
# Restore from vzdump
VMID=<id>
qmrestore /backup/dump/vzdump-qemu-${VMID}-*.vma.zst $VMID --force --storage <target>
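The glob in the restore command expands to several files once a VM has more than one archive (for example the extra vzdump taken in 5.4), and `qmrestore` expects a single archive. One way to pick the newest is sketched below. The helper name `latest_backup` is hypothetical, and it relies on the `YYYY_MM_DD-HH_MM_SS` timestamp that vzdump puts in file names, which sorts chronologically.

```shell
# latest_backup DUMP_DIR VMID: hypothetical helper. Prints the newest
# vzdump-qemu-<VMID>-*.vma.zst in DUMP_DIR, or nothing if there is none.
# The timestamp in the file name sorts lexicographically, so the last entry
# after a plain sort is the most recent archive.
latest_backup() {
    ls "$1"/vzdump-qemu-"$2"-*.vma.zst 2>/dev/null | sort | tail -n 1
}

# Usage:
#   archive=$(latest_backup /backup/dump "$VMID")
#   [ -n "$archive" ] && qmrestore "$archive" "$VMID" --force --storage <target>
```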
Scenario E: pvecm add fails or leaves cluster in inconsistent state
# === On the cluster ===
# Forget the failed-to-join node
pvecm delnode <this-node>
# === On the joining node ===
# Restore pre-join state
systemctl stop pve-cluster corosync
pmxcfs -l
rm -f /etc/corosync/* /etc/pve/corosync.conf
killall pmxcfs
sleep 2
systemctl start pve-cluster
# Restore configs from your backup
mv /root/pve-pre-join-backup/qemu-server/*.conf /etc/pve/qemu-server/
mv /root/pve-pre-join-backup/lxc/*.conf /etc/pve/lxc/
cp /root/pve-pre-join-backup/storage.cfg /etc/pve/storage.cfg
# ... etc
# Re-mask corosync
systemctl mask corosync
# Verify VMs visible again
qm list
# Diagnose what went wrong, then retry Phase 6.3
Gotchas
- ⚠️ Never skip `pve7to8` or `pve8to9`. They catch incompatible storage configs, stale repos, dead packages, and corosync version mismatches before they bite during `dist-upgrade`.
- ⚠️ Do not run `apt upgrade`, only `apt dist-upgrade`. Plain `upgrade` never removes packages (and `apt-get upgrade` also never installs new ones), so it holds back much of a major-version upgrade and leaves the node half-upgraded.
- ⚠️ `apt-key` is deprecated in Bookworm and gone in Trixie. Any third-party repo using legacy `/etc/apt/trusted.gpg` will warn; migrate to `/etc/apt/keyrings/` with `signed-by=` syntax.
- ⚠️ Conf file prompts are not auto-answered by `-y`. Watch them.
- ⚠️ Mixed-version cluster is transient. Don't run a cluster on PVE 7 + PVE 9 nodes for weeks: pveproxy GUI will show stale data and live migration between major versions is not supported.
- ⚠️ Old LXC templates (CentOS 7, Debian 9, Ubuntu 18.04) often fail to start under PVE 9's kernel 6.8+ and cgroup v2. You said only one VM, no LXC, so this is a non-issue here, but worth noting.
- ⚠️ Corosync version skew between cluster nodes during a rolling upgrade is tolerated only short-term. Don't leave this node on a different corosync major than the rest of the cluster for more than a maintenance window.
- ⚠️ ZFS root + secure boot: if `proxmox-boot-tool refresh` complains about stale ESPs, fix with `proxmox-boot-tool init /dev/<esp-partition>` before rebooting.