Proxmox Network Config Gotchas

A misconfigured /etc/network/interfaces on a remote Proxmox host means console-only recovery. Most of these gotchas don't surface during normal operation — they surface during dist-upgrade, after a reboot, or the first time a VM tries to migrate. Here's the list, ordered roughly by how often each one bites.

1. The dist-upgrade conffile prompt

When you upgrade across a major PVE version (7 → 8 → 9), dpkg may stop partway through and ask:

Configuration file '/etc/network/interfaces'
==> Modified (by you or by a script) since installation.
==> Package distributor has shipped an updated version.
What would you like to do about it?
   Y or I  : install the package maintainer's version
   N or O  : keep your currently-installed version

⚠️ Answer N (keep local). N is also dpkg's default for a file you've modified, so Enter keeps your version. Typing Y on autopilot installs the maintainer's version, which on a configured PVE host means losing your bridge, your bond, and your routes. On a remote machine you're now offline and on the way to the data center.

To avoid the question entirely:

DEBIAN_FRONTEND=noninteractive \
  apt-get -o Dpkg::Options::="--force-confold" \
  -o Dpkg::Options::="--force-confdef" \
  dist-upgrade -y

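--force-confdef tells dpkg to take its default action without prompting wherever one exists. --force-confold keeps your local version whenever dpkg would otherwise have to ask. Used together, files you never touched get the new version and files you modified are kept.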
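After the upgrade, it's worth checking what dpkg set aside. When you keep your own version, dpkg saves the maintainer's copy next to it as *.dpkg-dist. When the maintainer's version wins, your old file is saved as *.dpkg-old. Here is a minimal bash sketch. The function name list_conffile_leftovers is hypothetical. Including *.dpkg-new is an assumption: it covers files left behind by an upgrade that never finished configuring.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list conffile copies dpkg left next to the originals.
#   *.dpkg-dist  maintainer's version, saved when you kept your own
#   *.dpkg-old   your version, saved when the maintainer's version was installed
#   *.dpkg-new   assumed leftover from an upgrade that never finished configuring
list_conffile_leftovers() {
  local dir="${1:-/etc}"
  # Permission errors (when run as non-root) are hidden; results are sorted.
  find "$dir" \( -name '*.dpkg-dist' -o -name '*.dpkg-old' -o -name '*.dpkg-new' \) \
    -print 2>/dev/null | sort
}
```

For example, run list_conffile_leftovers /etc after the upgrade. If /etc/network/interfaces.dpkg-dist shows up, run diff -u /etc/network/interfaces /etc/network/interfaces.dpkg-dist to see what the new release would have changed.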

2. Commented-out routes that "always worked"

On older PVE installs (4.x, 5.x era) static routes were often added as up route add ... lines. When you migrate to a fresh PVE 8 or 9 install, ifupdown2 is stricter about syntax. A comment that classic ifupdown ignored, or a malformed up directive that classic ifupdown silently tolerated, can fail under ifupdown2.

The symptom: VMs can't reach external networks after a reboot, but the host itself looks fine.

Rewrite legacy routes properly:

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    post-up   ip route add 10.0.0.0/8 via 192.0.2.254 || true
    pre-down  ip route del 10.0.0.0/8 via 192.0.2.254 || true

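⚠️ Always append || true to post-up/pre-down route commands, or pair them symmetrically so every add has a matching del. ifupdown2 treats a non-zero exit from one of these hooks as an error for that interface, and both directions can fail harmlessly. ip route add exits non-zero ("File exists") when the route is already present, for example on a second ifreload -a. ip route del exits non-zero when the route is already gone.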
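A rough scan helps find the lines that still need rewriting. Here is a sketch assuming bash and GNU grep. Both function names are hypothetical, and the patterns only look at hook lines (pre-up, up, post-up, pre-down, down, post-down).

```shell
#!/usr/bin/env bash
# Hypothetical helpers for auditing route hooks in an ifupdown/ifupdown2 config.
HOOK_RE='^[[:space:]]*(pre-up|up|post-up|pre-down|down|post-down)[[:space:]]+'

# Legacy net-tools style hooks, e.g. "up route add -net ...".
find_legacy_route_hooks() {
  grep -nE "${HOOK_RE}route[[:space:]]" "${1:-/etc/network/interfaces}"
}

# iproute2 route hooks without an "|| true" guard, which can exit non-zero.
find_unguarded_ip_route_hooks() {
  grep -nE "${HOOK_RE}ip[[:space:]]+route[[:space:]]" "${1:-/etc/network/interfaces}" \
    | grep -vF '|| true'
}
```

Both print matching lines with their line numbers, so you can go straight to the stanza that needs fixing.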

3. auto vs allow-hotplug

auto vmbr0
iface vmbr0 inet static
    ...

auto brings up the interface at boot, before most services start. This is what you want for management bridges.

allow-hotplug only brings the interface up when the kernel detects the underlying physical NIC. On a server with stable, always-present NICs the difference is invisible. On hosts where NICs are slow to enumerate (some virtualized PVE-on-PVE setups, certain NIC firmwares, hot-pluggable PCIe), allow-hotplug can result in services starting before networking is ready.

Use auto for management and storage bridges. Use allow-hotplug only for genuinely hot-pluggable interfaces.
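Here is a quick sketch for spotting bridges or bonds that are still declared allow-hotplug. The function name is hypothetical, and it only matches interfaces named vmbr* or bond*, which is an assumption about your naming.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print allow-hotplug lines that name a bridge or bond,
# even when the bridge is not the first interface listed on the line.
# Assumes the usual PVE naming (vmbrN, bondN); adjust the pattern for other names.
find_hotplug_bridges() {
  grep -nE '^[[:space:]]*allow-hotplug([[:space:]]+[^[:space:]]+)*[[:space:]]+(vmbr|bond)' \
    "${1:-/etc/network/interfaces}"
}
```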

4. bridge-vlan-aware and the missing bridge-vids

VLAN-aware bridges are the cleanest way to give VMs access to multiple VLANs without one bridge per VLAN:

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094

⚠️ Without bridge-vids, the bridge is VLAN-aware but accepts no tagged traffic for VMs. The VM's tag= setting in its hardware tab gets silently ignored. Symptom: VM is on the right bridge, has a tagged interface configured in PVE, gets zero traffic.
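Checking this by eye across several bridges gets tedious. Here is a minimal sketch that walks an interfaces file stanza by stanza with awk and reports VLAN-aware bridges with no bridge-vids line. The function name is hypothetical. The parsing is simplified: it does not follow source includes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report VLAN-aware bridges that have no bridge-vids line.
# A stanza starts at "iface NAME ..." and ends at the next iface/auto/allow-*/source
# line or at end of file.
check_vlan_aware_vids() {
  awk '
    function flush() {
      if (name != "" && aware && !vids)
        print name ": bridge-vlan-aware yes but no bridge-vids"
      name = ""; aware = 0; vids = 0
    }
    $1 == "iface"                                     { flush(); name = $2; next }
    $1 == "auto" || $1 ~ /^allow-/ || $1 ~ /^source/  { flush(); next }
    $1 == "bridge-vlan-aware" && $2 == "yes"          { aware = 1 }
    $1 == "bridge-vids"                               { vids = 1 }
    END { flush() }
  ' "${1:-/etc/network/interfaces}"
}
```

On a running host, bridge vlan show dev eno1 lists the VLANs the uplink port actually carries, which is the more direct check.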

5. Bond miimon doesn't catch L2-broken switches

Default bond config in PVE templates uses miimon=100. That's link-state-based — it only detects a failure when the NIC's PHY drops link. If your upstream switch is broken at L2 but the link stays up (an unhappy port-channel, a confused MLAG peer, a switch that's forwarding to a black hole), miimon won't notice.

For LACP environments:

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-xmit-hash-policy layer3+4

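bond-lacp-rate fast asks the switch to send LACPDUs every second instead of every 30 seconds. LACP declares a partner dead after three missed LACPDUs, so a broken upstream is noticed in about 3 seconds instead of about 90.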

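For active-backup over uncooperative switches, replace miimon with ARP monitoring. The bond sends ARP requests to a target (here the gateway) every bond-arp-interval milliseconds and fails a slave over when that traffic stops coming back. This catches a switch that keeps link up while forwarding nothing:

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-mode active-backup
    bond-arp-interval 1000
    bond-arp-ip-target 192.0.2.1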

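⚠️ Don't combine ARP monitoring with miimon. The bonding driver won't run both at once: enabling ARP monitoring switches miimon off, and vice versa. ARP monitoring also isn't supported in 802.3ad, balance-tlb or balance-alb mode. Read the kernel's bonding documentation (bonding.txt, bonding.rst in newer kernels) for your specific mode before switching.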
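To see what the driver currently thinks, the kernel reports each bond in /proc/net/bonding/<bond>. Here is a sketch that pulls out the mode, the LACP rate and each slave's MII status. The function name is hypothetical, and the field labels are assumed to match the kernel's current report format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize a bond from the kernel's /proc/net/bonding report.
# The first "MII Status" line belongs to the bond itself and is skipped; each later
# one follows a "Slave Interface" line and is attributed to that slave.
bond_summary() {
  awk -F': ' '
    $1 == "Bonding Mode"               { print "mode: " $2 }
    $1 == "LACP rate"                  { print "lacp-rate: " $2 }
    $1 == "Slave Interface"            { slave = $2 }
    $1 == "MII Status" && slave != ""  { print slave ": " $2; slave = "" }
  ' "${1:-/proc/net/bonding/bond0}"
}
```

MII status reflects link state only, the same blind spot described above. For 802.3ad, look further down the same report, at the per-slave aggregator and LACP partner details, to see L2 trouble.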

6. Reloading network on a remote host without locking yourself out

ifreload -a       # ifupdown2: applies the diff between current and configured state

This is the safe command. It computes the difference between running config and /etc/network/interfaces, then applies only what changed.

⚠️ Never run systemctl restart networking on a remote PVE host without a watchdog reboot scheduled. It tears everything down before bringing it back up. If your new config has a typo, you're locked out.

The standard remote-net-change ritual:

# 1. Schedule a safety reboot. It boots the edited file again, so on its own
#    it clears runtime breakage, not a bad edit.
shutdown -r +10 "Net config change safety reboot"

# 2. Apply the change
ifreload -a

# 3. Verify management network, VM networks, storage networks
ping -c 3 192.0.2.1
qm list
pvesm status

# 4. If everything works, cancel the reboot
shutdown -c

If your PVE host is itself a VM (PVE-in-PVE labs, or PVE as a KVM guest on a bare-metal hypervisor), some outer hypervisor consoles can reset it out-of-band. In that case you can skip the timed shutdown -r and fall back on that reset instead, so a forgotten or premature safety reboot doesn't interrupt the VMs running on the host you're configuring.
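A timed job that restores a known-good copy before reloading covers the case where the edited file itself is broken, which a plain reboot would simply apply again. Here is a sketch using a transient systemd timer, assuming bash and systemd-run (PVE 8 and 9 are systemd-based). The function names, the backup path and the net-rollback unit name are all hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: back up the config, then arm a transient systemd timer that
# restores the backup and re-applies it with ifreload -a unless disarmed in time.
arm_net_rollback() {
  local cfg="${1:-/etc/network/interfaces}"
  local backup="${2:-/root/interfaces.rollback}"   # hypothetical backup path
  local delay="${3:-10min}"                        # systemd time span
  cp -p "$cfg" "$backup" || return 1
  systemd-run --unit=net-rollback --on-active="$delay" \
    /bin/sh -c "cp -p '$backup' '$cfg' && ifreload -a"
}

# Cancel the pending rollback once the new config is verified.
disarm_net_rollback() {
  systemctl stop net-rollback.timer
}
```

Usage: run arm_net_rollback, edit the file, apply with ifreload -a, verify, then run disarm_net_rollback. If the timer fires, it restores and reloads instead of rebooting, so running VMs are not restarted. Swap ifreload -a for systemctl reboot if you'd rather have the full reset.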

7. MTU mismatches between bridge and ports

A bridge effectively inherits the MTU of its smallest port. If you want jumbo frames, set MTU explicitly on both the bridge and every port:

auto eno1
iface eno1 inet manual
    mtu 9000

auto vmbr0
iface vmbr0 inet static
    address 192.0.2.10/24
    bridge-ports eno1
    bridge-stp off
    bridge-fd 0
    mtu 9000

⚠️ Make sure every port supports the MTU and the switch port is configured to match. A VM with mtu 9000 on a bridge that silently fell back to 1500 will see partial connectivity — small packets work, large ones get dropped, and the symptom looks like "TCP works for SSH but file transfers stall." Verify end-to-end:

ip -d link show vmbr0
ip -d link show <vm-tap-interface>

The mtu field should match top-to-bottom: physical NIC, bond, bridge, VM tap, guest interface.
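Reading the mtu field off each hop by hand is easy to get wrong on a host with many guests. Here is a sketch that compares every host-side hop against an expected value. It also includes the arithmetic for a don't-fragment ping that fills exactly one frame: the MTU minus 20 bytes of IPv4 header and 8 bytes of ICMP header. SYSFS_NET and both function names are hypothetical; SYSFS_NET exists only so the check can be tried against fake data.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare each interface's MTU (read from sysfs) to an expected value.
# SYSFS_NET is a hypothetical override of the sysfs root, used for testing.
check_mtu_path() {
  local want="$1"; shift
  local root="${SYSFS_NET:-/sys/class/net}" dev have rc=0
  for dev in "$@"; do
    if ! have=$(cat "$root/$dev/mtu" 2>/dev/null); then
      echo "$dev: not found"; rc=1; continue
    fi
    if [ "$have" != "$want" ]; then
      echo "$dev: mtu $have (expected $want)"; rc=1
    fi
  done
  return "$rc"
}

# ICMP payload that exactly fills one frame: MTU - 20 (IPv4 header) - 8 (ICMP header).
jumbo_ping_size() { echo $(( $1 - 28 )); }
```

For example, run check_mtu_path 9000 eno1 bond0 vmbr0 tap100i0 (PVE names guest taps tap<vmid>i<n>). Then run ping -M do -s "$(jumbo_ping_size 9000)" -c 3 <jumbo-peer>, where <jumbo-peer> is any host on the jumbo-frame network. This sends 8972-byte payloads with fragmentation prohibited. If that fails while a default-size ping works, something on the path is still at 1500.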

8. Cloud-init networking overrides

When you build a VM template with cloud-init enabled, cloud-init writes its own network config inside the guest at first boot — overwriting whatever you set up in the template image.

⚠️ Setting a static IP inside the VM image and then enabling cloud-init means cloud-init blanks your config and applies its own DHCP-by-default settings on next boot. Either:

Disable cloud-init network management inside the image:

echo 'network: {config: disabled}' > /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg

Or commit fully to cloud-init and configure networking via PVE:

qm set <vmid> --ipconfig0 ip=192.0.2.20/24,gw=192.0.2.1

Don't mix the two. A hand-written config in the image and cloud-init's generated one will fight, and whichever writes last wins.

9. The missing source /etc/network/interfaces.d/* line

Newer PVE installs include this line at the bottom of /etc/network/interfaces:

source /etc/network/interfaces.d/*

If you upgraded from an old install where this line is missing, any drop-in configs you place in interfaces.d/ are silently ignored. After a major upgrade:

grep -q '^source /etc/network/interfaces.d' /etc/network/interfaces \
  || echo 'source /etc/network/interfaces.d/*' >> /etc/network/interfaces

PVE's SDN feature also writes config to /etc/network/interfaces.d/sdn — without that source line, your SDN configuration won't apply.

10. The forgotten vmbr on a non-clustered upgrade

When upgrading a single PVE host, the network config usually survives intact. When upgrading a clustered node, the upgrade may try to apply the cluster's default config rather than the local one if /etc/pve/corosync.conf is regenerated mid-upgrade. Same conffile rule applies — keep local versions on every prompt, no exceptions.

Pre-flight checklist for any net config change on a PVE host

  1. shutdown -r +10 — safety reboot
  2. cp /etc/network/interfaces{,.bak.$(date +%F-%H%M)} — back up
  3. Edit the config
  4. ifreload -a — apply
  5. Verify management, VM, and storage networks independently
  6. shutdown -c — cancel the safety reboot

Skip any of these and the next outage you cause is your own.