MTU, MSS, and a VPN that couldn't stream video
A friend asked me to help with a VPN that “worked fine for everything except video”. I was skeptical. Video is just TCP or UDP like everything else. But sure, I looked. And the saga taught me more about MTU and MSS than I expected.
The situation
My friend ran a WireGuard tunnel from her laptop on hotel wifi back to her home router. Normal web browsing was fine. SSH was fine. Downloading a file off a homelab NAS was a little slow but fine. Video streaming from self-hosted Jellyfin would start, show a second of picture, and then freeze. The player would reconnect, freeze, reconnect.
MTU 101 refresher
Every link has a Maximum Transmission Unit: the largest packet you can send on it. Ethernet is typically 1500 bytes. Tunnels wrap your packets in extra headers, so if the underlying link is 1500 and WireGuard adds 60 bytes of encapsulation overhead (outer IP, UDP, and WireGuard headers), the effective MTU inside the tunnel is 1440. If you try to send a 1500-byte packet into a 1440-byte tunnel, something has to fragment or drop it.
During TCP's handshake, each side advertises a Maximum Segment Size based on its local MTU, typically MTU minus 40 bytes for the IP and TCP headers. If both sides agree on something small enough, packets fit. If either side advertises something too large and Path MTU Discovery fails, TCP packets get dropped silently and connections stall.
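A quick sanity check of the arithmetic the rest of this story leans on (the 60-byte figure is the usual IPv4 WireGuard overhead; the variable names are mine):

```shell
link_mtu=1500                      # typical ethernet
wg_overhead=60                     # outer IP (20) + UDP (8) + WireGuard (32)
tunnel_mtu=$((link_mtu - wg_overhead))
mss=$((tunnel_mtu - 40))           # minus 20-byte IP + 20-byte TCP headers
echo "tunnel MTU $tunnel_mtu, MSS $mss"   # tunnel MTU 1440, MSS 1400
```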
The tunnel specifics
The WireGuard tunnel had MTU = 1380 on the laptop side. The home router interface was MTU = 1420. These did not match, but each side’s MTU applied to its own outgoing packets, so it should have been okay. The VPN itself was fine: ping with large packets worked:
ping -M do -s 1300 home.example.net
# 1308 bytes from home.example.net: icmp_seq=1 ttl=62 time=23 ms
ping -M do -s 1400 home.example.net
# ping: local error: message too long, mtu=1380
Good. At 1400 payload (1428 with headers) the laptop got a clear “too big” error. At 1300 it worked.
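Bisecting those payload sizes by hand gets old, so the same probe can be scripted. A rough sketch (probe and pmtu_search are my names; the 1473-byte ceiling assumes a 1500-byte underlay, where 1472 is the largest payload that can possibly fit):

```shell
# Return success if an ICMP payload of $1 bytes reaches $2 with DF set.
probe() {
    ping -c 1 -W 1 -M do -s "$1" "$2" >/dev/null 2>&1
}

# Binary-search the largest payload that fits, then add back the 8-byte
# ICMP and 20-byte IP headers to report the path MTU itself.
pmtu_search() {
    lo=0; hi=1473                  # hi = smallest size assumed too big
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(((lo + hi) / 2))
        if probe "$mid" "$1"; then lo=$mid; else hi=$mid; fi
    done
    echo $((lo + 28))
}
```

Running something like this against the WireGuard endpoint's public address, outside the tunnel, is what would have surfaced the hotel path's limit directly.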
Why video and not other things
Most HTTP traffic is a small request and a medium response, and short transfers tolerate Path MTU Discovery failures reasonably well: small packets squeak through, and a stalled transfer often recovers after a retransmission with smaller segments. Video streaming is a continuous flood of full-size packets, and when there is a PMTU issue the flow stalls for a long time before recovering.
Also, the Jellyfin stream was being served over HTTP/2 with TLS. HTTP/2 multiplexes everything over a single TCP connection, which is particularly punishing with PMTU issues: one stalled full-size packet blocks all the logical streams behind it.
The smoking gun
I ran a capture on the laptop, filtered to the TCP flow to Jellyfin:
tcpdump -i wg0 -n -s 0 -w /tmp/stream.pcap 'tcp and host 10.77.0.1 and port 8096'
I opened it in Wireshark. The laptop advertised an MSS of 1340 in the SYN, which is right for a 1380 MTU minus 40 bytes of IP and TCP headers. The server advertised an MSS of 1460 in the SYN-ACK, which it thinks is reasonable because its outgoing interface is a 1500-MTU ethernet. Each side then sends segments no larger than what the peer advertised, so the effective limit is the smaller of the two: 1340.
So far so good. But the hotel's path out to the internet had something sketchy in the middle: a hop past the hotel gateway with an MTU of 1300. When the laptop sent a full-size 1380-byte inner packet, WireGuard wrapped it into a roughly 1440-byte UDP datagram, and that hop dropped it. An ICMP "fragmentation needed" error should have come back, but the hotel router was silently dropping ICMP (as many hotel routers do). PMTUD broken.
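One way to confirm this kind of blackhole is to watch the physical interface for the ICMP that never arrives (wlan0 is an assumption; substitute the actual uplink interface):

```shell
# ICMP type 3 code 4 is "fragmentation needed" -- the message PMTUD
# depends on. If large transfers stall and this stays silent, a
# middlebox is eating it.
tcpdump -i wlan0 -n 'icmp[icmptype] == 3 and icmp[icmpcode] == 4'
```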
The fix
Two layers:
- Lower the WireGuard MTU on the laptop. I set MTU = 1280. With WireGuard's 60 bytes of encapsulation overhead, inner packets of up to 1280 bytes leave as outer datagrams of up to 1340 bytes. Strictly, a 1300-byte bottleneck calls for an inner MTU of 1240 or less, but 1280 cleared the hotel path in practice, and it is the usual conservative choice because it matches the IPv6 minimum MTU.
- Clamp MSS on the home router. Even though both sides negotiated correctly above, clamping MSS per outgoing interface is a belt-and-suspenders move that protects against similar issues with other peers. On Linux with nftables:
chain forward {
    type filter hook forward priority 0;
    tcp flags syn tcp option maxseg size set rt mtu accept
}
Here, tcp option maxseg size set rt mtu rewrites the MSS option in forwarded TCP SYNs to match the route's MTU minus 40 bytes for the IP and TCP headers. This way, end to end, MSS never advertises more than the tunnel can carry.
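For reference, that chain has to live inside a table; a complete minimal ruleset might look like this (the table name is my own, and the inet family covers both IPv4 and IPv6):

```nft
table inet mssclamp {
    chain forward {
        type filter hook forward priority filter; policy accept;
        tcp flags syn tcp option maxseg size set rt mtu counter accept
    }
}
```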
Verifying
After the changes:
# test with large TCP
iperf3 -c home.example.net -l 1200 -t 10
# 95 Mb/s steady, no stalls
# MSS observed in capture
tshark -i wg0 -Y 'tcp.flags.syn' -T fields -e tcp.options.mss_val
# 1240
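That observed value lines up exactly with the new tunnel MTU:

```shell
tunnel_mtu=1280
expected_mss=$((tunnel_mtu - 40))   # 20-byte IP + 20-byte TCP headers
echo "$expected_mss"                # 1240
```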
Stream played smoothly. The problem was not that the VPN was broken; it was that the path MTU was lower than the tunnel assumed, and PMTUD was not delivering the ICMP errors that would have said so.
Why this happens often on hotel and cafe wifi
Captive portals and aggressive firewalls frequently block ICMP Destination Unreachable messages, including the type 3, code 4 "fragmentation needed" errors that PMTUD depends on. The intent is usually security; the effect is broken PMTUD. MTU-sensitive flows suffer most: VPNs especially, and protocols like QUIC have to design around it (QUIC assumes a conservative 1200-byte minimum for exactly this reason). Any real network operator knows not to block type 3 ICMP, but captive portal operators are not always real network operators.
Reflection
I came out of this with a stronger belief in two rules:
- Lower the WireGuard MTU on any client that roams. 1280 is a safe bet. Performance is slightly worse but reliability is much better.
- MSS clamp on anything you control. It is not a substitute for correct MTU, but it defends against broken middle boxes.
Related: see my post on a TCP RST that took a week to track down for another case where the tunnel was not the bug, the path was.