## Summary
Adds **`--multiplex-peer`**, a non-standard relay mode that replaces the
per-allocation peer-side port bind with **one shared IPv4+IPv6 UDP
socket pair per relay thread**. Sessions are demultiplexed by exact peer
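IP:port in a per-thread `mp_table`. This lifts the ~16 k allocation cap
that the default 49152-65535 relay port range imposes (16,384 ports), and
sharply reduces kernel-level UDP receive-buffer drops under high pps: in
the measurements below, `UdpRcvbufErrors` falls from 71 k pps to 26 k pps,
and to 0.3 k pps with `--udp-gso`.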
Design and trade-offs: [docs/multiplex-peer.md](docs/multiplex-peer.md).
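As a rough illustration of the demultiplexing described above, the following Python sketch models one relay thread's table as a dictionary keyed by the exact peer (IP, port), and shows where the ~16 k figure comes from. It is a model only: the real `mp_table` is C code inside turnserver, and `MpTable`, `register`, `lookup`, and `per_port_allocation_cap` are hypothetical names chosen for this example.

```python
# Illustrative model only; not the turnserver implementation.
# MpTable, register, lookup and per_port_allocation_cap are hypothetical names.
import ipaddress

RELAY_PORT_MIN = 49152  # default relay port range quoted in the summary
RELAY_PORT_MAX = 65535


def per_port_allocation_cap(lo=RELAY_PORT_MIN, hi=RELAY_PORT_MAX):
    """With one bound port per allocation, the range size is the hard cap."""
    return hi - lo + 1


class MpTable:
    """Per-thread map from an exact peer (IP, port) to a session object."""

    def __init__(self):
        self._sessions = {}

    @staticmethod
    def _key(ip, port):
        # Parse the address so equivalent spellings compare equal.
        return (ipaddress.ip_address(ip), port)

    def register(self, ip, port, session):
        self._sessions[self._key(ip, port)] = session

    def lookup(self, ip, port):
        # Exact match only: a datagram from an unknown peer yields None.
        return self._sessions.get(self._key(ip, port))
```

For example, after `table.register("203.0.113.7", 40000, "session-A")`, a datagram arriving from `203.0.113.7:40000` resolves to `"session-A"`, while one from `203.0.113.7:40001` resolves to `None`. Because sessions no longer consume one port each, the size of the relay port range stops bounding the number of allocations.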
## What changes
### Server (turnserver)
- **`--multiplex-peer`** (cross-platform) — enable the shared per-thread
relay sockets. Replaces the per-session port bind. Implies sendmmsg
batching on Linux and default-enables `--udp-recvmmsg` (override with
`--udp-recvmmsg=0`). Incompatible with EVEN-PORT — those Allocates are
rejected with 400.
- **`--multiplex-peer-port <port>`** (cross-platform, default 3480) —
base port; thread `i` binds `<base>+2i` (IPv4) and `<base>+2i+1` (IPv6).
A 4-thread server consumes 8 ports.
- **`--udp-gso`** (Linux-only CLI) — UDP-GSO (`UDP_SEGMENT` cmsg) on the
relay send path. Requires `--multiplex-peer` (which is what enables the
sendmmsg batching GSO piggybacks on); passing `--udp-gso` alone is a
silent no-op.
- **CLI surface tightened**: `--udp-recvmmsg`, `--udp-recvmmsg-log`,
`--udp-gso` and their fields are now guarded by `#if defined(__linux__)`.
On macOS/Windows they are absent from `--help`, rejected with
`unrecognized option`, and their code paths compile out.
- **Windows portability**: `SO_REUSEPORT` in `mp_open_socket` wrapped in
`#ifdef` (MSVC's Winsock doesn't define it; REUSEPORT was defensive
anyway because the per-thread port layout is unique by construction).
- **`--sock-buf-size` honoured at startup**: the shared multiplex-peer
relay socket now calls `set_ioa_socket_buf_size` in `mp_open_socket` so
the configured rcvbuf is in effect from the moment the socket exists,
not deferred to the first Allocate.
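The port layout for `--multiplex-peer-port` can be checked with a small Python sketch. It only restates the `<base>+2i` / `<base>+2i+1` rule from the list above; `thread_ports` and `all_ports` are hypothetical helper names, and the upper-bound check is an assumption of this sketch rather than documented server behaviour.

```python
# Illustrative arithmetic only; thread_ports and all_ports are hypothetical helpers.
DEFAULT_BASE_PORT = 3480  # documented default for --multiplex-peer-port


def thread_ports(thread_index, base=DEFAULT_BASE_PORT):
    """Return (ipv4_port, ipv6_port) for relay thread i: base+2i and base+2i+1."""
    if thread_index < 0:
        raise ValueError("thread index must be non-negative")
    ipv4_port = base + 2 * thread_index
    ipv6_port = ipv4_port + 1
    if ipv6_port > 65535:
        # Assumption of this sketch: reject layouts that leave the valid port range.
        raise ValueError("port layout exceeds 65535")
    return ipv4_port, ipv6_port


def all_ports(num_threads, base=DEFAULT_BASE_PORT):
    """Flat list of every port consumed by num_threads relay threads."""
    ports = []
    for i in range(num_threads):
        ports.extend(thread_ports(i, base))
    return ports
```

With the default base, `all_ports(4)` yields the eight consecutive ports 3480 through 3487, which is the range a firewall rule would need to cover for a 4-thread server.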
### turnutils_uclient (loadgen)
- **`--no-even-port`** — force `ep = -1` on Allocate. The default path
randomly attaches EVEN-PORT (with no-R bit) even under `-c`, which
`--multiplex-peer` strictly rejects with 400; this flag makes
alloc-flood runs against multiplex-peer deterministic.
- **Legacy `timer_handler` now wraps the per-tick send batch with
`uclient_send_batch_begin/_end`** — without this, runs with
`--sender-threads 0` (the default for `-m < 4`) silently fell through
every send to plain `send(2)`. strace A/B: 205 k `sendto` → 61 k
`sendmsg` (GSO) + 4 k `sendmmsg` + small `sendto` residual for control.
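The effect of the missing batch wrapper can be modelled in a few lines of Python. This is a toy model under stated assumptions, not the uclient code: `BatchingSender`, `batch_begin`, `batch_end`, and `tick` are hypothetical names standing in for the `uclient_send_batch_begin/_end` pair, and the counters stand in for syscalls.

```python
# Toy model only; names are hypothetical and the counters stand in for syscalls.
class BatchingSender:
    def __init__(self):
        self.batching = False
        self.pending = []
        self.single_sends = 0     # stands in for one send(2)/sendto per packet
        self.batched_flushes = 0  # stands in for one sendmmsg/GSO call per batch

    def batch_begin(self):
        self.batching = True

    def send(self, packet):
        if self.batching:
            # Inside a batch window: queue the packet for a single flush.
            self.pending.append(packet)
        else:
            # No batch window open: fall through to a per-packet send.
            self.single_sends += 1

    def batch_end(self):
        if self.pending:
            self.batched_flushes += 1
            self.pending.clear()
        self.batching = False


def tick(sender, packets, wrap_in_batch):
    """One timer tick; wrap_in_batch=False models the pre-fix legacy path."""
    if wrap_in_batch:
        sender.batch_begin()
    for packet in packets:
        sender.send(packet)
    if wrap_in_batch:
        sender.batch_end()
```

Running `tick` with `wrap_in_batch=False` counts one single send per packet, while `wrap_in_batch=True` collapses the same tick into one flush. That is the same direction as the strace A/B above, although the real split between `sendmsg`, `sendmmsg`, and residual `sendto` depends on details this model does not capture.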
## Measured impact (3-droplet DigitalOcean, c-4 / 4 vCPU, 8 concurrent UDP streams, 45 s)
| Metric | baseline | `--udp-recvmmsg` | `--multiplex-peer` | `--multiplex-peer --udp-gso` |
|---|---:|---:|---:|---:|
| Server NIC rx pps (UDP relay both legs) | 350 k | 334 k | 326 k | 294 k |
| Server `UdpInDatagrams` pps | 279 k | 292 k | 300 k | 294 k |
| **Server `UdpRcvbufErrors` pps** | **71 k** | 42 k | 26 k | **0.3 k (−99.6 %)** |
| **`turnserver` process CPU** | **387 %** | 205 % | 283 % | **133 % (−65 %)** |
| Server host idle | 22 % | 49 % | 41 % | **68 %** |
All four runs used the same loadgen-side packet rate (~2 M pps reported
by uclient `send_pps` after the legacy-path batching fix). Iteration log:
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).
## Test plan
- [x] `ctest --test-dir build` — 3/3 pass (test_ioaddr, test_stun_msg,
test_http_server) on macOS + Linux.
- [x] `examples/run_tests.sh` — 4 protocols + 4 threaded + load-gen
smoke on Linux; 4 protocols on macOS.
- [x] `examples/run_tests_conf.sh` — same coverage, conf-driven.
- [x] `examples/run_tests_multiplex_peer.sh` — UDP/TCP/TLS/DTLS via
`--multiplex-peer --multiplex-peer-port=35000` on macOS + Linux.
- [x] Flag matrix smoke on macOS: `--multiplex-peer`,
`--multiplex-peer-port=42000`, `--multiplex-peer --udp-gso` (no-op),
`uclient --no-even-port`, `uclient --listener-threads N --sender-threads
M` — all pass; `--udp-recvmmsg` / `--udp-gso` correctly rejected with
`unrecognized option`.
- [x] Flag matrix smoke on Linux (Docker): same + `--udp-recvmmsg`
accepted, `--multiplex-peer` auto-enables `--udp-recvmmsg`,
`--udp-recvmmsg=0` overrides the auto-enable.
- [x] Windows compile fix verified — `SO_REUSEPORT` no longer referenced
unconditionally.
- [x] 3-droplet perf matrix completed; per-hop UDP counters captured.
## Docs updated
- New: [docs/multiplex-peer.md](docs/multiplex-peer.md)
- [README.turnserver](README.turnserver): full entries for
`--multiplex-peer`, `--multiplex-peer-port`, `--udp-gso`; clarified
`--udp-recvmmsg` auto-enable semantics.
- [README.turnutils](README.turnutils): added `--no-even-port`, plus
previously-undocumented `--listener-threads` / `--sender-threads`
loadgen pool flags.
- [examples/etc/turnserver.conf](examples/etc/turnserver.conf):
commented `udp-recvmmsg`, `udp-recvmmsg-log`, `udp-gso`,
`multiplex-peer`, `multiplex-peer-port` keys with one-paragraph
descriptions and pointer to `docs/multiplex-peer.md`.
- Man pages regenerated via `./make-man.sh`.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>