## Summary
Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP
listener socket to also cover **connected per-session UDP relay
sockets**, so steady-state client→relay and peer→relay traffic on plain
UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead
of one `recvmsg` per packet. DTLS sessions still go through the SSL read
path and are unchanged.
The flag stays **opt-in**: receive-side batching works correctly, but on
the current `m=1` / `m=100` benchmarks throughput is flat to slightly
negative — the bottleneck has moved past receive (see results below).
## What's in the change
- **Shared receive helpers** (`src/apps/relay/ns_ioalib_engine_impl.c`,
`src/apps/relay/ns_ioalib_impl.h`):
- `ioa_parse_udp_recvmsg_cmsg()` — single TTL/TOS/`IP_RECVERR` cmsg
parser used by both `udp_recvfrom()` and the new batch path. Replaces
the duplicated parser previously inlined in `dtls_listener.c` and
`udp_recvfrom()`.
- `ioa_init_recvmmsg_hdr()` — single initializer for
`mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener.
- New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and
relay paths now share it.
- **Connected relay batch read** (`socket_udp_read_batch_recvmmsg` in
`ns_ioalib_engine_impl.c`): called from `socket_input_worker` for
non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message
`stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches
each datagram through the existing `read_cb` path, and falls back
cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and
on `EAGAIN`/short-batch (releases unused buffers).
- **Per-engine scratch state**: the `mmsghdr[16]` / `iovec[16]` / cmsg /
src-addr arrays live on `ioa_engine`, not on every socket — keeps memory
flat at thousands of allocations.
- **TTL/TOS-sized cmsg buffers** in the listener: the listener
previously over-allocated `64 KiB` per slot; it now uses the same
TTL+TOS sizing as the relay path.
- **Opt-in occupancy stats** behind a new `--udp-recvmmsg-log` flag:
every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=…
avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=…
hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap);
the periodic log is gated by the new flag so default operation is
silent.
- **CLI plumbing**: `--udp-recvmmsg-log` long option in
`mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in
`turn_admin_server.c`, doc updates in `README.turnserver`.
- **Docs**: `docs/PerformanceIterationLog.md` records the iteration
steps, validation, and two rounds of DigitalOcean A/B numbers.
`CLAUDE.md` load-test instructions updated to mention the new flag and
the `tot_recv_msgs` / `tot_recv_bytes` workaround.