coturn

Commit Graph

Author	SHA1	Message	Date
Pavel Punsky	8c7d8fcb86	Enable --udp-recvmmsg by default on Linux (#1930)	3 weeks ago

## Summary

Flips the Linux default for `--udp-recvmmsg` from off to on. Operators opt out with `--udp-recvmmsg=false` (or `=0`).

> Stacked on #1929. This depends on the recvmmsg-scoping change in #1929 and is based on that branch, so the diff shows only the default-on change. GitHub will auto-retarget the base to `master` once #1929 merges. Merge #1929 first.

## Why this is now safe

The original objection to default-on (recorded in `docs/PerformanceIterationLog.md`) was the per-session-relay-socket prealloc tax: `--udp-recvmmsg` applied the 16-buffer batch path to every connected relay socket, which only ever carries one flow, so the churn ate the listener-side win.

#1929 scoped recvmmsg to shared fan-in sockets only (`udp_recvmmsg_eligible`: the client listener, plus the per-thread shared relay socket under `--multiplex-peer`). Per-session relay sockets now stay on the single-recv path regardless of the flag, so that tax is gone.

The one socket touched by default, the client listener, is a genuine fan-in point:

- batches whenever client concurrency is non-trivial (measured `avg_batch ≈ 16` under load), and
- costs little when idle (few packets ⇒ few prealloc cycles).

## What changed

- `mainrelay.c`: `turn_params.udp_recvmmsg` default `false → true` (Linux only).
- Removed the now-dead `--multiplex-peer` auto-enable block and the `udp_recvmmsg_set_explicitly` tracking it relied on; multiplex-peer gets its recvmmsg window from the default. The opt-out flows through the normal `get_bool_value` path.
- Help text, `man/man1/turnserver.1`, `examples/etc/turnserver.conf`, `CLAUDE.md`, and `docs/PerformanceIterationLog.md` updated for the new default + opt-out.

Per-session relay sockets and DTLS session sockets are unchanged.

## Validation

- Format: clang-format 15.0.7 clean.
- macOS: build + ctest 6/6 + `run_tests.sh` pass.
- Linux (Docker, clean build): ctest 5/5; `run_tests.sh`, `run_tests_conf.sh`, `run_tests_multiplex_peer.sh` all pass (no FAIL).
- Runtime proof (loopback, `--udp-recvmmsg-log`):
  - Default, no flag: recvmmsg active, `calls=13714 packets=219306 avg_batch=15.99`.
  - `--udp-recvmmsg=false`: zero recvmmsg activity, opt-out confirmed.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
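The scoping rule this commit relies on (batching only on shared fan-in sockets) can be sketched as a small predicate. This is an illustration only: the enum, the struct, and the function name `recvmmsg_eligible_sketch` are hypothetical and are not coturn's actual `udp_recvmmsg_eligible` signature or types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical socket roles; the real coturn types differ. */
typedef enum {
  SOCK_ROLE_CLIENT_LISTENER, /* shared UDP listener: many clients fan in */
  SOCK_ROLE_SHARED_RELAY,    /* per-thread shared relay socket (--multiplex-peer) */
  SOCK_ROLE_SESSION_RELAY,   /* connected per-session relay socket: one flow */
  SOCK_ROLE_DTLS_SESSION     /* DTLS session socket: uses the SSL read path */
} sock_role;

/* Hypothetical subset of the runtime options. */
typedef struct {
  bool udp_recvmmsg;   /* on by default on Linux after this commit */
  bool multiplex_peer; /* --multiplex-peer */
} relay_opts;

/* Returns true only for sockets where several flows share one descriptor. */
static bool recvmmsg_eligible_sketch(const relay_opts *opts, sock_role role) {
  assert(opts != NULL);
  if (!opts->udp_recvmmsg) {
    return false; /* operator opted out with --udp-recvmmsg=false */
  }
  switch (role) {
  case SOCK_ROLE_CLIENT_LISTENER:
    return true;
  case SOCK_ROLE_SHARED_RELAY:
    return opts->multiplex_peer;
  default:
    return false; /* per-session and DTLS sockets stay on the single-recv path */
  }
}
```

With the flag on and `--multiplex-peer` off, only the client listener is eligible, which matches the "one socket touched by default" statement above.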
Pavel Punsky	5959ecfb13	Add UDP-GSO send path (--udp-gso) (#1907)	1 month ago

## Summary

- New `--udp-gso` flag (Linux, requires `--udp-sendmmsg`) collapses same-destination, same-size sendmmsg batches into a single `sendmsg` with a `UDP_SEGMENT` cmsg, so the kernel allocates one super-skb that traverses the network stack once and is segmented at egress instead of running `udp_sendmsg → ip_finish_output → __dev_queue_xmit` per datagram.
- Also wraps the relay-side `recvmmsg` callback loop in `udp_sendmmsg_batch_begin/end` so peer→client sends triggered inside a recv batch can also coalesce; without that wrapping the relay path issues one `sendto` per delivered datagram.
- Sticky-disable on `EINVAL/ENOPROTOOPT` for older kernels/NICs that lack UDP-GSO; one warning logged, then transparent fallback to the existing `sendmmsg` and `udp_send` paths.

## Why

The `--udp-recvmmsg` and `--udp-sendmmsg` follow-ups confirmed (see [docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md)) that on the relay flood workload the dominant cost is the per-datagram kernel TX path. mmsg-style batching reduces only the syscall entry/exit, not the per-skb stack traversal; UDP-GSO collapses both.

## Result

DigitalOcean nyc1 c-4, 30 s alternating A/B, `-Y packet -m 1`, eth1 TX as the authoritative server forwarding metric:

| Variant | eth1 RX | eth1 TX | sys CPU | idle CPU |
|---|---:|---:|---:|---:|
| baseline (no flags) | 322,091 | 127,445 | 22.9 % | 67.5 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 266,068 | 257,996 | 15.0 % | 78.7 % |
| baseline (no flags) | 309,475 | 125,573 | 20.9 % | 70.7 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 275,992 | 225,366 | 14.9 % | 74.3 % |

Mean server forwarding rate: 126.5 k → 241.7 k pps (+91 %, 1.91×), mean system CPU 21.9 % → 14.9 %, about 2.8× CPU efficiency (TX pps per system-CPU-%).

Full perf-children comparison and methodology in the new section of [docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).

## Notes for reviewers

- `--udp-gso` is opt-in and requires `--udp-sendmmsg` (the help text states the dependency). Without `--udp-sendmmsg` the batch state never accumulates and GSO has nothing to flush.
- GSO eligibility resets on every `_begin/_end`. Mixed-destination, mixed-size, or oversize batches transparently fall back through `sendmmsg` / `udp_send`.
- Rebased onto current `master`; the recvmmsg dependency is already merged via #1906.

## Test plan

- [x] `cmake --build build --target turnserver` (RelWithDebInfo + ASan local builds clean)
- [x] `ctest --test-dir build --output-on-failure`: 3/3 unit tests pass
- [x] `examples/run_tests.sh`: TCP/TLS/UDP pass; DTLS pre-existing failure on macOS environment, unrelated to this change
- [x] DigitalOcean A/B perf validation captured above
- [ ] Reviewer to confirm CI green on Linux build/test/CodeQL

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
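The eligibility rule stated in the reviewer notes (same destination, same size, not oversize, otherwise fall back) might look roughly like the check below. It is a sketch under assumptions: `queued_dgram`, `gso_batch_eligible_sketch`, the IPv4-only address fields, the caller-supplied `max_total` limit, and the "at least two datagrams" condition are all hypothetical and not taken from the coturn source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one datagram queued in a send batch. */
typedef struct {
  uint32_t dst_ip;   /* IPv4 destination, for illustration only */
  uint16_t dst_port; /* destination UDP port */
  size_t len;        /* payload length in bytes */
} queued_dgram;

/*
 * Returns true if the whole batch could be handed to the kernel as one
 * segmented send. max_total is supplied by the caller because the actual
 * size limit is not stated in the commit message.
 */
static bool gso_batch_eligible_sketch(const queued_dgram *q, size_t n, size_t max_total) {
  assert(q != NULL || n == 0);
  if (n < 2) {
    return false; /* assumption: a single datagram has nothing to coalesce */
  }
  size_t total = 0;
  for (size_t i = 0; i < n; ++i) {
    if (q[i].dst_ip != q[0].dst_ip || q[i].dst_port != q[0].dst_port) {
      return false; /* mixed destination: fall back */
    }
    if (q[i].len != q[0].len) {
      return false; /* mixed size: fall back */
    }
    total += q[i].len;
    if (total > max_total) {
      return false; /* oversize batch: fall back */
    }
  }
  return true;
}
```

A caller would run a check like this when the batch is flushed and use the existing `sendmmsg` / `udp_send` paths whenever it returns false.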
Pavel Punsky	a5005c4193	Relay recvmmsg (#1906)	1 month ago

## Summary

Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP listener socket to also cover connected per-session UDP relay sockets, so steady-state client→relay and peer→relay traffic on plain UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead of one `recvmsg` per packet. DTLS sessions still go through the SSL read path and are unchanged.

The flag stays opt-in: receive-side batching works correctly, but on the current `m=1` / `m=100` benchmarks throughput is flat to slightly negative; the bottleneck has moved past receive (see results below).

## What's in the change

- Shared receive helpers (`src/apps/relay/ns_ioalib_engine_impl.c`, `src/apps/relay/ns_ioalib_impl.h`):
  - `ioa_parse_udp_recvmsg_cmsg()`: single TTL/TOS/`IP_RECVERR` cmsg parser used by both `udp_recvfrom()` and the new batch path. Replaces the duplicated parser previously inlined in `dtls_listener.c` and `udp_recvfrom()`.
  - `ioa_init_recvmmsg_hdr()`: single initializer for `mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener.
  - New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and relay paths now share it.
- Connected relay batch read (`socket_udp_read_batch_recvmmsg` in `ns_ioalib_engine_impl.c`): called from `socket_input_worker` for non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message `stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches each datagram through the existing `read_cb` path, and falls back cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and on `EAGAIN`/short-batch (releases unused buffers).
- Per-engine scratch state: the `mmsghdr[16]` / `iovec[16]` / cmsg / src-addr arrays live on `ioa_engine`, not on every socket, which keeps memory flat at thousands of allocations.
- TTL/TOS-sized cmsg buffers in the listener: the listener previously over-allocated `64 KiB` per slot; it now uses the same TTL+TOS sizing as the relay path.
- Opt-in occupancy stats behind a new `--udp-recvmmsg-log` flag: every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=… avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=… hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap); the periodic log is gated by the new flag so default operation is silent.
- CLI plumbing: `--udp-recvmmsg-log` long option in `mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in `turn_admin_server.c`, doc updates in `README.turnserver`.
- Docs: `docs/PerformanceIterationLog.md` records the iteration steps, validation, and two rounds of DigitalOcean A/B numbers. `CLAUDE.md` load-test instructions updated to mention the new flag and the `tot_recv_msgs` / `tot_recv_bytes` workaround.
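To make the occupancy counters concrete, the sketch below shows one way the `calls`, `packets`, `avg_batch`, and `hist_*` fields of the stats line could be maintained. The struct and function names are hypothetical. The bucket boundaries are read off the field names, and `avg_batch = packets / calls` is an assumption that happens to agree with the figures logged elsewhere on this page (219306 / 13714 ≈ 15.99). The `wouldblock`, `unavailable`, and `no_buffer` counters are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter block; field names mirror the logged stats line. */
typedef struct {
  uint64_t calls;   /* recvmmsg calls that returned at least one datagram */
  uint64_t packets; /* total datagrams returned by those calls */
  uint64_t hist_1;
  uint64_t hist_2;
  uint64_t hist_3_4;
  uint64_t hist_5_8;
  uint64_t hist_9_16;
} recvmmsg_stats_sketch;

/* Records one successful batch of `batch` datagrams (1..16). */
static void stats_record_batch(recvmmsg_stats_sketch *s, unsigned batch) {
  assert(s != NULL && batch >= 1 && batch <= 16);
  s->calls++;
  s->packets += batch;
  if (batch == 1) {
    s->hist_1++;
  } else if (batch == 2) {
    s->hist_2++;
  } else if (batch <= 4) {
    s->hist_3_4++;
  } else if (batch <= 8) {
    s->hist_5_8++;
  } else {
    s->hist_9_16++;
  }
}

/* Average datagrams per call; 0.0 when nothing has been recorded yet. */
static double stats_avg_batch(const recvmmsg_stats_sketch *s) {
  assert(s != NULL);
  return s->calls ? (double)s->packets / (double)s->calls : 0.0;
}
```

Updating a handful of integers per batch is cheap, which fits the note that counters are always tracked and only the periodic log line is gated by the flag.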
Pavel Punsky	69bc0e7351	Load generator mode in turnutils_uclient (#1894)	2 months ago

## Summary

Adds load-generator modes to `turnutils_uclient` for repeatable TURN server performance testing:

- Adds `-Y packet|alloc|invalid` load modes.
- Supports packet flood, allocation flood, and invalid-packet flood workflows.
- Adds unique local client ports for allocation flood mode.
- Removes default packet pacing in load-generator modes unless explicitly set.
- Adds helper scripts under `examples/loadtest/`.
- Documents load-test usage in `README.turnutils`, `man/man1/turnutils.1`, `CLAUDE.md`, and `docs/PerformanceIterationLog.md`.

The performance log captures DigitalOcean benchmark methodology, A/B lessons, hot-path findings, and future optimization candidates.

4 Commits (master)