## Summary
- New `--udp-gso` flag (Linux, requires `--udp-sendmmsg`) collapses
same-destination, same-size sendmmsg batches into a single `sendmsg`
with a `UDP_SEGMENT` cmsg, so the kernel allocates one super-skb that
traverses the network stack once and is segmented at egress instead of
running `udp_sendmsg → ip_finish_output → __dev_queue_xmit` per
datagram.
- Also wraps the relay-side `recvmmsg` callback loop in
`udp_sendmmsg_batch_begin/end` so peer→client sends triggered inside a
recv batch can also coalesce — without that wrapping the relay path
issues one `sendto` per delivered datagram.
- Sticky-disable on `EINVAL/ENOPROTOOPT` for older kernels/NICs that
lack UDP-GSO; one warning logged, then transparent fallback to the
existing `sendmmsg` and `udp_send` paths.
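For readers unfamiliar with the mechanism, the `UDP_SEGMENT` control message looks roughly like this — a hypothetical helper, not the PR's actual code (the `SOL_UDP`/`UDP_SEGMENT` fallback values are the Linux ones):

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_UDP
#define SOL_UDP 17 /* Linux value; not exposed by all libc headers */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103 /* Linux: per-segment size for UDP GSO */
#endif

/* Attach a UDP_SEGMENT cmsg carrying seg_size to msg; on a GSO send,
   the kernel splits the payload into seg_size-byte datagrams at egress
   instead of walking the stack once per datagram. */
static void attach_udp_segment_cmsg(struct msghdr *msg, char *cmsgbuf,
                                    size_t cmsgbuflen, uint16_t seg_size) {
  memset(cmsgbuf, 0, cmsgbuflen);
  msg->msg_control = cmsgbuf;
  msg->msg_controllen = cmsgbuflen;
  struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
  cm->cmsg_level = SOL_UDP;
  cm->cmsg_type = UDP_SEGMENT;
  cm->cmsg_len = CMSG_LEN(sizeof(seg_size));
  memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
  msg->msg_controllen = CMSG_SPACE(sizeof(seg_size));
}
```

The actual `sendmsg` then carries the concatenated batch payload in the iovec alongside this cmsg.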
## Why
The `--udp-recvmmsg` and `--udp-sendmmsg` follow-ups confirmed (see
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md)) that
on the relay flood workload the dominant cost is the per-datagram kernel
TX path. mmsg-style batching reduces only the syscall entry/exit, not
the per-skb stack traversal — UDP-GSO collapses both.
## Result
DigitalOcean nyc1 c-4, 30 s alternating A/B, `-Y packet -m 1`, eth1 TX
as the authoritative server forwarding metric:
| Variant | eth1 RX | eth1 TX | sys CPU | idle CPU |
|---|---:|---:|---:|---:|
| baseline (no flags) | 322,091 | 127,445 | 22.9 % | 67.5 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 266,068 | **257,996** | 15.0 % | 78.7 % |
| baseline (no flags) | 309,475 | 125,573 | 20.9 % | 70.7 % |
| `--udp-recvmmsg --udp-sendmmsg --udp-gso` | 275,992 | **225,366** | 14.9 % | 74.3 % |
Mean server forwarding rate: **126.5 k → 241.7 k pps (+91 %, 1.91×)**,
mean system CPU **21.9 % → 14.9 %** — about **2.8× CPU efficiency** (TX
pps per system-CPU-%). Full perf-children comparison and methodology in
the new section of
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).
## Notes for reviewers
- `--udp-gso` is opt-in and requires `--udp-sendmmsg` (the help text
states the dependency). Without `--udp-sendmmsg` the batch state never
accumulates and GSO has nothing to flush.
- GSO eligibility resets on every `_begin/_end`. Mixed-destination,
mixed-size, or oversize batches transparently fall back through
`sendmmsg` / `udp_send`.
- Rebased onto current `master`; the recvmmsg dependency is already
merged via #1906.
## Test plan
- [x] `cmake --build build --target turnserver` (RelWithDebInfo + ASan
local builds clean)
- [x] `ctest --test-dir build --output-on-failure` — 3/3 unit tests pass
- [x] `examples/run_tests.sh` — TCP/TLS/UDP pass; DTLS pre-existing
failure on macOS environment, unrelated to this change
- [x] DigitalOcean A/B perf validation captured above
- [ ] Reviewer to confirm CI green on Linux build/test/CodeQL
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…DP-GSO
The libevent EV_READ handler used to do one recvfrom + one sendto per
ready event, so a packet flood through the relay generated O(N) libevent
re-entries and 2N syscalls per N relayed datagrams — saturating one core
on the loadgen-side peer well below modern relay throughput.
On Linux, replace the handler with:
* a drain loop: keep recvmmsg'ing in MSG_DONTWAIT until the queue
returns less than a full batch, bounded by MAX_DRAIN_ROUNDS so a flood
can't starve the rest of the event loop;
* recvmmsg into a static mmsghdr[32] (peer is single-threaded) and reuse
the same mmsghdr array for sendmmsg back — each entry already has
msg_name pointing at the source (the echo destination) and the iovec
pointing at the received bytes, so no userspace copy;
* UDP-GSO: when the recvmmsg batch is homogeneous (≥2 entries, same
source, same size, ≤1472 B), echo it as one sendmsg with UDP_SEGMENT
cmsg so the kernel allocates one super-skb that traverses the network
stack once.
The non-Linux build keeps the original recvfrom/sendto handler.
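The homogeneity test above can be sketched as a pure predicate (illustrative shape; the real check works on the `mmsghdr` array directly):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define GSO_MAX_SEG 1472 /* one MTU-sized UDP payload */

/* A recvmmsg batch is echoable as a single GSO send only if it has
   >= 2 datagrams, all from the same source, all the same size, and
   none oversize — otherwise fall back to sendmmsg/sendto. */
static bool batch_gso_eligible(const size_t *lens,
                               const struct sockaddr_storage *srcs,
                               const socklen_t *srclens, unsigned n) {
  if (n < 2 || lens[0] == 0 || lens[0] > GSO_MAX_SEG)
    return false;
  for (unsigned i = 1; i < n; i++) {
    if (lens[i] != lens[0])
      return false;
    if (srclens[i] != srclens[0] ||
        memcmp(&srcs[i], &srcs[0], srclens[0]) != 0)
      return false;
  }
  return true;
}
```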
DigitalOcean nyc1 c-4 30 s alternating A/B paired with the GSO
turnserver (-Y packet -m 1):
old peer: turn TX mean 228 k pps, peer CPU mean 91.0 % (saturated)
new peer: turn TX mean 255 k pps, peer CPU mean 28.8 %
Peer CPU drops 3.2× while turn-side throughput climbs ~12 % because the
old peer was no longer fully reflecting at the GSO turnserver's rate.
The peer is no longer the loadgen-side bottleneck, freeing CPU for
multi-flow tests.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
## Summary
Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP
listener socket to also cover **connected per-session UDP relay
sockets**, so steady-state client→relay and peer→relay traffic on plain
UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead
of one `recvmsg` per packet. DTLS sessions still go through the SSL read
path and are unchanged.
The flag stays **opt-in**: receive-side batching works correctly, but on
the current `m=1` / `m=100` benchmarks throughput is flat to slightly
negative — the bottleneck has moved past receive (see results below).
## What's in the change
- **Shared receive helpers** (`src/apps/relay/ns_ioalib_engine_impl.c`,
`src/apps/relay/ns_ioalib_impl.h`):
- `ioa_parse_udp_recvmsg_cmsg()` — single TTL/TOS/`IP_RECVERR` cmsg
parser used by both `udp_recvfrom()` and the new batch path. Replaces
the duplicated parser previously inlined in `dtls_listener.c` and
`udp_recvfrom()`.
- `ioa_init_recvmmsg_hdr()` — single initializer for
`mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener.
- New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and
relay paths now share it.
- **Connected relay batch read** (`socket_udp_read_batch_recvmmsg` in
`ns_ioalib_engine_impl.c`): called from `socket_input_worker` for
non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message
`stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches
each datagram through the existing `read_cb` path, and falls back
cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and
on `EAGAIN`/short-batch (releases unused buffers).
- **Per-engine scratch state**: the `mmsghdr[16]` / `iovec[16]` / cmsg /
src-addr arrays live on `ioa_engine`, not on every socket — keeping
memory flat even with thousands of concurrent allocations.
- **TTL/TOS-sized cmsg buffers** in the listener: the listener
previously over-allocated `64 KiB` per slot; it now uses the same
TTL+TOS sizing as the relay path.
- **Opt-in occupancy stats** behind a new `--udp-recvmmsg-log` flag:
every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=…
avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=…
hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap);
the periodic log is gated by the new flag so default operation is
silent.
- **CLI plumbing**: `--udp-recvmmsg-log` long option in
`mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in
`turn_admin_server.c`, doc updates in `README.turnserver`.
- **Docs**: `docs/PerformanceIterationLog.md` records the iteration
steps, validation, and two rounds of DigitalOcean A/B numbers.
`CLAUDE.md` load-test instructions updated to mention the new flag and
the `tot_recv_msgs` / `tot_recv_bytes` workaround.
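The fallback rules described above amount to a small errno triage; in sketch form (illustrative names, not the PR's code):

```c
#include <errno.h>

enum batch_rx_action {
  BATCH_STOP,    /* end this drain round, keep the flag enabled */
  BATCH_DISABLE  /* sticky-disable recvmmsg, revert to recvmsg */
};

/* Unsupported-syscall errnos permanently disable the batch path;
   EAGAIN/EWOULDBLOCK just mean the socket queue is drained. */
static enum batch_rx_action classify_recvmmsg_error(int err) {
  switch (err) {
  case ENOSYS:
  case EINVAL:
  case EOPNOTSUPP:
    return BATCH_DISABLE;
  case EAGAIN:
#if defined(EWOULDBLOCK) && EWOULDBLOCK != EAGAIN
  case EWOULDBLOCK:
#endif
  default:
    return BATCH_STOP;
  }
}
```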
## Summary
`fuzzing/stun.dict` line 147 used C-style `\r\n\r\n` for the HTTP
end-of-headers keyword:
```
kw_http_eoh="\r\n\r\n"
```
libFuzzer's `ParseDictionaryFile` only accepts three escape sequences
inside quoted entries: `\\`, `\"`, and `\xAB` hex. `\r` / `\n` are
unrecognized, so a local fuzz run aborts dictionary load with:
```
ParseDictionaryFile: error in line 147
kw_http_eoh="\r\n\r\n"
```
Replace with the hex form used by the other 111 entries in the file:
```
kw_http_eoh="\x0d\x0a\x0d\x0a"
```
The turnserver man page had drifted from the actual CLI options the
binary accepts. The shipped `man/man1/turnserver.1` was last regenerated
on 05 June 2021, so several options added since then were missing and
one removed option was still documented.
The man page is auto-generated from `README.turnserver` via
`make-man.sh` (txt2man), so the source-of-truth edit is in the README;
the `.1` files are then regenerated.
In `README.turnserver`:
- Add 13 options that exist in `mainrelay.c` long_options[] but were
undocumented: --include-reason-string, --syslog-facility,
--drop-invalid-packets, --drop-invalid-packets-log, --udp-recvmmsg,
--respond-http-unsupported, --prometheus-address, --prometheus-path,
--version, --cpus, --no-cli, --no-rfc5780,
--response-origin-only-with-rfc5780.
- Document --sql-userdb as an alias on the existing --psql-userdb line.
- Remove the stale --ne=[1|2|3] entry (no longer parsed by the binary).
The regenerated `man/man1/turnserver.1` also picks up a backlog of
options that were already in the README but never reached the shipped
page (--software-attribute, --cli, --sock-buf-size, --raw-public-keys,
--stun-backward-compatibility, and the corrected --no-tlsv1_2 wording).
`man/man1/turnadmin.1` and `man/man1/turnutils.1` are regenerated as a
side-effect of `make-man.sh` running over all three READMEs; their
content was similarly stale relative to README.turnadmin /
README.turnutils.
## Summary
- The `--ne=[1|2|3]` option was already removed from `long_options[]`
and the option parser, so `turnserver` rejects it at runtime, but the
help text printed by `turnserver --help` still advertised it.
PR #1517 (Jun 2024) simplified codeql.yml in ways that left scans
incomplete: it dropped the actions:read / contents:read permissions and
the analyze category, both of which CodeQL Action requires for results
to land under the existing language category. Combined with the later
cpp -> c-cpp rename and v3 -> v4 upgrade, scheduled scans have not
refreshed the Security tab since Jun 1, 2024.
- Add actions:read and contents:read back to job permissions
- Set build-mode: manual on init (required for v3+/v4 manual builds)
- Pass category "/language:c-cpp" on analyze so SARIF de-duplicates
against the configured language
- Build with --parallel so the tracer keeps up on default runners
## Summary
- Add a self-contained Fil-C build/test harness under `filc/` that
mirrors the existing `fuzzing/` pattern: one host script
(`filc/run-local.sh`) builds an Ubuntu 24.04 image with the
[Fil-C](https://github.com/pizlonator/fil-c) optfil 0.678 toolchain,
builds turnserver with `CC=filcc`/`CXX=fil++`, runs unit tests + system
tests, and drops a per-run timestamped log directory with `SUMMARY.txt`
+ `ISSUES.txt`.
- Fix the two real Fil-C compatibility bugs the harness surfaces by
changing `ur_map_value_type` and `ur_addr_map_value_type` from
`uintptr_t` to `void *` in `src/server/ns_turn_maps.h`.
## Why
[Fil-C](https://fil-c.org) is a memory-safe C/C++ compiler (Clang 20
fork) that pairs every pointer with an "InvisiCap" capability and turns
UB into deterministic panics with no `unsafe` escape hatch. Putting
coturn through it answers two questions: (a) does it compile unmodified,
and (b) does it run correctly under capability-enforced memory safety.
After this PR, the answer is **yes** for both — turnserver,
turnutils_peer, and turnutils_uclient relay TCP/TLS/UDP/DTLS traffic
with full Fil-C enforcement, all unit tests pass, and
`examples/run_tests_conf.sh` runs end-to-end.
## What's in the PR
### `filc/` harness (commit 1)
| File | Purpose |
|---|---|
| `filc/Dockerfile` | Ubuntu 24.04 + Fil-C optfil 0.678 (extracts the
nested `fil.tar.xz` to `/opt/fil`); `--platform linux/amd64` so it works
on Apple Silicon under emulation. |
| `filc/run-local.sh` | Host-side: build image, create
`filc/logs/<UTC-ts>/`, run container with source mounted read-only and
log dir mounted r/w. |
| `filc/docker-entrypoint.sh` | In-container orchestrator. Phases: env /
source-copy / build / unit-tests / system-cli / system-conf. Runs every
phase even when a prior one fails (no aborting mid-pass). Captures
per-phase logs + a combined `all.log` + JUnit XML for ctest. Greps
panics/errors into `ISSUES.txt`. Downgrades `system-*` phases to FAIL
when `examples/run_tests*.sh` prints `FAIL` despite exiting 0 (existing
fragility in those scripts). |
| `filc/build.sh` | `cmake … -DBUILD_TESTING=ON -DCMAKE_C_COMPILER=filcc
-DCMAKE_CXX_COMPILER=fil++ -DCMAKE_BUILD_TYPE=RelWithDebInfo`, then
build. |
| `filc/.gitignore` | Ignore the on-host `logs/` dir. |
The harness also bumps the post-launch sleep in `examples/run_tests.sh`
from 2s to 6s **only inside the container** (sed-in-place on the copied
source; upstream is untouched). Under linux/amd64 emulation the
Fil-C-built turnserver isn't accepting TCP at 2s, so the first sub-test
races and prints `FAIL`. Matches the 5s sleep already used by
`run_tests_conf.sh`.
### Pointer-typedef fixes (commit 2)
`src/server/ns_turn_maps.h`:
```diff
-typedef uintptr_t ur_map_value_type;
+typedef void *ur_map_value_type;
...
-typedef uintptr_t ur_addr_map_value_type;
+typedef void *ur_addr_map_value_type;
```
**Why this is necessary.** Both maps store pointers, but their value
slot is integer-typed. Every existing `_put` site casts a pointer
through `(ur_*_value_type)` to store, and every `_get` site casts back.
Under standard C this is a well-defined no-op. Under Fil-C, casting a
pointer to `uintptr_t` discards its InvisiCap; casting back yields a
pointer with a non-null address but a NULL Fil-C object — the next
dereference panics with `cannot read pointer with null object`.
The harness caught two such panics, both in the auth-resume /
relay-allocate flow:
1. `src/server/ns_turn_server.c:3248` — `ss->client_socket`, where `ss`
came from `sessions_map` (a `ur_map`).
2. `src/apps/relay/turn_ports.c:225` — `tp->mutex`, where `tp` came from
`ip_to_turnports_*` (a `ur_addr_map`) via `turnipports_add`.
**Why this is also a correctness improvement on a normal build.** The
new typedef makes the API strictly more type-safe — the compiler now
enforces "you put a pointer in." It eliminates a class of accidental
misuse (storing a non-pointer integer where a pointer was expected) that
the integer typedef silently allowed. Same generated code on a normal
build; different (correct) Fil-C semantics.
**Audit.** Verified before changing:
- All `ur_map_put` / `lm_map_put` / `ur_addr_map_put` callers store
pointer-typed values exclusively (no callers store raw integers).
- No internal arithmetic on the value type anywhere in `ns_turn_maps.c`.
- `ur_map_del_func` / `ur_addr_map_func` implementations either don't
exist (all `_del` callers pass `NULL`) or immediately cast their
parameter to a real pointer type — no source change needed.
- `KHASH_MAP_INIT_INT64(3, ur_map_value_type)` works identically with
`void *`.
- `ur_addr_map`'s `addr_elem.value` is assigned, read, compared for
truthiness, and cleared with `= 0` — all valid for `void *`.
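In miniature, the round-trip the typedef change protects (hypothetical one-slot map, not coturn's khash-backed one):

```c
#include <stddef.h>

/* With the value slot typed void *, the stored pointer round-trips
   without being squeezed through an integer — which is exactly what
   keeps its Fil-C capability (InvisiCap) attached across put/get. */
typedef void *map_value_sketch_t;

struct one_slot_map {
  map_value_sketch_t value;
};

static void map_put_sketch(struct one_slot_map *m, void *p) { m->value = p; }
static void *map_get_sketch(const struct one_slot_map *m) { return m->value; }
```

Under the old `uintptr_t` typedef the same round-trip compiles to identical code on a normal build, but the integer cast is where Fil-C drops the capability.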
## Test plan
- [ ] `filc/run-local.sh` reports all six phases PASS (env / source-copy
/ build / unit-tests / system-cli / system-conf), `ISSUES.txt` carries
no Fil-C panic / safety / sanitizer entries.
- [ ] Local `cmake -S . -B build -DBUILD_TESTING=ON && cmake --build
build && ctest --test-dir build --output-on-failure` is green (no
regression on the regular build).
- [ ] `examples/run_tests.sh` and `examples/run_tests_conf.sh` are green
on Linux per `CLAUDE.md`.
- [ ] Existing `fuzzing/run-local.sh ASan 0 -runs=1` still passes (the
new `filc/` directory is independent and shouldn't perturb anything).
write_to_peerchannel(): get_relay_socket_ss() and
ioa_network_buffer_get_size() were each called twice per channel-data
packet. The compiler can't CSE the calls (cross-TU through a
get_relay_socket() accessor in ns_turn_allocation.c that it can't prove
pure), so cache the relay socket and the inbound size once.
handle_turn_send(): same get_relay_socket_ss() duplication on the
STUN_SEND path.
read_client_connection(): the inbound size was fetched four times
(received_bytes accumulator, verbose log, blen seed, ret check). Reuse
ret as orig_blen.
No behavior change. Targets the ~0.4% per-packet overhead these helpers
were contributing in the m=1 packet-flood profile.
get_ioa_addr_len() is a four-instruction accessor (read sa_family,
return the struct sockaddr_{in,in6} size) that gets called from every
per-packet sendto(), recvmsg(), and addr-map lookup. Cross-TU it stays
a real function call; moving the body into ns_turn_ioaddr.h as static
inline lets each call site fold the family branch directly into the
syscall setup.
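Shape of the accessor (sketch over a bare `sockaddr`; the real function takes coturn's `ioa_addr`):

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Header-visible inline: the AF_INET/AF_INET6 branch folds into each
   call site instead of paying a cross-TU call per packet. */
static inline socklen_t addr_len_sketch(const struct sockaddr *sa) {
  return (socklen_t)(sa->sa_family == AF_INET
                         ? sizeof(struct sockaddr_in)
                         : sizeof(struct sockaddr_in6));
}
```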
perf record on the m=1 packet flood (c-4 nyc1) confirms the win:
- udp_recvfrom self-time: 0.76% -> 0.35% (-54%)
- udp_send self-time: 0.60% -> 0.26% (-57%)
End-to-end throughput stays in the run-to-run noise band, as
expected for a kernel-bound workload, but the released CPU is real.
ioa_socket_check_bandwidth(): hoist the "no bps limit configured"
fast-exit before the multi-condition socket-state check. The vast
majority of sessions have max_bps == 0, so the existing path was running
5+ pointer dereferences and equality tests just to land on the same
return-1.
send_data_from_ioa_socket_nbh(): drop the redundant inner "if (!(s->done
|| s->fd == -1))" gate. The outer if/else-if branch already filtered
those, and ioa_socket_tobeclosed() rechecks both, so the inner test was
dead code on every successful send.
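The reordering pattern in miniature (illustrative fields; the real check inspects coturn's `ioa_socket`):

```c
/* Hoisted fast-exit: with max_bps == 0 (the overwhelmingly common
   case) no socket state is dereferenced at all. */
struct sock_sketch {
  unsigned long max_bps;
  int done;
  int fd;
  int broken;
};

static int check_bandwidth_sketch(const struct sock_sketch *s) {
  if (s->max_bps == 0)
    return 1; /* no limit configured: allow immediately */
  if (s->done || s->fd < 0 || s->broken)
    return 0;
  /* ... rate accounting would run here for limited sessions ... */
  return 1;
}
```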
perf record on the c-4 nyc1 droplet (m=1 packet flood, 12s) shows
send_data_from_ioa_socket_nbh self-time drop from 0.91% to 0.54% and
ioa_socket_check_bandwidth fall out of the top-25 user-space symbols
(was 0.33%). Throughput is within run-to-run noise — the relay is
syscall-bound, so user-space wins don't translate 1:1 — but the released
CPU is real.
Same pattern as the get_ioa_addr_len() inline: addr_cpy() is a
single-memcpy helper that fires on every receive (each packet dispatch
copies the source address into ioa_net_data, plus allocation/permission
map-key copies). Cross-TU it stays a real function call.
Combined with the previous four iterations (turn_server_get_engine
hoist, bandwidth fast-exit + dead-check removal, cached relay-socket and
buffer-size lookups, get_ioa_addr_len inline), the alternating A/B run
on the same c-4 nyc1 droplet now shows a consistent +5% throughput on
the m=1 packet flood test (recv_msgs/30s mean over 6 rounds: B=146984 /
I=155468).
turn_report_session_usage() runs on every packet but only does real
reporting work once per 4096 packets. Re-order the early returns so the
bitmask fast-exit fires before the cross-TU
turn_server_get_engine() call, and flatten the nested if-blocks into
guard clauses for readability.
No behavior change. A/B testing on a c-4 nyc1 droplet shows the
single-client packet-flood throughput within noise (alternating B/I
rounds: B=149317 / I=153844 mean recv_msgs over 30s, ~3% in iter1's
favor with ~10% run-to-run variance).
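The once-per-4096-packets gate reduces to a single mask test; in sketch form (hypothetical names):

```c
#include <stdbool.h>

#define REPORT_INTERVAL_MASK 0xFFFUL /* report every 4096 packets */

/* The bitmask fast-exit fires before any cross-TU
   turn_server_get_engine()-style lookup is paid. */
static bool should_report_sketch(unsigned long total_packets, bool force) {
  if (!force && (total_packets & REPORT_INTERVAL_MASK) != 0)
    return false;
  return true;
}
```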
stun_is_challenge_response_str in src/client/ns_turn_msg.c only descends
into its three inner stun_attr_get_first_by_type_str calls when the
input is an error response with err_code 401 or 438 *and* a REALM
attribute *and* a NONCE attribute. The OAuth branch additionally
requires STUN_ATTRIBUTE_THIRD_PARTY_AUTHORIZATION.
The fuzzer-driven path in harness_attr_iter calls the predicate every
iteration but the conjunction of conditions is too specific for
libFuzzer to discover from binary mutation alone — OSS-Fuzz introspector
flags 9 unreached callsites on
stun_attr_get_first_by_type_str gated on this function.
Add harness_challenge_response_builder that constructs six deterministic
message variants on every iteration and runs each through the predicate:
- 401 with REALM + NONCE (canonical success)
- 401 with REALM + NONCE + THIRD-PARTY-AUTH (OAuth branch)
- 438 with REALM + NONCE (438 disjunct)
- 401 with REALM only (NONCE-missing path)
- 401 with no REALM (REALM-missing path)
- 400 with REALM + NONCE (wrong err_code path)
Each variant runs once with a non-NULL oauth pointer and once with NULL
to cover both branches of the optional output. Realm / nonce /
server-name lengths and the transaction id are derived from fuzz bytes
so iterations stay meaningfully distinct.
Verified by stand-alone harness:
- 401+REALM+NONCE returns true with attrs copied out, oauth=false
- 401+REALM+NONCE+TPA returns true with oauth=true and server_name
populated — confirming all three inner get_first_by_type_str callsites
and the OAuth disjunct are now exercised.
map_addr_from_public_to_private and map_addr_from_private_to_public in
src/client/ns_turn_ioaddr.c walk a static public_addrs[] table of size
mcount. Without an explicit ioa_addr_add_mapping call mcount stays 0 for
the entire fuzz process, so the loop body — including the
addr_eq_no_port call it gates — is dead code in every fuzz iteration.
OSS-Fuzz introspector flags this as 19 unreached callsites under
map_addr_from_public_to_private and 4+4 under addr_eq_no_port.
Extend the existing shared LLVMFuzzerInitialize in FuzzOpenSSLInit.c
(linked into both FuzzStun and FuzzStunClient via FUZZ_COMMON_SOURCES)
to register two synthetic public<->private mapping pairs — one v4
(192.0.2.1 <-> 10.0.0.1) and one v6 (2001:db8::1 <-> fd00::1) — once at
startup. The header comment for ioa_addr_add_mapping requires
single-threaded init before fuzzing begins, which matches exactly when
LLVMFuzzerInitialize runs.
Verified by stand-alone harness: after init,
stun_attr_get_first_addr_str on a XOR-MAPPED-ADDRESS attribute holding
192.0.2.1:443 returns 10.0.0.1:443, and the v6 equivalent returns
[fd00::1]:8080 — confirming addr_eq_no_port is now called inside the
loop body in both helpers.
OSS-Fuzz introspector flags three blockers the fuzzer cannot reach on
its own:
1. findstr() in src/client/ns_turn_msg.c is gated by is_http(), which
requires GET/POST/PUT/DELETE prefix + " HTTP/" + "\r\n\r\n". The
fuzzer's binary STUN seeds never synthesize a valid HTTP frame.
2. stun_attr_get_reservation_token_value() and
stun_attr_get_response_port_str() are called from harness_attr_iter only
when the input contains the matching attribute type. Neither appears in
the existing seed corpus.
Add HTTP framing keywords to fuzzing/stun.dict and four new seed files
covering both gaps:
- seed_http_get.raw: minimal "GET / HTTP/1.1\r\nHost: x\r\n\r\n"
- seed_http_post_clen.raw: POST with Content-Length to drive the strtoul
branch in is_http
- seed_reservation_token.raw: STUN allocate response with an 8-byte
RESERVATION-TOKEN attribute
- seed_response_port.raw: STUN binding request with a 4-byte
RESPONSE-PORT attribute
Each new STUN seed validated against the real parsers
(stun_get_message_len_str, stun_attr_get_first_by_type_str, is_http) to
confirm it reaches the targeted branch.
The corpus zips also drop pre-existing __MACOSX/ and .DS_Store entries
that had snuck in during a prior macOS zip step; net file count rises
(24 -> 28 in FuzzStun, 4 -> 8 in FuzzStunClient) while archive size
shrinks because of the junk removal.
Add harness_stun_buffer_api to FuzzStunClient.c that exercises every
public wrapper in src/apps/common/stun_buffer.c not already reached by
the existing harnesses: stun_get_size (NULL/non-NULL), the init_request
/ init_indication / init_success_response builders, the tid accessors,
the stun_is_indication wrapper (which gates the static is_channel_msg),
the attr_add / attr_add_channel_number / attr_add_addr /
attr_add_even_port (both branches) / attr_get_first_by_type accessors,
stun_set_allocate_request (rt NULL and non-NULL paths),
stun_set_binding_request /
stun_prepare_binding_request, and the channel-message wrappers.
Each builder call is followed by inspect_buffer_message so the resulting
serialized message is also walked by the parser predicates. A tail block
also pumps raw fuzzer bytes through the wrapper-form predicates
(stun_is_indication, stun_is_channel_message, stun_tid_from_message,
stun_attr_get_first_by_type) so they see malformed inputs the serializer
paths cannot produce.
## Summary
Introduces an opt-in unit test layer for coturn using
[Unity](https://github.com/ThrowTheSwitch/Unity) — a single-header
pure-C test framework that matches coturn's C11 toolchain, portability
bar, and zero-C++ production tree.
- Unity v2.6.0 is fetched on demand via CMake `FetchContent` (nothing
vendored).
- Tests are gated behind `-DBUILD_TESTING=ON` (off by default), so the
standard build and OSS-Fuzz pipeline are unaffected.
- Two test binaries cover pure C-callable code in `libturnclient`:
- `test_ioaddr` (6 cases) — `make_ioa_addr`,
`addr_get_port`/`addr_set_port`, `addr_eq` variants, `addr_to_string`,
IPv4/IPv6/garbage input
- `test_stun_msg` (7 cases) — STUN header construction,
request/indication/success/error response classification, transaction-ID
round-trip, channel message parsing, truncated/zeroed buffer rejection
- New `check` cmake target builds tests before running ctest (avoids the
`make test` footgun where the auto-generated `test` target only runs
already-built binaries).
- Legacy `Makefile.in` gets a `unit-tests` target that bootstraps
`build/unit-tests/` and delegates to the cmake `check` target. `make
check` and `make test` now run the RFC 5769 conformance suite **plus**
the Unity unit tests.
- CLAUDE.md documents the new workflow plus the one-liner for adding a
new `test_<name>.c`.
## Why
The existing test story is shell-script integration suites under
`examples/scripts/` — they exercise the binary end-to-end but can't pin
down behavior of individual functions, can't run without a full build
environment, and don't fail loudly when a unit-level invariant breaks. A
lightweight unit layer gives us:
- Targeted regression coverage for protocol parsing/encoding (the
highest bug-yield area).
- A natural home for tests of the kinds of subtle invariants already
documented in CLAUDE.md (port-counter overflow safety, port-bounds
inclusivity, HMAC buffer initialization).
- Sub-second feedback for contributors.
## Usage
```bash
# CMake direct
cmake -S . -B build -DBUILD_TESTING=ON
cmake --build build -j --target check # build + run all unit tests
ctest --test-dir build --output-on-failure # run already-built tests
# Legacy Makefile bridge (after ./configure)
make unit-tests # bootstraps build/unit-tests/, builds + runs Unity tests
make check # RFC 5769 conformance + unit tests
```
Adding a new test:
1. Drop `tests/test_<name>.c`
2. Append `coturn_add_test(test_<name>)` in `tests/CMakeLists.txt`
3. The `check` target picks it up automatically.
## Test plan
- [x] Clean cmake build with `-DBUILD_TESTING=ON` succeeds; full source
tree (turnserver, turnadmin, turnclient, turn_server, all turnutils)
still builds
- [x] `cmake --build build --target check` builds and runs both test
binaries — 13/13 cases pass
- [x] `ctest --verbose` shows per-case PASS lines for all 13 cases
- [x] Default build (`-DBUILD_TESTING` unset) does not fetch Unity or
build any test binary
## Notes for reviewers
- Why Unity over GoogleTest/Catch2: pure C, single source file, no C++
toolchain dependency, runs anywhere coturn does (incl. exotic CMake
targets like Solaris/AIX). GoogleTest would force `extern "C"` wrappers
and a C++ compiler everywhere.
## Summary
- Remove the `udp-relay-servers` config knob and the
`udp_relay_servers_number` field — after #1849 left only the
`PER_THREAD` UDP engine, this knob is no longer wired to anything that
creates extra UDP relay threads.
- Delete the now-orphaned `udp_relay_servers[]` array, the
`TURNSERVER_ID_BOUNDARY_BETWEEN_TCP_AND_UDP` / `..._UDP_AND_TCP` macros,
and the `id >= boundary` branch in `get_relay_server()`. The array was
read but never written anywhere in the tree, so the branch was
unreachable dead code.
- Drop a stray unused `char s[257]` in
`dbd_redis.c::redis_list_secrets`.
- Adjust the startup banner to log "Total relay threads" (was "Total
General servers" + a never-fired "Total UDP servers" block).
## Test plan
- [x] `cmake .. && make -j8` clean build
- [x] `examples/scripts/rfc5769.sh` — all RFC 5769 conformance vectors
pass
- [x] `examples/scripts/basic/relay.sh` + `udp_c2c_client.sh` —
12000/12000 msgs, 0 lost, 0 dropped
Upstream OSS-Fuzz build recipe
(google/oss-fuzz/projects/coturn/build.sh) only copies two fuzzer
binaries -- FuzzStun and FuzzStunClient -- and their seed corpora into
$OUT. The eight additional fuzz targets added later never ran on
oss-fuzz.com, which is why the introspector profile reports "fuzzer no
longer available" for them.
Rather than patching the Google-owned build recipe, fold all fuzzers
into the two binaries OSS-Fuzz actually ships. Each target now begins
with a single-byte selector (Data[0] mod 5) that dispatches to one of
five sub-harnesses:
- FuzzStun: integrity (SHA1/multi-SHA), attr_iter, attr_add, old_stun
- FuzzStunClient: stun_client, channel_data, addr_codec, oauth_token, oauth_roundtrip
No upstream OSS-Fuzz changes are required.
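The selector pattern, sketched with stub sub-harnesses (illustrative; the real ones live in FuzzStun.c / FuzzStunClient.c):

```c
#include <stddef.h>
#include <stdint.h>

/* Data[0] picks the sub-harness; the remaining bytes become that
   harness's input, so each corpus entry still drives one path. */
typedef int (*subharness_fn)(const uint8_t *data, size_t size);

static int sub0(const uint8_t *d, size_t n) { (void)d; (void)n; return 0; }
static int sub1(const uint8_t *d, size_t n) { (void)d; (void)n; return 1; }
static int sub2(const uint8_t *d, size_t n) { (void)d; (void)n; return 2; }
static int sub3(const uint8_t *d, size_t n) { (void)d; (void)n; return 3; }
static int sub4(const uint8_t *d, size_t n) { (void)d; (void)n; return 4; }

static const subharness_fn kSub[5] = {sub0, sub1, sub2, sub3, sub4};

/* LLVMFuzzerTestOneInput-shaped entry point. */
static int dispatch(const uint8_t *data, size_t size) {
  if (size < 1)
    return 0; /* nothing to select on */
  return kSub[data[0] % 5](data + 1, size - 1);
}
```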
## Summary
- Fixes compilation error on Linux when `_GNU_SOURCE` is not defined by
the toolchain: `struct mmsghdr` has incomplete type and `recvmmsg()` is
implicitly declared
- Defines `_GNU_SOURCE` in three places for full coverage across build
systems:
- `dtls_listener.c` — before includes, guarded by `#if
defined(__linux__)`
- `configure` — adds `-D_GNU_SOURCE` to `OSCFLAGS` on Linux for the
legacy build path
- `CMakeLists.txt` — adds `-D_GNU_SOURCE` on Linux for the CMake build
## Context
The `recvmmsg()` batched receive path added in #1852 uses `struct
mmsghdr` and `recvmmsg()`, which are glibc extensions requiring
`_GNU_SOURCE`. Some Linux distros/toolchains don't define this
implicitly, causing:
```
src/apps/relay/dtls_listener.c:129:18: error: array type has incomplete element type 'struct mmsghdr'
src/apps/relay/dtls_listener.c:748:18: warning: implicit declaration of function 'recvmmsg'
```
Fixes #1867
## Test plan
- [x] Verified CMake build succeeds on macOS (recvmmsg code is `#if
defined(__linux__)` guarded — no effect on non-Linux)
- [x] Verify build succeeds on Linux with and without `_GNU_SOURCE` in
the environment
- [x] Verify both `cmake` and `./configure && make` build paths work
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The first ALLOCATE set ss->origin_set=1 before check_stun_auth ran, so
an unauthenticated attacker could lock the session into a realm of their
choice by forging the ORIGIN attribute on the first packet. If per-realm
ACLs differ, this lets the attacker pick the most permissive realm for
that session.
Defer the commit of ss->origin_set until check_stun_auth succeeds with a
valid MESSAGE-INTEGRITY. Until auth passes, every request re-parses
ORIGIN, so the 401 challenge still carries the correct realm derived
from the current ORIGIN attribute.
A bad value like CIDR notation in allowed-peer-ip or denied-peer-ip was
silently dropped: add_ip_list_range returned -1 but the config parser
kept going, leaving the intended whitelist or blocklist partial.
Operators expecting denied-peer-ip=10.0.0.0/8 would end up with no block
at all, enabling SSRF-via-TURN to internal networks.
Fail closed: log the offending value and exit, so the problem is visible
at startup. CIDR parsing is not added (separate feature).
snprintf-then-redisCommand(rc, s) passed attacker-influenced bytes (STUN
USERNAME/REALM, admin CLI inputs) as the printf format string to
hiredis. A `%s`/`%n`/`%x` byte in a REALM attribute would cause stack
misread or a write primitive.
Replace every call site with redisCommand(rc, FORMAT, args) so user
bytes are arguments, never the format string.
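The fix pattern in miniature (plain `snprintf` stands in for the printf-style `redisCommand`; the key layout is made up):

```c
#include <stdio.h>
#include <string.h>

/* Attacker-influenced bytes (realm) reach the formatter only as an
   argument to a fixed format string — a literal "%n" in the realm is
   copied through verbatim instead of being interpreted. */
static int build_key_safe(char *out, size_t outlen, const char *realm) {
  return snprintf(out, outlen, "turn/realm/%s/key", realm);
}
```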
memcmp short-circuits on the first differing byte, letting an attacker
recover a valid HMAC byte-by-byte via response-time differences. Switch
to CRYPTO_memcmp, which runs in constant time regardless of where the
first mismatch occurs.
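CRYPTO_memcmp's contract in miniature (a minimal equivalent for illustration — the fix uses OpenSSL's, not a hand-rolled one):

```c
#include <stddef.h>

/* Constant-time comparison: accumulate XOR differences over the full
   length instead of returning at the first mismatch, so timing does
   not reveal where the bytes diverge. */
static int ct_memcmp_sketch(const void *a, const void *b, size_t n) {
  const volatile unsigned char *pa = a;
  const volatile unsigned char *pb = b;
  unsigned char diff = 0;
  for (size_t i = 0; i < n; i++)
    diff |= pa[i] ^ pb[i];
  return diff; /* 0 iff all n bytes are equal */
}
```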
## Summary
- Skip allocating a 65 KB response buffer for STUN indications (SEND,
DATA, BINDING indication) in `read_client_connection()` — indications
never produce a response, so the buffer was immediately freed
- Guard the unknown-attributes error-response block in
`handle_turn_command()` with a NULL check on `nbh` to match
## Motivation
On the UDP data-relay hot path, every SEND indication triggered a
pool-get + pool-put cycle for a response buffer that was never used.
This is the highest-frequency STUN command type during active media
relay. The change eliminates one unnecessary 65 KB buffer round-trip per
SEND indication.
## Test plan
- [ ] Build passes clean (`cmake .. && make -j$(nproc)`)
- [ ] Run RFC 5769 conformance tests (`examples/scripts/rfc5769.sh`)
- [ ] Run basic UDP relay test to verify SEND indications still relay
data correctly
- [ ] Verify STUN requests (ALLOCATE, REFRESH, BINDING request) still
receive proper error responses
Remove the unused mutex field and associated lock/unlock functions from
the `ur_map` structure.
- Removed `TURN_MUTEX_DECLARE(mutex)` field from `struct _ur_map`
- Removed mutex initialization in `ur_map_init()`
- Removed mutex destruction in `ur_map_free()`
- Removed `ur_map_lock()` and `ur_map_unlock()` functions that were not
being used
This cleanup reduces unnecessary synchronization overhead and simplifies
the codebase.
This PR optimizes the authentication path for the common use case
(WebRTC) where REST API + static secrets + no OAuth is used
(`--use-auth-secret --static-auth-secret`).
## Inline auth on relay threads
When a TURN REST API request arrives with long-term credentials, the
static secret is in RAM, and OAuth is not in use, authentication is now
performed directly on the relay thread instead of being dispatched to a
dedicated auth worker thread. This eliminates:
- Serialization of the auth_message into a bufferevent
- A cross-thread context switch
- Deserialization on the auth worker side
The relay-to-auth-thread path is preserved as a fallback for OAuth or
DB-only secret configurations.
## Reduce auth thread count
When the inline path handles all auth (REST API + static secrets + no
OAuth), the auth thread count drops to 2 (1 housekeeping + 1 fallback
worker). On large machines the previous default could leave many auth
threads sitting idle.
## Changes
Add proper null pointer checks and error handling in the `post_parse()`
function to prevent potential null pointer dereferences:
- Add null check after `calloc()` for headers list allocation
- Check return values from `realloc()` and `strdup()` before using them
- Properly clean up allocated memory on allocation failures
- Add forward declaration for `free_headers_list()` helper function
This prevents crashes when memory allocation fails during HTTP POST
request parsing.
fixes #1762
## Summary
This change lets the listener batch incoming UDP datagrams when
`--udp-recvmmsg` is enabled, reducing per-packet overhead on busy
listeners while preserving the existing behavior as the default and
fallback path.
## What changed
- add a new `--udp-recvmmsg` runtime flag
- implement a batched UDP receive path in the DTLS listener using
`recvmmsg()`
- reuse packet classification and datagram processing logic across
batched and non-batched receive paths
- reduce buffer/metadata churn by reusing listener-side scratch state
and network buffers
- keep compatibility safeguards by falling back when `recvmmsg()` is
unavailable or unsupported
- expose the setting in admin/CLI configuration output
- update the example test runner to enable the flag on Linux
## Why
The current listener processes UDP datagrams one at a time. On Linux,
`recvmmsg()` allows the server to receive multiple packets per syscall,
which should improve throughput and lower CPU overhead under load for
UDP-heavy traffic.
## Notes
- the feature is opt-in and defaults to disabled
- the implementation is Linux-specific and leaves the existing path
unchanged on other platforms
- the listener still falls back to the legacy receive path if batched
receive is unavailable at runtime
## Testing
- updated `examples/run_tests.sh` to pass `--udp-recvmmsg` on Linux
- validated behavior through the existing listener flow and fallback
handling