coturn

Commit Graph

Author	SHA1	Message	Date
Kai Ren	732c249b92	Upgrade Docker image to 4.11.0 Coturn version - Update Debian "trixie" to 20260518 snapshot in Docker image	5 days ago
Pavel Punsky	bfacd81627	Merge commit from fork * Parameterize SQLite user-DB driver queries (eliminate SQL injection class) The SQLite driver built every statement with snprintf, interpolating caller-supplied values directly into the SQL text before sqlite3_prepare. That left every value-carrying query injectable (the admin-panel delete paths fixed in GHSA-v8hj-2xx7-xmp5 were the reachable instances; this removes the underlying class for this backend). Add a sqlite_prepare_bind() helper that prepares a statement using '?' placeholders and binds the values as text parameters, and route all value-carrying statements through it: users, secrets, origins, realm options, oauth keys, permission IPs, and admin users. Values never enter the SQL text again. The only remaining interpolation is the peer-ip table name (allowed/denied), which cannot be a bound parameter and is already validated against that whitelist by the caller; its realm/ip values are now bound. Static, parameter-free statements are unchanged. oauth timestamp/lifetime were previously rendered as numeric literals; they are now formatted to decimal text and bound. The oauth_key columns have integer affinity, so SQLite stores the bound text as the same integer and reads it back identically (verified). Add tests/test_sqlite_dbd.c: an interface test that drives the driver's public vtable against a throwaway database, covering every converted entry point plus a SQL-injection regression test. Run against the old string-interpolated driver the seven behavioral tests pass unchanged (parity) while the injection test fails (a boolean payload in del_user deletes a whole realm's users); against the new driver all eight pass. The driver is compiled in isolation via a small support/stub layer (tests/test_sqlite_support.) so the suite needs no running server. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Parameterize PostgreSQL user-DB driver queries (eliminate SQL injection class) Like the SQLite driver, dbd_pgsql.c built every statement with snprintf, interpolating caller-supplied values into the SQL text and running it via PQexec -- which on PostgreSQL also permits stacked queries, the highest- impact instance of GHSA-v8hj-2xx7-xmp5. Add a pq_exec_params() helper around PQexecParams and route all value-carrying statements through it with $1,$2,... placeholders and the values passed as out-of-band text parameters: users, secrets, origins, realm options, oauth keys, permission IPs, and admin users. Values never enter the SQL text. The only remaining interpolation is the peer-ip table name (allowed/denied), which cannot be a bound parameter and is already whitelisted by the caller; its realm/ip values are now bound. Static, parameter-free SELECTs keep using PQexec. oauth timestamp/lifetime and the realm-option value were numeric literals; they are now formatted to decimal text and bound (PostgreSQL casts them to the column's integer type, preserving behavior). Add tests/test_pgsql_dbd.c with a link-seam libpq mock (test_pgsql_stub.c) that captures each emitted command, whether it was parameterized, and the bound parameter values -- so the driver is unit-tested with no server. The tests assert every operation emits a parameterized query with caller values bound out-of-band, and test_sql_injection_neutralized asserts a boolean payload travels as a bound value, never as SQL text. Run against the old driver all of these fail (it calls PQexec with values interpolated and binds nothing); against the new driver all pass. Relay-side stubs are reused from test_sqlite_support.c. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Parameterize MySQL user-DB driver with prepared statements (eliminate SQL injection class) dbd_mysql.c built every statement with snprintf and ran it via mysql_query(), interpolating caller-supplied values into the SQL text -- the same SQL injection class fixed for SQLite and PostgreSQL, now closed for the MySQL backend. Convert the whole driver to the prepared-statement API. Two helpers carry the risk: - my_exec() prepares an INSERT/UPDATE/DELETE, binds text params, executes; - my_query_rows() prepares a SELECT, binds text params and text result columns, and invokes a per-row callback (replacing mysql_query + mysql_store_result + mysql_fetch_row). Every value-carrying statement now uses '?' placeholders with values bound out-of-band; the read paths get small row callbacks. The only remaining interpolation is the peer-ip table name (allowed/denied), which cannot be a bound parameter and is already whitelisted by the caller; its realm/ip values are bound. oauth timestamp/lifetime and the realm-option value are formatted to decimal text and bound (MySQL casts to the column's integer type). Add tests/test_mysql_dbd.c with a link-seam libmysqlclient mock (test_mysql_stub.c) that captures the prepared SQL and bound parameters, and can feed one canned result row so the read paths' bind-result/fetch handling is exercised (get_user_key, get_oauth_key, get_admin_user, get_auth_secrets all round-trip). The mock also implements the classic mysql_query/store_result API so the suite links and runs against the old driver: there every test fails ("got mysql_query()") while all pass against the new one, and test_sql_injection_neutralized asserts a boolean payload travels as a bound value. Relay-side stubs are reused from test_sqlite_support.c (with ur_string_map_free added there). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	6 days ago
Pavel Punsky	b84dbab1d1	Fix string validation (#1924 )	7 days ago
Pavel Punsky	c02aa24a52	Update version to 4.12.0 (#1923 )	7 days ago
Pavel Punsky	a6bf67265a	Update khash to the latest version (#1919 )	1 week ago
Pavel Punsky	de2111c012	Update readme to match latest changes (#1920 )	1 week ago
Gustavo Garcia	e468cee3bd	Add DeepWiki link to the README	2 weeks ago
Corey Siltala	7016b565fa	Update docs for drop-invalid-packets and response-origin-only-with-rfc5780 (#1918 ) I attempted to use `make-man.sh` and it added a ton of man macro formatting back in, since it was a minor targeted fix I just updated `turnserver.1` manually. Fixes: https://github.com/coturn/coturn/issues/1917 --------- Co-authored-by: Corey Siltala <csiltala@atcorp.com> Co-authored-by: Pavel Punsky <eakraly@users.noreply.github.com>	2 weeks ago
Pavel Punsky	e0c37a3aa0	Multiplexpeer (#1916 ) ## Summary Adds `--multiplex-peer`, a non-standard relay mode that replaces the per-allocation peer-side port bind with one shared IPv4+IPv6 UDP socket pair per relay thread. Sessions are demultiplexed by exact peer IP:port in a per-thread `mp_table`. This lifts the ~16 k allocation cap that the default 49152-65535 relay port range imposes, and dramatically reduces kernel-level UDP receive-buffer drops under high pps. Design and trade-offs: [docs/multiplex-peer.md](docs/multiplex-peer.md). ## What changes ### Server (turnserver) - `--multiplex-peer` (cross-platform) — enable the shared per-thread relay sockets. Replaces the per-session port bind. Implies sendmmsg batching on Linux and default-enables `--udp-recvmmsg` (override with `--udp-recvmmsg=0`). Incompatible with EVEN-PORT — those Allocates are rejected with 400. - `--multiplex-peer-port <port>` (cross-platform, default 3480) — base port; thread `i` binds `<base>+2i` (IPv4) and `<base>+2i+1` (IPv6). A 4-thread server consumes 8 ports. - `--udp-gso` (Linux-only CLI) — UDP-GSO (`UDP_SEGMENT` cmsg) on the relay send path. Requires `--multiplex-peer` (which is what enables the sendmmsg batching GSO piggybacks on); passing `--udp-gso` alone is a silent no-op. - CLI surface tightened: `--udp-recvmmsg`, `--udp-recvmmsg-log`, `--udp-gso` and their fields are now `#if defined(__linux__)` — absent from `--help`, rejected with `unrecognized option`, and the code paths compile out on macOS/Windows. - Windows portability: `SO_REUSEPORT` in `mp_open_socket` wrapped in `#ifdef` (MSVC's Winsock doesn't define it; REUSEPORT was defensive anyway because the per-thread port layout is unique by construction). - `--sock-buf-size` honoured at startup: the shared multiplex-peer relay socket now calls `set_ioa_socket_buf_size` in `mp_open_socket` so the configured rcvbuf is in effect from the moment the socket exists, not deferred to the first Allocate. ### turnutils_uclient (loadgen) - `--no-even-port` — force `ep = -1` on Allocate. The default path randomly attaches EVEN-PORT (with no-R bit) even under `-c`, which `--multiplex-peer` strictly rejects with 400; this flag makes alloc-flood runs against multiplex-peer deterministic. - Legacy `timer_handler` now wraps the per-tick send batch with `uclient_send_batch_begin/_end` — without this, runs with `--sender-threads 0` (the default for `-m < 4`) silently fell through every send to plain `send(2)`. strace A/B: 205 k `sendto` → 61 k `sendmsg` (GSO) + 4 k `sendmmsg` + small `sendto` residual for control. ## Measured impact (3-droplet DigitalOcean, c-4 / 4 vCPU, 8 concurrent UDP streams, 45 s) \| \| baseline \| `--udp-recvmmsg` \| `--multiplex-peer` \| `--multiplex-peer --udp-gso` \| \|---\|---:\|---:\|---:\|---:\| \| Server NIC rx pps (UDP relay both legs) \| 350 k \| 334 k \| 326 k \| 294 k \| \| Server `UdpInDatagrams` pps \| 279 k \| 292 k \| 300 k \| 294 k \| \| Server `UdpRcvbufErrors` pps \| 71 k \| 42 k \| 26 k \| 0.3 k (−99.6 %) \| \| `turnserver` process CPU \| 387 % \| 205 % \| 283 % \| 133 % (−65 %) \| \| Server host idle \| 22 % \| 49 % \| 41 % \| 68 % \| Same loadgen-side packet rate (~2 M pps reported by uclient `send_pps` after the legacy-path batching fix). Iteration log: [docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md). ## Test plan - [x] `ctest --test-dir build` — 3/3 pass (test_ioaddr, test_stun_msg, test_http_server) on macOS + Linux. - [x] `examples/run_tests.sh` — 4 protocols + 4 threaded + load-gen smoke on Linux; 4 protocols on macOS. - [x] `examples/run_tests_conf.sh` — same coverage, conf-driven. - [x] `examples/run_tests_multiplex_peer.sh` — UDP/TCP/TLS/DTLS via `--multiplex-peer --multiplex-peer-port=35000` on macOS + Linux. - [x] Flag matrix smoke on macOS: `--multiplex-peer`, `--multiplex-peer-port=42000`, `--multiplex-peer --udp-gso` (no-op), `uclient --no-even-port`, `uclient --listener-threads N --sender-threads M` — all pass; `--udp-recvmmsg` / `--udp-gso` correctly rejected with `unrecognized option`. - [x] Flag matrix smoke on Linux (Docker): same + `--udp-recvmmsg` accepted, `--multiplex-peer` auto-enables `--udp-recvmmsg`, `--udp-recvmmsg=0` overrides the auto-enable. - [x] Windows compile fix verified — `SO_REUSEPORT` no longer referenced unconditionally. - [x] 3-droplet perf matrix completed; per-hop UDP counters captured. ## Docs updated - New: [docs/multiplex-peer.md](docs/multiplex-peer.md) - [README.turnserver](README.turnserver): full entries for `--multiplex-peer`, `--multiplex-peer-port`, `--udp-gso`; clarified `--udp-recvmmsg` auto-enable semantics. - [README.turnutils](README.turnutils): added `--no-even-port`, plus previously-undocumented `--listener-threads` / `--sender-threads` loadgen pool flags. - [examples/etc/turnserver.conf](examples/etc/turnserver.conf): commented `udp-recvmmsg`, `udp-recvmmsg-log`, `udp-gso`, `multiplex-peer`, `multiplex-peer-port` keys with one-paragraph descriptions and pointer to `docs/multiplex-peer.md`. - Man pages regenerated via `./make-man.sh`. --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2 weeks ago
Pavel Punsky	7cde430a98	examples: exercise uclient thread pools, UDP-GSO, recv_pps in CI tests (#1914 ) ## Summary `examples/run_tests.sh` and `examples/run_tests_conf.sh` previously covered only the legacy single-threaded uclient path. After recent PRs they were no longer exercising: - `--listener-threads` from #1911 (recv thread pool) - `--sender-threads` from #1913 (send thread pool) - `--udp-gso` from #1907 (server-side GSO send) - the `recv_pps` metric introduced in #1913 A regression in any of those would have shipped silently. This PR expands both scripts so every CI cycle hits all four features end-to-end. ## What changes Per-protocol coverage doubled. The four protocol tests (TCP / TLS / UDP / DTLS) are factored into a shell function and each runs twice: 1. Default uclient flags — legacy single-thread paths. 2. `--listener-threads 1 --sender-threads 1` — engages both pools at minimum non-zero size, so the slab counters, per-thread libevent base, and pick_listener_base / pick_sender_id paths all fire. The `grep "tot_send_bytes ~ 1000, tot_recv_bytes ~ 1000"` target is stable across both variants because the workload (`-m 1 -n default=5 -l 200`) totals 1000 bytes either way — threading doesn't change the byte count, only the machinery that produces it. Server-side GSO enabled on Linux. `run_tests.sh` adds `--udp-gso` next to the existing `--udp-recvmmsg`. `run_tests_conf.sh` writes both keys into the generated `turnserver.conf` on Linux. The conf-file parser uses the same long-options table as the CLI (see `mainrelay.c` `long_options`), so the keys map 1:1. Load-gen smoke on Linux. A short `-Y packet -m 4 -l 100 -c` run with `--sender-threads 2` ends each script: ```bash timeout -s INT 6s turnutils_uclient \ -Y packet -m 4 -l 100 -c -e 127.0.0.1 -g \ --listener-threads 1 --sender-threads 2 \ -u user -W secret 127.0.0.1 ``` The grep asserts the progress print contains a non-zero `send_pps` AND a non-zero `recv_pps`. Non-zero `recv_pps` is the canonical signal that the listener-pool slab reduction (`recv_count_snapshot`) is feeding back into the load-rate reporter — the codepath most likely to silently drop output if the thread pool's stop ordering or slab plumbing regresses. `timeout -s INT 6s` triggers a clean SIGINT-driven exit; exit codes 0, 124, and 130 all count as success because we just want a fixed-duration run. ## Test plan Linux Docker (authoritative): ``` === run_tests.sh === Using TURNSERVER_EXTRA_ARGS="--udp-recvmmsg --udp-gso" Running turn client TCP OK Running turn client TLS OK Running turn client UDP OK Running turn client DTLS OK Running turn client TCP (threaded) OK Running turn client TLS (threaded) OK Running turn client UDP (threaded) OK Running turn client DTLS (threaded) OK Running turn client UDP load-gen smoke OK === run_tests_conf.sh === Running turn client TCP OK Running turn client TLS OK Running turn client UDP OK Running turn client DTLS OK Running turn client TCP (threaded) OK Running turn client TLS (threaded) OK Running turn client UDP (threaded) OK Running turn client DTLS (threaded) OK Running turn client UDP load-gen smoke OK ``` 9/9 in each. macOS local TCP test fails the same way it did before this change (pre-existing flake on Darwin loopback TCP, unrelated to threading or GSO). - [x] Both scripts run end-to-end in Linux Docker against a fresh build. - [x] `--help` flag list verified: `--listener-threads`, `--sender-threads`, `-Y` all present. - [x] No new dependencies; only uses `timeout` (already in coreutils on Linux base images).	2 weeks ago
Pavel Punsky	0f81165ad8	Fix TTL/TOS type conversion (#1915 ) Fix UB in udp_recvfrom: recv_ttl/recv_tos were unsigned char locals whose addresses were cast to int * and passed to ioa_parse_udp_recvmsg_cmsg, which writes a full int through that pointer — clobbering 3 bytes of adjacent stack on every UDP receive. Manifested as a TTL/TOS bug on macOS. Change the locals in udp_recvfrom to int so no pointer cast is needed. Hoist the recv_ttl_t / recv_tos_t typedefs out of the two function bodies and into ns_ioalib_impl.h so they remain shared by the cmsg parser. ## Test plan [x] cmake -S . -B build -DBUILD_TESTING=ON && cmake --build build -j && ctest --test-dir build --output-on-failure [x] cd examples && ./run_tests.sh && ./run_tests_conf.sh [x] Verify relayed UDP carries the expected TTL/TOS on macOS (previously corrupted) [x] Linux build + system tests in Docker per CLAUDE.md	2 weeks ago
Pavel Punsky	fb94ab117d	turnutils_uclient: sender thread pool + UDP-GSO send batching + recv_pps reporting (#1913 ) ## Summary Three related changes to `turnutils_uclient` that together unblock the loadgen from being the bottleneck when benchmarking the relay: 1. Sender thread pool (`--sender-threads <N>`, max 4, auto-bumped to 2 at `-m >= 4`). Mirrors the listener pool that landed in #1911. Each sender thread owns its own libevent base, a session shard (round-robin assigned at allocation time via `elem->sender_id`), and a 100 µs timer that runs the burst loop just like the legacy main-thread `timer_handler` did. Send-side counters (`tot_send_messages`, `tot_send_bytes`, `tot_send_dropped`, `load_sent_packets`) and the completion accumulators in `client_timer_handler` (`total_loss` / `total_latency` / `total_jitter`) are written into per-thread cache-line-aligned slabs and reduced into the globals after `pthread_join`. This avoids the cross-core atomic-counter contention that the listener-pool work already documented. 2. UDP-GSO send batching in `send_buffer` for the plain-UDP path. The sender pool opens a thread-local batch window around its per-tick iteration; within the window, `send_buffer` copies the payload into a per-thread slot and appends to a scatter-gather `iov[]`. On flush: - If `count > 1` and all segments share the same size → one `sendmsg(2)` with a `UDP_SEGMENT` cmsg. - If GSO is unavailable (kernel returns `EINVAL`/`ENOPROTOOPT`/`EOPNOTSUPP`) → sticky-disable per thread, fall back to `sendmmsg(2)` over the same iov array. - Per-entry `send(2)` as the final fallback for whatever sendmmsg refused (EAGAIN tail, etc.). Auto-flush triggers: different fd (next session in iteration), different segment size, batch capacity (64), or end of iteration. 3. `recv_pps` in `print_load_generator_rate`, alongside the existing `send_pps`. Once the sender pool + GSO let uclient push >>1 Mpps of UDP, the meaningful end-to-end metric is the round-trip count, not the send-side count — the relay/peer pipeline drops 95+% of packets when uclient outpaces it. The progress line now reads: send_pps=6012928.00, recv_pps=101486.00, total_sent=112975924, total_recv=1853369 ## Why Benchmarking `--multiplex-client` / `--multiplex-peer` on a c-4 DigitalOcean droplet, the loadgen's single-threaded `timer_handler` saturated one CPU around 300 kpps regardless of `-m`. The relay was never put under real pressure, so the multiplex paths' value couldn't be measured. With this patch the loadgen can produce >6 Mpps from a single c-4 droplet, far above the relay's per-thread saturation point, so the bottleneck moves to the server where it belongs. ## Benchmark — multiplex-client turnserver, c-4 loadgen, m=4, 20 s \| Round \| OLD (master) \| NEW (this PR) \| Lift \| \|-------\|--------------\|---------------\|------\| \| 1 \| 246k send_pps \| 7.48M \| 30.4× \| \| 2 \| 459k \| 6.06M \| 13.2× \| \| 3 \| 360k \| 5.07M \| 14.1× \| \| avg \| 355k \| 6.20M \| 17.5× \| Throughput cap shifts from loadgen to relay. End-to-end recv_pps (which is now first-class in the progress line) is ~100 kpps in this configuration — limited by the relay, not uclient. ## Design notes - Cache-line alignment on `uclient_sender` mirrors the listener-pool's slab pattern. Same false-sharing trap, same fix. - Main-thread timer slows to 10 ms when the sender pool is engaged. The main timer still fires for lifecycle / `__turn_getMSTime` refresh, but `timer_handler` early-returns when `num_sender_threads > 0` so we don't burn a core on no-op 100 µs ticks. - Stop ordering: `stop_sender_threads()` runs before `stop_listener_threads()` — the senders own session mutation (wmsgnum, to_send_timems, shutdown), so joining them first prevents a race where a listener accumulates a stat into a session whose owning sender is still iterating it. - UDP-GSO copy: the per-slot memcpy is intentional. The caller (`client_write`) reuses `elem->out_buffer` across burst iterations, so pointing `iov[i]` at the session buffer would alias all entries to the most recent payload. A rotating per-session output ring would eliminate the copy — left out of this PR because the kernel-side savings from collapsing N sendmsg into one GSO sendmsg dominate the per-packet copy cost at the rates we measured. - Linux-only: send-side batching machinery is gated by `#if defined(__linux__)`. Non-Linux builds get no-op `uclient_send_batch_begin`/`_end` and `uclient_tx_enqueue` returns false, falling through to the legacy `send(2)` loop. ## Test plan - [x] macOS local build (Apple Silicon, AppleClang). Sender-pool code paths compile under both Linux and non-Linux gates. - [x] `clang-format-15 --dry-run --Werror` clean. - [x] Linux build on a c-4 Ubuntu 24.04 droplet (`cmake -DCMAKE_BUILD_TYPE=Release`). - [x] `--help` includes the new `--sender-threads` option with valid-range hint; out-of-range values rejected. - [x] Benchmark on two c-4 droplets in nyc1 against `turnserver --multiplex-client`: 3 alternating rounds OLD vs NEW, +17.5× average send-side lift (data table above). - [x] `print_load_generator_rate` output verified — `send_pps`, `recv_pps`, `total_sent`, `total_recv` all populated and consistent across listener slab reductions. ## Limitations - `--multiplex-peer` is not driven by this PR. uclient's pattern (each `-m N` opens two internal sessions per client that share the same peer port) hits the multiplex-peer "one allocation per peer endpoint" rule; benchmarking that flag at high concurrency requires a separate small change (per-session secondary peer port) — not in scope here. - The wider per-round variance under the sender pool (rounds in our bench ranged 13×–30× lift) is timing/scheduler noise at small per-thread shards. Smoothens out as `-m` and per-thread session counts grow.	3 weeks ago
Pavel Punsky	f7bb459357	Fix memory leak introduced by recvmmsg path (#1912 )	3 weeks ago
Pavel Punsky	df8912db5a	turnutils_uclient: multi-threaded listener (recv) pool (#1911 ) > Updated 2026-05-11 — three follow-up improvements applied on top of the original draft, with a single-trial real-Linux confirmation run. Auto threshold lowered, per-listener counter slabs added (eliminating the K=2/K=4 cache-line regression), per-elem stats atomicised. ## What Adds an N-thread receive pool to `turnutils_uclient`. Main thread keeps owning the sender timer, the lifecycle, and the control plane; `EV_READ` events for client UDP sockets are routed to one of N listener threads, each with its own libevent base. Sessions sharded round-robin across listeners at allocation time and pinned to their owning listener for the lifetime of the test. ### State ownership / synchronisation - Per-session bookkeeping (`recvmsgnum`, `recvtimems`, `rmsgnum`): mutated only by the owning listener thread — no locking, no atomics. - Per-session running stats (`elem->loss/latency/jitter`): listener does `__atomic_fetch_add`, the timer on main harvests with `__atomic_exchange_n` (read-and-zero atomically). Closes the race that existed in the original draft where a listener increment landing between the timer's read and its zero-store would be lost. - Per-test totals (`tot_recv_messages`, `tot_recv_bytes`, min/max latency/jitter): each listener has its own cache-line-aligned slab inside `uclient_listener`. The listener writes into its own slab (no atomic); main reads on-demand via snapshot helpers (atomic loads, no contention with writers); `stop_listener_threads()` folds slabs back into the global totals before reporting. An early draft used a single shared atomic counter and the K=2/K=4 path regressed ~19× on `-m {4, 8}`; the slabs close that gap. Sits on top of the recvmmsg path landed in #1910. ## CLI ``` -K N, --listener-threads N Number of receive threads. Default auto: -m < 2 -> K=0 (no worker thread, recv on main event_base) -m >= 2 -> K=1 (one worker thread, clean send/recv split) -K overrides the auto rule. Capped at 4. ``` `start_listener_threads()` logs the resolved count and whether it came from `-K` or the auto rule, so an operator can confirm what actually ran. ## Real-Linux bench (DigitalOcean, 2× c-4 / 4 vCPU, private VPC) Single-trial confirmation of this PR's follow-ups vs the previous 3-trial baseline. ### After follow-ups (this PR head, 1 trial, recv pps) \| `-m` \| K=0 \| K=1 \| K=2 \| K=4 \| \|---:\|---:\|---:\|---:\|---:\| \| 1 \| 4545 \| 3226 \| 3226 \| 3226 \| \| 2 \| 6250 \| 4878 \| 4878 \| 153 \| \| 4 \| 7692 \| 6557 \| 371 \| 350 \| \| 8 \| 8602 \| 689 † \| 708 \| 199 \| \| 16 \| 1174 \| 1339 \| 1281 \| 1283 \| \| 32 \| 1445 \| 1517 \| 2149 \| 1421 \| † K=1 `-m=8` = 689 looks like a single-trial outlier (the matching K=0 row finishes in 0.93 s sending all 8000 packets; K=1 hit the 14 s timeout with similar throughput). Re-running would clarify, but per the user's "1 iteration" instruction this is the data we have. ### Comparison vs the previous 3-trial averages \| `-m` \| metric \| before follow-ups \| after follow-ups \| delta \| \|---:\|:---\|---:\|---:\|:---\| \| 4 \| K=0 recv pps \| 5123 \| 7692 \| +50 % \| \| 8 \| K=0 recv pps \| 3967 \| 8602 \| +117 % \| \| 2 \| K=2 recv pps \| 1749 \| 4878 \| +179 % \| \| 4 \| K=2 status \| 344 pps, ~91 % loss \| 371 pps, 2.1 % loss \| regression unblocked \| \| 8 \| K=2 recv pps \| 665 \| 708 \| +6 % (loss 11 % → 3.2 %) \| \| 32 \| K=2 recv pps \| 1763 \| 2149 \| +22 % \| The K=0 hot path improved noticeably at mid-`m`. The K=2 cache-line-bouncing regression that originally made the auto-threshold conservative is gone — K=2 at `-m=4` went from "9× worse than K=1" (344 pps) to "comparable to K=0 with low loss" (371 pps, 2.1 % loss). K=2 at `-m=32` reached 2149 pps, the new high-concurrency winner. ### Auto-rule change With the K=2 regression unblocked, `UCLIENT_AUTO_LISTENERS_THRESHOLD` lowered from 4 to 2. The auto-default now goes: - `-m=1` → K=0 (still the lowest-overhead config; ~30 % faster than K=1 at this concurrency) - `-m>=2` → K=1 (clean send/recv split; consistently strong from m=4 upward) Explicit `-K` still overrides, so anyone curious about K=2/K=4 on different hardware can dial them up — they're no longer landmines. ## Combined effect on top of #1910 vs `master` before either improvement, at `-m=8`: 148 → 8602 pps = 58× speedup. ## Test plan - [x] Builds clean on macOS (clang) and Linux (gcc, Debian Trixie via container) - [x] `make lint` clean for the modified files - [x] CLI parsing smoke (`-K 99` rejected, `-K 0` works, `--listener-threads` long form works) - [x] Real-Linux single-trial bench across `K ∈ {0,1,2,4}` × `m ∈ {1,2,4,8,16,32}` - [x] Auto-bump log line confirmed (`uclient: started 1 listener thread(s) (auto)`) - [x] Per-listener slab fold-back verified end-to-end (totals match after `pthread_join`) - [ ] Functional smoke: `examples/scripts/basic/relay.sh` + `examples/scripts/basic/udp_c2c_client.sh` - [ ] CI ## CMake / portability - pthreads pulled in transitively via `turnclient` → `common` (`find_package(Threads REQUIRED)`). No link-line change needed. - `getopt_long()` requires `<getopt.h>` — included unconditionally on POSIX so the long-option table compiles on macOS as well as glibc. - `_Thread_local`, `__atomic_*` builtins, struct `aligned(64)` attribute: C11 / GCC builtins, available on glibc + clang.	3 weeks ago
nfuhler	faff5bf106	examples/turnserver.conf: update description of cli option (#1909 ) the previous description does not describe the correct default since commit `9467af5`	3 weeks ago
Pavel Punsky	284e441a00	turnutils_uclient: Linux recvmmsg receive path + larger SO_RCVBUF (#1910 ) ## Summary Two improvements to `turnutils_uclient`'s receive path. With `-Y packet -m N` workloads the loadgen was previously hitting client-side queue overflow before the server was anywhere near saturated, so reported "lost packets" reflected uclient's own kernel-buffer drops rather than real server loss. Mirrors the recvmmsg work already landed in `turnutils_peer` (#1908) and the relay (#1906) — this closes the loop on the loadgen side. - `SO_RCVBUF`/`SO_SNDBUF` 64 KB → 4 MB. New `UCLIENT_SOCK_BUF_SIZE` constant in `uclient.h`, applied at the 3 socket-creation sites in `startuclient.c`. `set_sock_buf_size()` already halves on `EPERM/EINVAL` so this is safe under any `net.core.rmem_max`. - Linux-only `recvmmsg(2)` batched receive in `client_input_handler`. Refactored `client_read()` → extracted `process_received_buffer()` so the per-packet processing (channel/Indication/Data parsing, latency/jitter accounting) can run after either a single `recv()` or a batched `recvmmsg()`. New `client_read_batch_udp()` drains up to 32 datagrams per syscall; `client_input_handler` dispatches to it for plain UDP only (no SSL/DTLS, no TCP, no TCP-relay sub-connections). All other paths fall through to the legacy single-`recv()` loop unchanged. Behind `#if defined(__linux__)` with `_GNU_SOURCE`; static scratch buffers (32 × 2048 B = 64 KB). ## Bench (local Docker, 3 trials per concurrency, identical server, only uclient binary differs) \| `-m` \| OLD recv pps \| NEW recv pps \| speedup \| OLD loss \| NEW loss \| \|---:\|---:\|---:\|---:\|---:\|---:\| \| 1 \| 58 \| 71 \| +22% \| 42% \| 27% \| \| 2 \| 78 \| 124 \| +59% \| 60% \| 33% \| \| 4 \| 135 \| 211 \| +56% \| 64% \| 45% \| \| 8 \| 145 \| 359 \| +148% \| 80% \| 50% \| \| 16 \| 224 \| 591 \| +164% \| 84% \| 54% \| Send PPS is essentially unchanged (uclient was never send-bound). Server CPU stayed below 1% across all runs — confirming the bottleneck was the loadgen, not the server. ## Notes - macOS / *BSD keep the legacy single-`recv()` loop. The SO_RCVBUF bump still applies and helps somewhat. - The remaining loss at higher concurrency is loadgen single-event-loop saturation, which is a larger restructuring (worker-pool uclient) and out of scope for this PR. - DTLS / TCP / SSL paths are deliberately left on the legacy code path — `recvmmsg` doesn't apply. ## Test plan - [x] Builds clean on macOS (clang) and Linux (gcc, Debian Trixie via container) - [x] `make lint` clean for the modified files - [x] Functional smoke: `examples/scripts/basic/relay.sh` + `examples/scripts/basic/udp_c2c_client.sh` still pass - [x] Bench above (3 trials per `-m {1,2,4,8,16}`)	3 weeks ago
Pavel Punsky	5959ecfb13	Add UDP-GSO send path (--udp-gso) (#1907 ) ## Summary - New `--udp-gso` flag (Linux, requires `--udp-sendmmsg`) collapses same-destination, same-size sendmmsg batches into a single `sendmsg` with a `UDP_SEGMENT` cmsg, so the kernel allocates one super-skb that traverses the network stack once and is segmented at egress instead of running `udp_sendmsg → ip_finish_output → __dev_queue_xmit` per datagram. - Also wraps the relay-side `recvmmsg` callback loop in `udp_sendmmsg_batch_begin/end` so peer→client sends triggered inside a recv batch can also coalesce — without that wrapping the relay path issues one `sendto` per delivered datagram. - Sticky-disable on `EINVAL/ENOPROTOOPT` for older kernels/NICs that lack UDP-GSO; one warning logged, then transparent fallback to the existing `sendmmsg` and `udp_send` paths. ## Why The `--udp-recvmmsg` and `--udp-sendmmsg` follow-ups confirmed (see [docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md)) that on the relay flood workload the dominant cost is the per-datagram kernel TX path. mmsg-style batching reduces only the syscall entry/exit, not the per-skb stack traversal — UDP-GSO collapses both. ## Result DigitalOcean nyc1 c-4, 30 s alternating A/B, `-Y packet -m 1`, eth1 TX as the authoritative server forwarding metric: \| Variant \| eth1 RX \| eth1 TX \| sys CPU \| idle CPU \| \|---\|---:\|---:\|---:\|---:\| \| baseline (no flags) \| 322,091 \| 127,445 \| 22.9 % \| 67.5 % \| \| `--udp-recvmmsg --udp-sendmmsg --udp-gso` \| 266,068 \| 257,996 \| 15.0 % \| 78.7 % \| \| baseline (no flags) \| 309,475 \| 125,573 \| 20.9 % \| 70.7 % \| \| `--udp-recvmmsg --udp-sendmmsg --udp-gso` \| 275,992 \| 225,366 \| 14.9 % \| 74.3 % \| Mean server forwarding rate: 126.5 k → 241.7 k pps (+91 %, 1.91×), mean system CPU 21.9 % → 14.9 % — about 2.8× CPU efficiency (TX pps per system-CPU-%). Full perf-children comparison and methodology in the new section of [docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md). ## Notes for reviewers - `--udp-gso` is opt-in and requires `--udp-sendmmsg` (the help text states the dependency). Without `--udp-sendmmsg` the batch state never accumulates and GSO has nothing to flush. - GSO eligibility resets on every `_begin/_end`. Mixed-destination, mixed-size, or oversize batches transparently fall back through `sendmmsg` / `udp_send`. - Rebased onto current `master`; the recvmmsg dependency is already merged via #1906. ## Test plan - [x] `cmake --build build --target turnserver` (RelWithDebInfo + ASan local builds clean) - [x] `ctest --test-dir build --output-on-failure` — 3/3 unit tests pass - [x] `examples/run_tests.sh` — TCP/TLS/UDP pass; DTLS pre-existing failure on macOS environment, unrelated to this change - [x] DigitalOcean A/B perf validation captured above - [ ] Reviewer to confirm CI green on Linux build/test/CodeQL --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	3 weeks ago
Kai Ren	78c1f7c7ce	Upgrade Docker image to 4.11.0 Coturn version	3 weeks ago
Kai Ren	259f0d3c67	Update Debian "trixie" to 20260406 snapshot in Docker image	3 weeks ago
Pavel Punsky	e59e227dfd	turnutils_peer: Linux fast path with drain loop, recvmmsg/sendmmsg, U… (#1908 ) …DP-GSO The libevent EV_READ handler used to do one recvfrom + one sendto per ready event, so a packet flood through the relay generated O(N) libevent re-entries and 2N syscalls per N relayed datagrams — saturating one core on the loadgen-side peer well below modern relay throughput. On Linux, replace the handler with: * a drain loop: keep recvmmsg'ing in MSG_DONTWAIT until the queue returns less than a full batch, bounded by MAX_DRAIN_ROUNDS so a flood can't starve the rest of the event loop; * recvmmsg into a static mmsghdr[32] (peer is single-threaded) and reuse the same mmsghdr array for sendmmsg back — each entry already has msg_name pointing at the source (the echo destination) and the iovec pointing at the received bytes, so no userspace copy; * UDP-GSO: when the recvmmsg batch is homogeneous (≥2 entries, same source, same size, ≤1472 B), echo it as one sendmsg with UDP_SEGMENT cmsg so the kernel allocates one super-skb that traverses the network stack once. The non-Linux build keeps the original recvfrom/sendto handler. DigitalOcean nyc1 c-4 30 s alternating A/B paired with the GSO turnserver (-Y packet -m 1): old peer: turn TX mean 228 k pps, peer CPU mean 91.0 % (saturated) new peer: turn TX mean 255 k pps, peer CPU mean 28.8 % Peer CPU drops 3.2× while turn-side throughput climbs ~12 % because the old peer was no longer fully reflecting at the GSO turnserver's rate. The peer is no longer the loadgen-side bottleneck, freeing CPU for multi-flow tests. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	3 weeks ago
Pavel Punsky	a5005c4193	Relay recvmmsg (#1906 ) ## Summary Extends the existing Linux-only `--udp-recvmmsg` flag from the UDP listener socket to also cover connected per-session UDP relay sockets, so steady-state client→relay and peer→relay traffic on plain UDP is read in batches of up to 16 datagrams per `recvmmsg(2)` instead of one `recvmsg` per packet. DTLS sessions still go through the SSL read path and are unchanged. The flag stays opt-in: receive-side batching works correctly, but on the current `m=1` / `m=100` benchmarks throughput is flat to slightly negative — the bottleneck has moved past receive (see results below). ## What's in the change - Shared receive helpers (`src/apps/relay/ns_ioalib_engine_impl.c`, `src/apps/relay/ns_ioalib_impl.h`): - `ioa_parse_udp_recvmsg_cmsg()` — single TTL/TOS/`IP_RECVERR` cmsg parser used by both `udp_recvfrom()` and the new batch path. Replaces the duplicated parser previously inlined in `dtls_listener.c` and `udp_recvfrom()`. - `ioa_init_recvmmsg_hdr()` — single initializer for `mmsghdr`/`iovec`/cmsg/source-address fields, also used by the listener. - New `IOA_UDP_RECVMMSG_MAX_BATCH = 16` constant; both listener and relay paths now share it. - Connected relay batch read (`socket_udp_read_batch_recvmmsg` in `ns_ioalib_engine_impl.c`): called from `socket_input_worker` for non-SSL UDP sockets when `--udp-recvmmsg` is on. Allocates per-message `stun_buffer_list_elem`s, calls `recvmmsg(MSG_DONTWAIT)`, dispatches each datagram through the existing `read_cb` path, and falls back cleanly on `ENOSYS`/`EINVAL`/`EOPNOTSUPP` (auto-disables the flag) and on `EAGAIN`/short-batch (releases unused buffers). - Per-engine scratch state: the `mmsghdr[16]` / `iovec[16]` / cmsg / src-addr arrays live on `ioa_engine`, not on every socket — keeps memory flat at thousands of allocations. - TTL/TOS-sized cmsg buffers in the listener: the listener previously over-allocated `64 KiB` per slot; it now uses the same TTL+TOS sizing as the relay path. - Opt-in occupancy stats behind a new `--udp-recvmmsg-log` flag: every 10 s the relay logs `udp-recvmmsg stats: calls=… packets=… avg_batch=… wouldblock=… unavailable=… no_buffer=… hist_1=… hist_2=… hist_3_4=… hist_5_8=… hist_9_16=…`. Counters are always tracked (cheap); the periodic log is gated by the new flag so default operation is silent. - CLI plumbing: `--udp-recvmmsg-log` long option in `mainrelay.c`/`mainrelay.h`, `cli_print_flag` entry in `turn_admin_server.c`, doc updates in `README.turnserver`. - Docs: `docs/PerformanceIterationLog.md` records the iteration steps, validation, and two rounds of DigitalOcean A/B numbers. `CLAUDE.md` load-test instructions updated to mention the new flag and the `tot_recv_msgs` / `tot_recv_bytes` workaround.	3 weeks ago
Pavel Punsky	b1d5c467f3	fuzzing: use hex escapes for HTTP EOH dictionary entry (#1905 ) ## Summary `fuzzing/stun.dict` line 147 used C-style `\r\n\r\n` for the HTTP end-of-headers keyword: ``` kw_http_eoh="\r\n\r\n" ``` libFuzzer's `ParseDictionaryFile` only accepts three escape sequences inside quoted entries: `\\`, `\"`, and `\xAB` hex. `\r` / `\n` are unrecognized, so a local fuzz run aborts dictionary load with: ``` ParseDictionaryFile: error in line 147 kw_http_eoh="\r\n\r\n" ``` Replace with the hex form used by the other 111 entries in the file: ``` kw_http_eoh="\x0d\x0a\x0d\x0a" ```	3 weeks ago
Pavel Punsky	61332bebca	Sync turnserver man page with current CLI options (#1903 ) The turnserver man page had drifted from the actual CLI options the binary accepts. The shipped `man/man1/turnserver.1` was last regenerated on 05 June 2021, so several options added since then were missing and one removed option was still documented. The man page is auto-generated from `README.turnserver` via `make-man.sh` (txt2man), so the source-of-truth edit is in the README; the `.1` files are then regenerated. In `README.turnserver`: - Add 13 options that exist in `mainrelay.c` long_options[] but were undocumented: --include-reason-string, --syslog-facility, --drop-invalid-packets, --drop-invalid-packets-log, --udp-recvmmsg, --respond-http-unsupported, --prometheus-address, --prometheus-path, --version, --cpus, --no-cli, --no-rfc5780, --response-origin-only-with-rfc5780. - Document --sql-userdb as an alias on the existing --psql-userdb line. - Remove the stale --ne=[1\|2\|3] entry (no longer parsed by the binary). The regenerated `man/man1/turnserver.1` also picks up a backlog of options that were already in the README but never reached the shipped page (--software-attribute, --cli, --sock-buf-size, --raw-public-keys, --stun-backward-compatibility, and the corrected --no-tlsv1_2 wording). `man/man1/turnadmin.1` and `man/man1/turnutils.1` are regenerated as a side-effect of `make-man.sh` running over all three READMEs; their content was similarly stale relative to README.turnadmin / README.turnutils.	3 weeks ago
Pavel Punsky	9d0cfca6f1	Remove stale --ne option from turnserver --help (#1904 ) ## Summary - The `--ne=[1\|2\|3]` option was already removed from `long_options[]` and the option parser, so `turnserver` rejects it at runtime, but the help text printed by `turnserver --help` still advertised it.	3 weeks ago
Pavel Punsky	36e1eee855	Restore CodeQL permissions, category, and manual build mode (#1901 ) PR #1517 (Jun 2024) simplified codeql.yml in ways that left scans incomplete: it dropped the actions:read / contents:read permissions and the analyze category, both of which CodeQL Action requires for results to land under the existing language category. Combined with the later cpp -> c-cpp rename and v3 -> v4 upgrade, scheduled scans have not refreshed the Security tab since Jun 1, 2024. - Add actions:read and contents:read back to job permissions - Set build-mode: manual on init (required for v3+/v4 manual builds) - Pass category "/language:c-cpp" on analyze so SARIF de-duplicates against the configured language - Build with --parallel so the tracer keeps up on default runners	3 weeks ago
dependabot[bot]	97fd597fcb	Bump repolevedavaj/install-nsis from 1.1.0 to 1.2.0 (#1899 ) Bumps [repolevedavaj/install-nsis](https://github.com/repolevedavaj/install-nsis) from 1.1.0 to 1.2.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/repolevedavaj/install-nsis/releases">repolevedavaj/install-nsis's releases</a>.</em></p> <blockquote> <h2>v1.2.0</h2> <!-- raw HTML omitted --> <h2>🚀 New features and improvements</h2> <ul> <li>Fix NSIS installer download (<a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/40">#40</a> <a href="https://github.com/lordmulder"><code>@lordmulder</code></a> + <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/41">#41</a> <a href="https://github.com/repolevedavaj"><code>@repolevedavaj</code></a>)</li> </ul> <h2>📦 Dependency updates</h2> <ul> <li>Bump release-drafter/release-drafter from 6.1.0 to 7.2.0 (<a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/37">#37</a>) @<a href="https://github.com/apps/dependabot">dependabot[bot]</a></li> <li>Bump actions/checkout from 5 to 6 (<a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/29">#29</a>) @<a href="https://github.com/apps/dependabot">dependabot[bot]</a></li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`c14d0ea1b8`"><code>c14d0ea</code></a> Merge pull request <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/41">#41</a> from repolevedavaj/fix/strlen-patch-download-url</li> <li><a href="`7b8697d119`"><code>7b8697d</code></a> Apply PR <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/40">#40</a> download fix to strlen_8192 patch step</li> <li><a href="`9abd7fac24`"><code>9abd7fa</code></a> Merge pull request <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/37">#37</a> from repolevedavaj/dependabot/github_actions/release-d...</li> <li><a href="`c569f7c7d0`"><code>c569f7c</code></a> Merge pull request <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/40">#40</a> from lordmulder/main</li> <li><a href="`fb4d83d777`"><code>fb4d83d</code></a> Fixed.</li> <li><a href="`9f060d0595`"><code>9f060d0</code></a> Fixed</li> <li><a href="`187e8883da`"><code>187e888</code></a> Change NSIS installer download link and User-Agent</li> <li><a href="`59638711ae`"><code>5963871</code></a> Bump release-drafter/release-drafter from 6.1.0 to 7.2.0</li> <li><a href="`618f596b61`"><code>618f596</code></a> Merge pull request <a href="https://redirect.github.com/repolevedavaj/install-nsis/issues/29">#29</a> from repolevedavaj/dependabot/github_actions/actions/c...</li> <li><a href="`81fc6efc33`"><code>81fc6ef</code></a> Bump actions/checkout from 5 to 6</li> <li>See full diff in <a href="https://github.com/repolevedavaj/install-nsis/compare/v1.1.0...v1.2.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=repolevedavaj/install-nsis&package-manager=github_actions&previous-version=1.1.0&new-version=1.2.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	3 weeks ago
Pavel Punsky	238c311f05	Fix Prometheus metrics response leak (#1900 ) Fix memory leak introduced in #1853 Resolved #1898	3 weeks ago
Pavel Punsky	326816a92a	Update version to 4.11.0 (#1897 )	4 weeks ago
Pavel Punsky	24f474878e	Filc harness and pointer typedefs (#1896 ) ## Summary - Add a self-contained Fil-C build/test harness under `filc/` that mirrors the existing `fuzzing/` pattern: one host script (`filc/run-local.sh`) builds an Ubuntu 24.04 image with the [Fil-C](https://github.com/pizlonator/fil-c) optfil 0.678 toolchain, builds turnserver with `CC=filcc`/`CXX=fil++`, runs unit tests + system tests, and drops a per-run timestamped log directory with `SUMMARY.txt` + `ISSUES.txt`. - Fix the two real Fil-C compatibility bugs the harness surfaces by changing `ur_map_value_type` and `ur_addr_map_value_type` from `uintptr_t` to `void ` in `src/server/ns_turn_maps.h`. ## Why [Fil-C](https://fil-c.org) is a memory-safe C/C++ compiler (Clang 20 fork) that pairs every pointer with an "InvisiCap" capability and turns UB into deterministic panics with no `unsafe` escape hatch. Putting coturn through it answers two questions: (a) does it compile unmodified, and (b) does it run correctly under capability-enforced memory safety. After this PR, the answer is yes* for both — turnserver, turnutils_peer, and turnutils_uclient relay TCP/TLS/UDP/DTLS traffic with full Fil-C enforcement, all unit tests pass, and `examples/run_tests_conf.sh` runs end-to-end. ## What's in the PR ### `filc/` harness (commit 1) \| File \| Purpose \| \|---\|---\| \| `filc/Dockerfile` \| Ubuntu 24.04 + Fil-C optfil 0.678 (extracts the nested `fil.tar.xz` to `/opt/fil`); `--platform linux/amd64` so it works on Apple Silicon under emulation. \| \| `filc/run-local.sh` \| Host-side: build image, create `filc/logs/<UTC-ts>/`, run container with source mounted read-only and log dir mounted r/w. \| \| `filc/docker-entrypoint.sh` \| In-container orchestrator. Phases: env / source-copy / build / unit-tests / system-cli / system-conf. Runs every phase even when a prior one fails (no aborting mid-pass). Captures per-phase logs + a combined `all.log` + JUnit XML for ctest. Greps panics/errors into `ISSUES.txt`. Downgrades `system-` phases to FAIL when `examples/run_tests.sh` prints `FAIL` despite exiting 0 (existing fragility in those scripts). \| \| `filc/build.sh` \| `cmake … -DBUILD_TESTING=ON -DCMAKE_C_COMPILER=filcc -DCMAKE_CXX_COMPILER=fil++ -DCMAKE_BUILD_TYPE=RelWithDebInfo`, then build. \| \| `filc/.gitignore` \| Ignore the on-host `logs/` dir. \| The harness also bumps the post-launch sleep in `examples/run_tests.sh` from 2s to 6s only inside the container (sed-in-place on the copied source; upstream is untouched). Under linux/amd64 emulation the Fil-C-built turnserver isn't accepting TCP at 2s, so the first sub-test races and prints `FAIL`. Matches the 5s sleep already used by `run_tests_conf.sh`. ### Pointer-typedef fixes (commit 2) `src/server/ns_turn_maps.h`: ```diff -typedef uintptr_t ur_map_value_type; +typedef void ur_map_value_type; ... -typedef uintptr_t ur_addr_map_value_type; +typedef void ur_addr_map_value_type; ``` Why this is necessary. Both maps store pointers, but their value slot is integer-typed. Every existing `_put` site casts a pointer through `(ur__value_type)` to store, and every `_get` site casts back. Under standard C this is a well-defined no-op. Under Fil-C, casting a pointer to `uintptr_t` discards its InvisiCap; casting back yields a pointer with a non-null address but a NULL Fil-C object — the next dereference panics with `cannot read pointer with null object`. The harness caught two such panics, both in the auth-resume / relay-allocate flow: 1. `src/server/ns_turn_server.c:3248` — `ss->client_socket`, where `ss` came from `sessions_map` (a `ur_map`). 2. `src/apps/relay/turn_ports.c:225` — `tp->mutex`, where `tp` came from `ip_to_turnports_` (a `ur_addr_map`) via `turnipports_add`. Why this is also a correctness improvement on a normal build. The new typedef makes the API strictly more type-safe — the compiler now enforces "you put a pointer in." It eliminates a class of accidental misuse (storing a non-pointer integer where a pointer was expected) that the integer typedef silently allowed. Same generated code on a normal build; different (correct) Fil-C semantics. Audit. Verified before changing: - All `ur_map_put` / `lm_map_put` / `ur_addr_map_put` callers store pointer-typed values exclusively (no callers store raw integers). - No internal arithmetic on the value type anywhere in `ns_turn_maps.c`. - `ur_map_del_func` / `ur_addr_map_func` implementations either don't exist (all `_del` callers pass `NULL`) or immediately cast their parameter to a real pointer type — no source change needed. - `KHASH_MAP_INIT_INT64(3, ur_map_value_type)` works identically with `void `. - `ur_addr_map`'s `addr_elem.value` is assigned, read, compared for truthiness, and cleared with `= 0` — all valid for `void `. ## Test plan - [ ] `filc/run-local.sh` reports all six phases PASS (env / source-copy / build / unit-tests / system-cli / system-conf), `ISSUES.txt` carries no Fil-C panic / safety / sanitizer entries. - [ ] Local `cmake -S . -B build -DBUILD_TESTING=ON && cmake --build build && ctest --test-dir build --output-on-failure` is green (no regression on the regular build). - [ ] `examples/run_tests.sh` and `examples/run_tests_conf.sh` are green on Linux per `CLAUDE.md`. - [ ] Existing `fuzzing/run-local.sh ASan 0 -runs=1` still passes (the new `filc/` directory is independent and shouldn't perturb anything).	4 weeks ago
Pavel Punsky	69bc0e7351	Load generator mode in turnutils_uclient (#1894 ) ## Summary Adds load-generator modes to `turnutils_uclient` for repeatable TURN server performance testing: - Adds `-Y packet\|alloc\|invalid` load modes. - Supports packet flood, allocation flood, and invalid-packet flood workflows. - Adds unique local client ports for allocation flood mode. - Removes default packet pacing in load-generator modes unless explicitly set. - Adds helper scripts under `examples/loadtest/`. - Documents load-test usage in `README.turnutils`, `man/man1/turnutils.1`, `CLAUDE.md`, and `docs/PerformanceIterationLog.md`. The performance log captures DigitalOcean benchmark methodology, A/B lessons, hot-path findings, and future optimization candidates.	4 weeks ago
Pavel Punsky	4b97d032ad	Cache hot lookups in TURN data-path handlers (#1893 ) write_to_peerchannel(): get_relay_socket_ss() and ioa_network_buffer_get_size() were each called twice per channel-data packet. The compiler can't CSE the calls (cross-TU through a get_relay_socket() accessor in ns_turn_allocation.c that it can't prove pure), so cache the relay socket and the inbound size once. handle_turn_send(): same get_relay_socket_ss() duplication on the STUN_SEND path. read_client_connection(): the inbound size was fetched four times (received_bytes accumulator, verbose log, blen seed, ret check). Reuse ret as orig_blen. No behavior change. Targets the ~0.4% per-packet overhead these helpers were contributing in the m=1 packet-flood profile.	4 weeks ago
Pavel Punsky	62ee3759f4	Inline get_ioa_addr_len() in the header (#1891 ) This is a four-instruction accessor (read sa_family, return struct sockaddr_{in,in6} size) that gets called from every per-packet sendto(), recvmsg(), and addr-map lookup. Cross-TU it stays a real function call; moving the body into ns_turn_ioaddr.h as static inline lets each call site fold the family branch directly into the syscall setup. perf record on the m=1 packet flood (c-4 nyc1) confirms the win: - udp_recvfrom self-time: 0.76% -> 0.35% (-54%) - udp_send self-time: 0.60% -> 0.26% (-57%) End-to-end throughput stays in the run-to-run noise band, as expected for a kernel-bound workload, but the released CPU is real.	4 weeks ago
Pavel Punsky	1a53e51141	Trim two redundant checks from per-packet relay hot path (#1890 ) ioa_socket_check_bandwidth(): hoist the "no bps limit configured" fast-exit before the multi-condition socket-state check. The vast majority of sessions have max_bps == 0, so the existing path was running 5+ pointer dereferences and equality tests just to land on the same return-1. send_data_from_ioa_socket_nbh(): drop the redundant inner "if (!(s->done \|\| s->fd == -1))" gate. The outer if/else-if branch already filtered those, and ioa_socket_tobeclosed() rechecks both, so the inner test was dead code on every successful send. perf record on the c-4 nyc1 droplet (m=1 packet flood, 12s) shows send_data_from_ioa_socket_nbh self-time drop from 0.91% to 0.54% and ioa_socket_check_bandwidth fall out of the top-25 user-space symbols (was 0.33%). Throughput is within run-to-run noise — the relay is syscall-bound, so user-space wins don't translate 1:1 — but the released CPU is real.	4 weeks ago
Pavel Punsky	a619d9d6d9	Inline addr_cpy() in the header (#1892 ) Same pattern as the get_ioa_addr_len() inline: addr_cpy() is a single-memcpy helper that fires on every receive (each packet dispatch copies the source address into ioa_net_data, plus allocation/permission map-key copies). Cross-TU it stays a real function call. Combined with the previous four iterations (turn_server_get_engine hoist, bandwidth fast-exit + dead-check removal, cached relay-socket and buffer-size lookups, get_ioa_addr_len inline), the alternating A/B run on the same c-4 nyc1 droplet now shows a consistent +5% throughput on the m=1 packet flood test (recv_msgs/30s mean over 6 rounds: B=146984 / I=155468).	4 weeks ago
Pavel Punsky	23e8538657	Hoist turn_server_get_engine() out of per-packet hot path (#1889 ) turn_report_session_usage() runs on every packet but only does real reporting work once per 4096 packets. Re-order the early returns so the bitmask fast-exit fires before the cross-TU turn_server_get_engine() call, and flatten the nested if-blocks into guard clauses for readability. No behavior change. A/B testing on a c-4 nyc1 droplet shows the single-client packet-flood throughput within noise (alternating B/I rounds: B=149317 / I=153844 mean recv_msgs over 30s, ~3% in iter1's favor with ~10% run-to-run variance).	4 weeks ago
Pavel Punsky	cb701a47b4	Add fuzz coverage for integrity helpers (#1888 )	4 weeks ago
Pavel Punsky	247118d1b4	Add deterministic challenge-response builder to FuzzStun (#1886 ) stun_is_challenge_response_str in src/client/ns_turn_msg.c only descends into its three inner stun_attr_get_first_by_type_str calls when the input is an error response with err_code 401 or 438 and a REALM attribute and a NONCE attribute. The OAuth branch additionally requires STUN_ATTRIBUTE_THIRD_PARTY_AUTHORIZATION. The fuzzer-driven path in harness_attr_iter calls the predicate every iteration but the conjunction of conditions is too specific for libFuzzer to discover from binary mutation alone — OSS-Fuzz introspector flags 9 unreached callsites on stun_attr_get_first_by_type_str gated on this function. Add harness_challenge_response_builder that constructs six deterministic message variants on every iteration and runs each through the predicate: - 401 with REALM + NONCE (canonical success) - 401 with REALM + NONCE + THIRD-PARTY-AUTH (OAuth branch) - 438 with REALM + NONCE (438 disjunct) - 401 with REALM only (NONCE-missing path) - 401 with no REALM (REALM-missing path) - 400 with REALM + NONCE (wrong err_code path) Each variant runs once with a non-NULL oauth pointer and once with NULL to cover both branches of the optional output. Realm / nonce / server-name lengths and the transaction id are derived from fuzz bytes so iterations stay meaningfully distinct. Verified by stand-alone harness: - 401+REALM+NONCE returns true with attrs copied out, oauth=false - 401+REALM+NONCE+TPA returns true with oauth=true and server_name populated — confirming all three inner get_first_by_type_str callsites and the OAuth disjunct are now exercised.	1 month ago
Pavel Punsky	8952415609	Seed address-mapping table in fuzz initializer (#1885 ) map_addr_from_public_to_private and map_addr_from_private_to_public in src/client/ns_turn_ioaddr.c walk a static public_addrs[] table of size mcount. Without an explicit ioa_addr_add_mapping call mcount stays 0 for the entire fuzz process, so the loop body — including the addr_eq_no_port call it gates — is dead code in every fuzz iteration. OSS-Fuzz introspector flags this as 19 unreached callsites under map_addr_from_public_to_private and 4+4 under addr_eq_no_port. Extend the existing shared LLVMFuzzerInitialize in FuzzOpenSSLInit.c (linked into both FuzzStun and FuzzStunClient via FUZZ_COMMON_SOURCES) to register two synthetic public<->private mapping pairs — one v4 (192.0.2.1 <-> 10.0.0.1) and one v6 (2001:db8::1 <-> fd00::1) — once at startup. The header comment for ioa_addr_add_mapping requires single-threaded init before fuzzing begins, which matches exactly when LLVMFuzzerInitialize runs. Verified by stand-alone harness: after init, stun_attr_get_first_addr_str on a XOR-MAPPED-ADDRESS attribute holding 192.0.2.1:443 returns 10.0.0.1:443, and the v6 equivalent returns [fd00::1]:8080 — confirming addr_eq_no_port is now called inside the loop body in both helpers.	1 month ago
Pavel Punsky	301415d848	Unblock fuzz coverage for is_http and rare STUN attributes (#1884 ) OSS-Fuzz introspector flags three blockers the fuzzer cannot reach on its own: 1. findstr() in src/client/ns_turn_msg.c is gated by is_http(), which requires GET/POST/PUT/DELETE prefix + " HTTP/" + "\r\n\r\n". The fuzzer's binary STUN seeds never synthesize a valid HTTP frame. 2. stun_attr_get_reservation_token_value() and stun_attr_get_response_port_str() are called from harness_attr_iter only when the input contains the matching attribute type. Neither appears in the existing seed corpus. Add HTTP framing keywords to fuzzing/stun.dict and four new seed files covering both gaps: - seed_http_get.raw: minimal "GET / HTTP/1.1\r\nHost: x\r\n\r\n" - seed_http_post_clen.raw: POST with Content-Length to drive the strtoul branch in is_http - seed_reservation_token.raw: STUN allocate response with an 8-byte RESERVATION-TOKEN attribute - seed_response_port.raw: STUN binding request with a 4-byte RESPONSE-PORT attribute Each new STUN seed validated against the real parsers (stun_get_message_len_str, stun_attr_get_first_by_type_str, is_http) to confirm it reaches the targeted branch. The corpus zips also drop pre-existing __MACOSX/ and .DS_Store entries that had snuck in during a prior macOS zip step; net file count rises (24 -> 28 in FuzzStun, 4 -> 8 in FuzzStunClient) while archive size shrinks because of the junk removal.	1 month ago
Pavel Punsky	301d12fdda	HTTP parsing fixes (#1882 )	1 month ago
Pavel Punsky	b4c138c409	Cover all public stun_buffer.c wrappers in FuzzStunClient (#1883 ) Add harness_stun_buffer_api to FuzzStunClient.c that exercises every public wrapper in src/apps/common/stun_buffer.c not already reached by the existing harnesses: stun_get_size (NULL/non-NULL), the init_request / init_indication / init_success_response builders, the tid accessors, the stun_is_indication wrapper (which gates the static is_channel_msg), the attr_add / attr_add_channel_number / attr_add_addr / attr_add_even_port (both branches) / attr_get_first_by_type accessors, stun_set_allocate_request (rt NULL and non-NULL paths), stun_set_binding_request / stun_prepare_binding_request, and the channel-message wrappers. Each builder call is followed by inspect_buffer_message so the resulting serialized message is also walked by the parser predicates. A tail block also pumps raw fuzzer bytes through the wrapper-form predicates (stun_is_indication, stun_is_channel_message, stun_tid_from_message, stun_attr_get_first_by_type) so they see malformed inputs the serializer paths cannot produce.	1 month ago
Pavel Punsky	46e5117fb1	Extend fuzzing coverage and enable local fuzzing in a container (#1881 )	1 month ago
Pavel Punsky	741b2983cc	Extend STUN client fuzz builder coverage (#1878 )	1 month ago
Pavel Punsky	4ffa60d32e	Out of bound HTTP detection in parser (#1877 )	1 month ago
Pavel Punsky	51520c77a2	Delete log line per relay thread on start (#1876 ) This is a unique log per relay thread. Auth threads do not log.	1 month ago
Pavel Punsky	453afd1fdc	Add Unity-based unit test scaffolding (#1875 ) ## Summary Introduces an opt-in unit test layer for coturn using [Unity](https://github.com/ThrowTheSwitch/Unity) — a single-header pure-C test framework that matches coturn's C11 toolchain, portability bar, and zero-C++ production tree. - Unity v2.6.0 is fetched on demand via CMake `FetchContent` (nothing vendored). - Tests are gated behind `-DBUILD_TESTING=ON` (off by default), so the standard build and OSS-Fuzz pipeline are unaffected. - Two test binaries cover pure C-callable code in `libturnclient`: - `test_ioaddr` (6 cases) — `make_ioa_addr`, `addr_get_port`/`addr_set_port`, `addr_eq` variants, `addr_to_string`, IPv4/IPv6/garbage input - `test_stun_msg` (7 cases) — STUN header construction, request/indication/success/error response classification, transaction-ID round-trip, channel message parsing, truncated/zeroed buffer rejection - New `check` cmake target builds tests before running ctest (avoids the `make test` footgun where the auto-generated `test` target only runs already-built binaries). - Legacy `Makefile.in` gets a `unit-tests` target that bootstraps `build/unit-tests/` and delegates to the cmake `check` target. `make check` and `make test` now run the RFC 5769 conformance suite plus the Unity unit tests. - CLAUDE.md documents the new workflow plus the one-liner for adding a new `test_<name>.c`. ## Why The existing test story is shell-script integration suites under `examples/scripts/` — they exercise the binary end-to-end but can't pin down behavior of individual functions, can't run without a full build environment, and don't fail loudly when a unit-level invariant breaks. A lightweight unit layer gives us: - Targeted regression coverage for protocol parsing/encoding (the highest bug-yield area). - A natural home for tests of the kinds of subtle invariants already documented in CLAUDE.md (port-counter overflow safety, port-bounds inclusivity, HMAC buffer initialization). - Sub-second feedback for contributors. ## Usage ```bash # CMake direct cmake -S . -B build -DBUILD_TESTING=ON cmake --build build -j --target check # build + run all unit tests ctest --test-dir build --output-on-failure # run already-built tests # Legacy Makefile bridge (after ./configure) make unit-tests # bootstraps build/unit-tests/, builds + runs Unity tests make check # RFC 5769 conformance + unit tests ``` Adding a new test: 1. Drop `tests/test_<name>.c` 2. Append `coturn_add_test(test_<name>)` in `tests/CMakeLists.txt` 3. The `check` target picks it up automatically. ## Test plan - [x] Clean cmake build with `-DBUILD_TESTING=ON` succeeds; full source tree (turnserver, turnadmin, turnclient, turn_server, all turnutils) still builds - [x] `cmake --build build --target check` builds and runs both test binaries — 13/13 cases pass - [x] `ctest --verbose` shows per-case PASS lines for all 13 cases - [x] Default build (`-DBUILD_TESTING` unset) does not fetch Unity or build any test binary ## Notes for reviewers - Why Unity over GoogleTest/Catch2: pure C, single source file, no C++ toolchain dependency, runs anywhere coturn does (incl. exotic CMake targets like Solaris/AIX). GoogleTest would force `extern "C"` wrappers and a C++ compiler everywhere.	1 month ago
Pavel Punsky	c1518d5f2a	Drop udp_relay_servers_number config and clean up dead UDP id-space (#1874 ) ## Summary - Remove the `udp-relay-servers` config knob and the `udp_relay_servers_number` field — after #1849 left only the `PER_THREAD` UDP engine, this knob is no longer wired to anything that creates extra UDP relay threads. - Delete the now-orphaned `udp_relay_servers[]` array, the `TURNSERVER_ID_BOUNDARY_BETWEEN_TCP_AND_UDP` / `..._UDP_AND_TCP` macros, and the `id >= boundary` branch in `get_relay_server()`. The array was read but never written anywhere in the tree, so the branch was unreachable dead code. - Drop a stray unused `char s[257]` in `dbd_redis.c::redis_list_secrets`. - Adjust the startup banner to log "Total relay threads" (was "Total General servers" + a never-fired "Total UDP servers" block). ## Test plan - [x] `cmake .. && make -j8` clean build - [x] `examples/scripts/rfc5769.sh` — all RFC 5769 conformance vectors pass - [x] `examples/scripts/basic/relay.sh` + `udp_c2c_client.sh` — 12000/12000 msgs, 0 lost, 0 dropped	1 month ago
Pavel Punsky	c8b3dd6513	Merge 10 fuzz targets into FuzzStun and FuzzStunClient via dispatcher (#1873 ) Upstream OSS-Fuzz build recipe (google/oss-fuzz/projects/coturn/build.sh) only copies two fuzzer binaries -- FuzzStun and FuzzStunClient -- and their seed corpora into $OUT. The eight additional fuzz targets added later never ran on oss-fuzz.com, which is why the introspector profile reports "fuzzer no longer available" for them. Rather than patching the Google-owned build recipe, fold all fuzzers into the two binaries OSS-Fuzz actually ships. Each target now begins with a single-byte selector (Data[0] mod 5) that dispatches to one of five sub-harnesses: FuzzStun - integrity (SHA1/multi-SHA), attr_iter, attr_add, old_stun FuzzStunClient - stun_client, channel_data, addr_codec, oauth_token, oauth_roundtrip No upstream OSS-Fuzz changes are required.	1 month ago
Pavel Punsky	4f8385e142	Fix build failure: define _GNU_SOURCE for recvmmsg() on Linux (#1868 ) ## Summary - Fixes compilation error on Linux when `_GNU_SOURCE` is not defined by the toolchain: `struct mmsghdr` has incomplete type and `recvmmsg()` is implicitly declared - Defines `_GNU_SOURCE` in three places for full coverage across build systems: - `dtls_listener.c` — before includes, guarded by `#if defined(__linux__)` - `configure` — adds `-D_GNU_SOURCE` to `OSCFLAGS` on Linux for the legacy build path - `CMakeLists.txt` — adds `-D_GNU_SOURCE` on Linux for the CMake build ## Context The `recvmmsg()` batched receive path added in #1852 uses `struct mmsghdr` and `recvmmsg()`, which are glibc extensions requiring `_GNU_SOURCE`. Some Linux distros/toolchains don't define this implicitly, causing: ``` src/apps/relay/dtls_listener.c:129:18: error: array type has incomplete element type 'struct mmsghdr' src/apps/relay/dtls_listener.c:748:18: warning: implicit declaration of function 'recvmmsg' ``` Fixes #1867 ## Test plan - [x] Verified CMake build succeeds on macOS (recvmmsg code is `#if defined(__linux__)` guarded — no effect on non-Linux) - [x] Verify build succeeds on Linux with and without `_GNU_SOURCE` in the environment - [x] Verify both `cmake` and `./configure && make` build paths work Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	1 month ago
Pavel Punsky	c37ccf4df9	Pin session origin only after MESSAGE-INTEGRITY validates (#1871 ) The first ALLOCATE set ss->origin_set=1 before check_stun_auth ran, so an unauthenticated attacker could lock the session into a realm of their choice by forging the ORIGIN attribute on the first packet. If per-realm ACLs differ, this lets the attacker pick the most permissive realm for that session. Defer the commit of ss->origin_set until check_stun_auth succeeds with a valid MESSAGE-INTEGRITY. Until auth passes, every request re-parses ORIGIN, so the 401 challenge still carries the correct realm derived from the current ORIGIN attribute.	1 month ago

1 2 3 4 5 ...

1946 Commits (master) All Branches Search

1946 Commits (master)

All Branches