## Summary
Flips the Linux default for `--udp-recvmmsg` from **off** to **on**.
Operators opt out with `--udp-recvmmsg=false` (or `=0`).
> **Stacked on #1929.** This depends on the recvmmsg-scoping change in
#1929 and is based on that branch, so the diff shows only the default-on
change. GitHub will auto-retarget the base to `master` once #1929
merges. Merge #1929 first.
## Why this is now safe
The original objection to default-on (recorded in
`docs/PerformanceIterationLog.md`) was the **per-session-relay-socket
prealloc tax**: `--udp-recvmmsg` applied the 16-buffer batch path to
every connected relay socket, which only ever carries one flow, so the
churn ate the listener-side win.
#1929 scoped recvmmsg to **shared fan-in sockets only**
(`udp_recvmmsg_eligible`: the client listener, plus the per-thread
shared relay socket under `--multiplex-peer`). Per-session relay sockets
now stay on the single-recv path regardless of the flag, so that tax is
gone. The one socket touched by default — the client listener — is a
genuine fan-in point:
- batches whenever client concurrency is non-trivial (measured
`avg_batch ≈ 16` under load), and
- costs little when idle (few packets ⇒ few prealloc cycles).
## What changed
- `mainrelay.c`: `turn_params.udp_recvmmsg` default `false → true`
(Linux only).
- Removed the now-dead `--multiplex-peer` auto-enable block and the
`udp_recvmmsg_set_explicitly` tracking it relied on; multiplex-peer gets
its recvmmsg window from the default. The opt-out flows through the
normal `get_bool_value` path.
- Help text, `man/man1/turnserver.1`, `examples/etc/turnserver.conf`,
`CLAUDE.md`, and `docs/PerformanceIterationLog.md` updated for the new
default + opt-out.
Per-session relay sockets and DTLS session sockets are unchanged.
## Validation
- **Format:** clang-format 15.0.7 clean.
- **macOS:** build + ctest 6/6 + `run_tests.sh` pass.
- **Linux (Docker, clean build):** ctest 5/5; `run_tests.sh`,
`run_tests_conf.sh`, `run_tests_multiplex_peer.sh` all pass (no FAIL).
- **Runtime proof (loopback, `--udp-recvmmsg-log`):**
- Default, no flag: recvmmsg active, `calls=13714 packets=219306
avg_batch=15.99`.
- `--udp-recvmmsg=false`: zero recvmmsg activity — opt-out confirmed.
---------
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
## Summary
Each TURN permission and channel carried its own libevent timer, torn
down and recreated (`IOA_EVENT_DEL` + `set_ioa_timer`) on **every**
refresh. At high allocation counts this is the dominant scheduling and
memory cost: ~200k+ live libevent events at 100k allocations (≥2 per
allocation), each ~228 B of resident heap (a `timer_event`, a libevent
`struct event`, and a `strdup` of the handler name) plus a min-heap
node.
This replaces per-object timers with a single **per-thread sweep**.
`timer_timeout_handler` already runs once a second per relay thread on
that thread's own engine; after refreshing `ctime` it now walks the
thread-local `sessions_map` and reaps permissions/channels whose
`expiration_time` has passed, **reusing the existing
`client_ss_perm_timeout_handler` / `client_ss_channel_timeout_handler`**
so teardown (including `mp_deregister_permission_peers` in
multiplex-peer mode) is unchanged. `update_turn_permission_lifetime` /
`update_channel_lifetime` now just push `expiration_time` forward.
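
To make the sweep model concrete, here is a minimal sketch in C. It is not the coturn code: `struct expiring_entry`, `refresh_entry`, and `sweep_expired` are hypothetical names, a flat array stands in for the walk over `sessions_map`, and treating `expiration_time <= now` as "expired" is an assumption. Only the overall shape (refresh is a field store, a once-per-tick walk reaps via a callback) is taken from the description above.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a permission or channel entry. */
struct expiring_entry {
  int in_use;             /* non-zero while the entry is live */
  time_t expiration_time; /* absolute deadline, in seconds */
};

/* Hypothetical teardown hook; plays the role of the existing timeout handlers. */
typedef void (*expire_cb)(struct expiring_entry *e);

/* Refresh is a plain field store: no timer is deleted or re-created. */
static void refresh_entry(struct expiring_entry *e, time_t now, time_t lifetime) {
  e->expiration_time = now + lifetime;
}

/* One sweep tick: reap every live entry whose deadline has been reached.
 * Returns the number of entries reaped. */
static size_t sweep_expired(struct expiring_entry *entries, size_t n, time_t now, expire_cb on_expire) {
  size_t reaped = 0;
  for (size_t i = 0; i < n; ++i) {
    struct expiring_entry *e = &entries[i];
    if (e->in_use && e->expiration_time <= now) {
      if (on_expire) {
        on_expire(e); /* teardown path stays the same as with per-object timers */
      }
      e->in_use = 0;
      ++reaped;
    }
  }
  return reaped;
}
```

With a one-second tick, an entry is reaped at most one tick after its deadline, which is where the expiry-latency bound in the trade-offs comes from.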
The now-dead `lifetime_ev` fields are removed from
`turn_permission_info` (648→640 B) and `ch_info` (64→56 B), along with
the spurious "strange permission" error log that would otherwise fire on
every sweep-driven expiry.
## Why
For workloads with many concurrent allocations (e.g. 100k VoIP
sessions), the per-object timer model puts ~200k+ nodes in libevent's
min-heap and burns a free/malloc/strdup cycle on every refresh. The
expiry deadline is already stored in `expiration_time`; a periodic sweep
makes the timer redundant.
## Trade-offs
- Expiry latency rises to **≤1s** (negligible vs 300–600s
permission/channel TTLs).
- The sweep is a bounded per-thread array walk — ~39k trivial
comparisons/thread/s at 100k allocations over 128 threads.
## Performance (measured)
Faithful libevent 2.1.12 microbenchmark of the create/refresh/destroy
path:
| Metric | Before | After |
|---|---|---|
| Resident heap per timer | ~228 B | 0 |
| Refresh op | 110 ns (del+create at 100k heap depth) | 0.28 ns (field store) |
| `sizeof(turn_permission_info)` | 648 B | 640 B |
| `sizeof(ch_info)` | 64 B | 56 B |
At 100k allocations (≥2 timers each): **~46 MB** of libevent objects
freed and ~200k fewer min-heap nodes.
## Testing
- Unit (`ctest`), `run_tests.sh`, `run_tests_conf.sh`,
`run_tests_multiplex_peer.sh` — pass on macOS and in a clean Ubuntu
24.04 Docker build.
- New `examples/run_tests_expiry.sh`: forces 4s server-side
permission/channel lifetimes so the sweep must reap mid-session, and
asserts the verbose log shows reaping while the server stays healthy.
**Verified to FAIL when the sweep call is disabled** (negative control),
confirming it catches the regression rather than passing trivially.
## Summary
Adds **`--multiplex-peer`**, a non-standard relay mode that replaces the
per-allocation peer-side port bind with **one shared IPv4+IPv6 UDP
socket pair per relay thread**. Sessions are demultiplexed by exact peer
IP:port in a per-thread `mp_table`. This lifts the ~16 k allocation cap
that the default 49152-65535 relay port range imposes, and dramatically
reduces kernel-level UDP receive-buffer drops under high pps.
Design and trade-offs: [docs/multiplex-peer.md](docs/multiplex-peer.md).
## What changes
### Server (turnserver)
- **`--multiplex-peer`** (cross-platform) — enable the shared per-thread
relay sockets. Replaces the per-session port bind. Implies sendmmsg
batching on Linux and default-enables `--udp-recvmmsg` (override with
`--udp-recvmmsg=0`). Incompatible with EVEN-PORT — those Allocates are
rejected with 400.
- **`--multiplex-peer-port <port>`** (cross-platform, default 3480) —
base port; thread `i` binds `<base>+2i` (IPv4) and `<base>+2i+1` (IPv6).
A 4-thread server consumes 8 ports.
- **`--udp-gso`** (Linux-only CLI) — UDP-GSO (`UDP_SEGMENT` cmsg) on the
relay send path. Requires `--multiplex-peer` (which is what enables the
sendmmsg batching GSO piggybacks on); passing `--udp-gso` alone is a
silent no-op.
- **CLI surface tightened**: `--udp-recvmmsg`, `--udp-recvmmsg-log`,
`--udp-gso` and their fields are now `#if defined(__linux__)` — absent
from `--help`, rejected with `unrecognized option`, and the code paths
compile out on macOS/Windows.
- **Windows portability**: `SO_REUSEPORT` in `mp_open_socket` wrapped in
`#ifdef` (MSVC's Winsock doesn't define it; REUSEPORT was defensive
anyway because the per-thread port layout is unique by construction).
- **`--sock-buf-size` honoured at startup**: the shared multiplex-peer
relay socket now calls `set_ioa_socket_buf_size` in `mp_open_socket` so
the configured rcvbuf is in effect from the moment the socket exists,
not deferred to the first Allocate.
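
The `--multiplex-peer-port` layout described above can be sketched as a small helper. This is an illustration only: `struct mp_ports` and `mp_thread_ports` are hypothetical names, and the range check is an assumption rather than documented server behaviour.

```c
#include <stdint.h>

/* Hypothetical pair of ports owned by one relay thread. */
struct mp_ports {
  uint16_t ipv4;
  uint16_t ipv6;
};

/* Thread i binds <base>+2i (IPv4) and <base>+2i+1 (IPv6).
 * Returns 0 on success, -1 if a port would fall outside 1..65535. */
static int mp_thread_ports(unsigned base, unsigned thread_index, struct mp_ports *out) {
  unsigned long long v4 = (unsigned long long)base + 2ULL * thread_index;
  unsigned long long v6 = v4 + 1;
  if (base == 0 || v6 > 65535ULL) {
    return -1; /* assumed policy: refuse layouts that overflow the port space */
  }
  out->ipv4 = (uint16_t)v4;
  out->ipv6 = (uint16_t)v6;
  return 0;
}
```

With the default base of 3480 and four threads, thread 0 gets 3480/3481 and thread 3 gets 3486/3487, which is the eight-port footprint mentioned above.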
### turnutils_uclient (loadgen)
- **`--no-even-port`** — force `ep = -1` on Allocate. The default path
randomly attaches EVEN-PORT (with no-R bit) even under `-c`, which
`--multiplex-peer` strictly rejects with 400; this flag makes
alloc-flood runs against multiplex-peer deterministic.
- **Legacy `timer_handler` now wraps the per-tick send batch with
`uclient_send_batch_begin/_end`** — without this, runs with
`--sender-threads 0` (the default for `-m < 4`) silently fell through
every send to plain `send(2)`. strace A/B: 205 k `sendto` → 61 k
`sendmsg` (GSO) + 4 k `sendmmsg` + small `sendto` residual for control.
## Measured impact (3-droplet DigitalOcean, c-4 / 4 vCPU, 8 concurrent UDP streams, 45 s)
| Metric | baseline | `--udp-recvmmsg` | `--multiplex-peer` | `--multiplex-peer --udp-gso` |
|---|---:|---:|---:|---:|
| Server NIC rx pps (UDP relay both legs) | 350 k | 334 k | 326 k | 294 k |
| Server `UdpInDatagrams` pps | 279 k | 292 k | 300 k | 294 k |
| **Server `UdpRcvbufErrors` pps** | **71 k** | 42 k | 26 k | **0.3 k (−99.6 %)** |
| **`turnserver` process CPU** | **387 %** | 205 % | 283 % | **133 % (−65 %)** |
| Server host idle | 22 % | 49 % | 41 % | **68 %** |
Same loadgen-side packet rate (~2 M pps reported by uclient `send_pps`
after the legacy-path batching fix). Iteration log:
[docs/PerformanceIterationLog.md](docs/PerformanceIterationLog.md).
## Test plan
- [x] `ctest --test-dir build` — 3/3 pass (test_ioaddr, test_stun_msg,
test_http_server) on macOS + Linux.
- [x] `examples/run_tests.sh` — 4 protocols + 4 threaded + load-gen
smoke on Linux; 4 protocols on macOS.
- [x] `examples/run_tests_conf.sh` — same coverage, conf-driven.
- [x] `examples/run_tests_multiplex_peer.sh` — UDP/TCP/TLS/DTLS via
`--multiplex-peer --multiplex-peer-port=35000` on macOS + Linux.
- [x] Flag matrix smoke on macOS: `--multiplex-peer`,
`--multiplex-peer-port=42000`, `--multiplex-peer --udp-gso` (no-op),
`uclient --no-even-port`, `uclient --listener-threads N --sender-threads
M` — all pass; `--udp-recvmmsg` / `--udp-gso` correctly rejected with
`unrecognized option`.
- [x] Flag matrix smoke on Linux (Docker): same + `--udp-recvmmsg`
accepted, `--multiplex-peer` auto-enables `--udp-recvmmsg`,
`--udp-recvmmsg=0` overrides the auto-enable.
- [x] Windows compile fix verified — `SO_REUSEPORT` no longer referenced
unconditionally.
- [x] 3-droplet perf matrix completed; per-hop UDP counters captured.
## Docs updated
- New: [docs/multiplex-peer.md](docs/multiplex-peer.md)
- [README.turnserver](README.turnserver): full entries for
`--multiplex-peer`, `--multiplex-peer-port`, `--udp-gso`; clarified
`--udp-recvmmsg` auto-enable semantics.
- [README.turnutils](README.turnutils): added `--no-even-port`, plus
previously-undocumented `--listener-threads` / `--sender-threads`
loadgen pool flags.
- [examples/etc/turnserver.conf](examples/etc/turnserver.conf):
commented `udp-recvmmsg`, `udp-recvmmsg-log`, `udp-gso`,
`multiplex-peer`, `multiplex-peer-port` keys with one-paragraph
descriptions and pointer to `docs/multiplex-peer.md`.
- Man pages regenerated via `./make-man.sh`.
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
## Summary
`examples/run_tests.sh` and `examples/run_tests_conf.sh` previously
covered only the legacy single-threaded uclient path. After recent PRs
they were no longer exercising:
- `--listener-threads` from #1911 (recv thread pool)
- `--sender-threads` from #1913 (send thread pool)
- `--udp-gso` from #1907 (server-side GSO send)
- the `recv_pps` metric introduced in #1913
A regression in any of those would have shipped silently. This PR
expands both scripts so every CI cycle hits all four features
end-to-end.
## What changes
**Per-protocol coverage doubled.** The four protocol tests (TCP / TLS /
UDP / DTLS) are factored into a shell function and each runs twice:
1. Default uclient flags — legacy single-thread paths.
2. `--listener-threads 1 --sender-threads 1` — engages both pools at
minimum non-zero size, so the slab counters, per-thread libevent base,
and pick_listener_base / pick_sender_id paths all fire.
The `grep "tot_send_bytes ~ 1000, tot_recv_bytes ~ 1000"` target is
stable across both variants because the workload (`-m 1 -n default=5 -l
200`) totals 1000 bytes either way — threading doesn't change the byte
count, only the machinery that produces it.
**Server-side GSO enabled on Linux.** `run_tests.sh` adds `--udp-gso`
next to the existing `--udp-recvmmsg`. `run_tests_conf.sh` writes both
keys into the generated `turnserver.conf` on Linux. The conf-file parser
uses the same long-options table as the CLI (see `mainrelay.c`
`long_options`), so the keys map 1:1.
**Load-gen smoke on Linux.** A short `-Y packet -m 4 -l 100 -c` run with
`--sender-threads 2` ends each script:
```bash
timeout -s INT 6s turnutils_uclient \
-Y packet -m 4 -l 100 -c -e 127.0.0.1 -g \
--listener-threads 1 --sender-threads 2 \
-u user -W secret 127.0.0.1
```
The grep asserts the progress print contains a **non-zero** `send_pps`
AND a **non-zero** `recv_pps`. Non-zero `recv_pps` is the canonical
signal that the listener-pool slab reduction (`recv_count_snapshot`) is
feeding back into the load-rate reporter — the codepath most likely to
silently drop output if the thread pool's stop ordering or slab plumbing
regresses.
`timeout -s INT 6s` triggers a clean SIGINT-driven exit; exit codes 0,
124, and 130 all count as success because we just want a fixed-duration
run.
## Test plan
Linux Docker (authoritative):
```
=== run_tests.sh ===
Using TURNSERVER_EXTRA_ARGS="--udp-recvmmsg --udp-gso"
Running turn client TCP OK
Running turn client TLS OK
Running turn client UDP OK
Running turn client DTLS OK
Running turn client TCP (threaded) OK
Running turn client TLS (threaded) OK
Running turn client UDP (threaded) OK
Running turn client DTLS (threaded) OK
Running turn client UDP load-gen smoke OK
=== run_tests_conf.sh ===
Running turn client TCP OK
Running turn client TLS OK
Running turn client UDP OK
Running turn client DTLS OK
Running turn client TCP (threaded) OK
Running turn client TLS (threaded) OK
Running turn client UDP (threaded) OK
Running turn client DTLS (threaded) OK
Running turn client UDP load-gen smoke OK
```
9/9 in each. macOS local TCP test fails the same way it did before this
change (pre-existing flake on Darwin loopback TCP, unrelated to
threading or GSO).
- [x] Both scripts run end-to-end in Linux Docker against a fresh build.
- [x] `--help` flag list verified: `--listener-threads`,
`--sender-threads`, `-Y` all present.
- [x] No new dependencies; only uses `timeout` (already in coreutils on
Linux base images).
## Summary
This change lets the listener batch incoming UDP datagrams when
`--udp-recvmmsg` is enabled, reducing per-packet overhead on busy
listeners while preserving the existing behavior as the default and
fallback path.
## What changed
- add a new `--udp-recvmmsg` runtime flag
- implement a batched UDP receive path in the DTLS listener using
`recvmmsg()`
- reuse packet classification and datagram processing logic across
batched and non-batched receive paths
- reduce buffer/metadata churn by reusing listener-side scratch state
and network buffers
- keep compatibility safeguards by falling back when `recvmmsg()` is
unavailable or unsupported
- expose the setting in admin/CLI configuration output
- update the example test runner to enable the flag on Linux
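
As an illustration of the batched receive with fallback listed above, here is a rough sketch. It is not the listener code from this change: `struct dgram_batch`, `recv_batch`, `BATCH_MAX`, and `DGRAM_MAX` are hypothetical, and the error policy is an assumption. The only real API used is `recvmmsg(2)`, which glibc declares only when `_GNU_SOURCE` is in effect.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc declares recvmmsg() only under _GNU_SOURCE */
#endif
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define BATCH_MAX 16   /* hypothetical batch size */
#define DGRAM_MAX 2048 /* hypothetical per-datagram buffer size */

/* Reusable scratch state, so no buffers are allocated per packet. */
struct dgram_batch {
  char data[BATCH_MAX][DGRAM_MAX];
  size_t len[BATCH_MAX];
};

/* Drain up to BATCH_MAX datagrams without blocking.
 * Returns the number received (0 if none are queued) or -1 on error. */
static int recv_batch(int fd, struct dgram_batch *b) {
/* __USE_GNU is glibc's marker that _GNU_SOURCE took effect; on other
 * platforms or libcs this sketch simply uses the single-recv loop below. */
#if defined(__linux__) && defined(__USE_GNU)
  struct mmsghdr msgs[BATCH_MAX];
  struct iovec iov[BATCH_MAX];
  memset(msgs, 0, sizeof(msgs));
  for (int i = 0; i < BATCH_MAX; ++i) {
    iov[i].iov_base = b->data[i];
    iov[i].iov_len = DGRAM_MAX;
    msgs[i].msg_hdr.msg_iov = &iov[i];
    msgs[i].msg_hdr.msg_iovlen = 1;
  }
  int n = recvmmsg(fd, msgs, BATCH_MAX, MSG_DONTWAIT, NULL);
  if (n >= 0) {
    for (int i = 0; i < n; ++i) {
      b->len[i] = msgs[i].msg_len; /* bytes received for datagram i */
    }
    return n;
  }
  if (errno == EAGAIN || errno == EWOULDBLOCK) {
    return 0; /* nothing queued right now */
  }
  if (errno != ENOSYS) {
    return -1;
  }
  /* ENOSYS: the kernel lacks recvmmsg(); fall through to the legacy path. */
#endif
  int count = 0;
  while (count < BATCH_MAX) {
    ssize_t r = recv(fd, b->data[count], DGRAM_MAX, MSG_DONTWAIT);
    if (r < 0) {
      break; /* queue drained (or error); return what we have so far */
    }
    b->len[count++] = (size_t)r;
  }
  return count;
}
```

Both paths fill the same `dgram_batch`, so whatever classifies and processes the datagrams afterwards does not need to know which receive path ran.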
## Why
The current listener processes UDP datagrams one at a time. On Linux,
`recvmmsg()` allows the server to receive multiple packets per syscall,
which should improve throughput and lower CPU overhead under load for
UDP-heavy traffic.
## Notes
- the feature is opt-in and defaults to disabled
- the implementation is Linux-specific and leaves the existing path
unchanged on other platforms
- the listener still falls back to the legacy receive path if batched
receive is unavailable at runtime
## Testing
- updated `examples/run_tests.sh` to pass `--udp-recvmmsg` on Linux
- validated behavior through the existing listener flow and fallback
handling
Remove the two engine implementations (NEV_UDP_SOCKET_PER_SESSION and
NEV_UDP_SOCKET_PER_ENDPOINT) and all the dispatch/selection logic around
them. NEV_UDP_SOCKET_PER_THREAD is now the sole, unconditional
implementation.
- mainrelay.h: removed _NET_ENG_VERSION enum, typedef, and
net_engine_version / net_engine_version_txt struct fields
- mainrelay.c: removed NE_TYPE_OPT CLI option, set_network_engine(),
per-endpoint branch in print_features(), and all remaining
net_engine_version references
- netengine.c: removed run_udp_listener_thread(),
setup_socket_per_endpoint_udp_listener_servers() (~190 lines),
setup_socket_per_session_udp_listener_servers() (~90 lines); simplified
setup_barriers(), setup_relay_server(), run_general_relay_thread(),
setup_general_relay_servers(), and setup_server() by eliminating all
engine-type conditionals
- turn_admin_server.c: replaced dynamic engine version lookups with
hardcoded values (3 / "UDP thread per CPU core") in CLI and HTTPS status
handlers
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Small fixes across CI workflows and test scripts:
- In examples/run_tests.sh & examples/run_tests_conf.sh: ensure both
turnserver and the turnutils_peer background process are killed at the
end
- cmake.yml - make binaries end up in the expected folder
- linux.yml - add install so that binaries are in the expected folder
The CLI interface is ON by default, which creates a security risk (even
though it requires a password); disabling it has so far only been
recommended.
Instead of just recommending this, this PR disables the CLI by default
and requires an explicit flag to enable it.
With an old configuration or old CLI arguments, turnserver logs an error
message that `--no-cli` is deprecated and otherwise does nothing (the
CLI is already disabled). This log line will be removed in the future.
Disable reason-string messages by default; they can be re-enabled with
the `--include-reason-string` option.
Because the reason string is no longer sent (it is optional per the
standard and only provides debugging information for the numeric error
code), response message size can decrease by up to NNN bytes.
Some places in the code do not have access to the buffer size, which
results in a crash that can be seen in tests.
This PR removes the call to `set_ioa_socket_buf_size` from those places
(where it is redundant anyway).
# Description
Replace the hardcoded buffer sizes inside coturn to make them
configurable for different use cases (low bitrate use cases can save
memory and high bitrate use case can avoid congestion) - based on #1089
Add this feature on both sides (listener and relay connections).
# Tests
For now it is only the automated CI tests.
Confirmed with debugger that buffer sizes are set according to the
arguments.
This PR adds a new `--cpus` configuration option to address CPU
detection issues in virtualized and containerized environments where
`_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` return host CPU counts
instead of allocated container CPUs.
## Problem
In containerized deployments, coturn detects the host's CPU count (e.g.,
128 CPUs) instead of the container's allocated CPUs (e.g., 2 CPUs). This
causes the server to create excessive relay threads and database
connections, leading to resource exhaustion and performance issues.
## Solution
Added a new `cpus` configuration option that allows manual override of
CPU detection:
### Command Line Usage
```bash
turnserver --cpus 2
```
### Configuration File Usage
```ini
# Override system CPU count detection for containers
cpus=2
```
## Key Features
- **Backward Compatible**: No changes needed for existing deployments
- **Input Validation**: Values must be between 1 and 128 with proper
error handling
- **Comprehensive Documentation**: Updated man pages and example config
files
- **Both Interfaces**: Works via command line and configuration file
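
The input-validation rule above (values between 1 and 128) could look roughly like the following. This is a sketch under assumptions: `parse_cpus` is a hypothetical helper, not the function in `mainrelay.c`, and rejecting trailing garbage is a choice made for the example.

```c
#include <errno.h>
#include <stdlib.h>

#define CPUS_MIN 1
#define CPUS_MAX 128

/* Hypothetical helper: parse a --cpus / cpus= value.
 * Returns 0 and stores the value on success; returns -1 and leaves
 * *out untouched if the text is not an integer in [1, 128]. */
static int parse_cpus(const char *text, unsigned long *out) {
  char *end = NULL;
  if (text == NULL || *text == '\0') {
    return -1;
  }
  errno = 0;
  long v = strtol(text, &end, 10);
  if (errno != 0 || *end != '\0') {
    return -1; /* overflow, or trailing non-numeric characters */
  }
  if (v < CPUS_MIN || v > CPUS_MAX) {
    return -1; /* outside the accepted range */
  }
  *out = (unsigned long)v;
  return 0;
}
```

A caller would fall back to system detection when the option is absent and report an error when `parse_cpus` returns -1.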
## Testing
The implementation has been thoroughly tested:
```bash
# Container with 2 allocated CPUs on 128-CPU host
$ turnserver --cpus 2
INFO: System cpu num is 128 # Host detection
INFO: System enable num is 128 # Host detection
INFO: Configured cpu num is 2 # Override applied
INFO: Total General servers: 2 # Correct thread count
```
- ✅ Command line option: `--cpus 8` creates 8 relay servers
- ✅ Config file option: `cpus=6` creates 6 relay servers
- ✅ Error handling: Invalid values show appropriate errors
- ✅ Default behavior: Without option, uses system detection
- ✅ RFC5769 tests: All protocol tests still pass
## Files Modified
- `src/apps/relay/mainrelay.c` - Core implementation
- `src/apps/relay/mainrelay.h` - Added configuration flag
- `examples/etc/turnserver.conf` - Added documentation and example
- `man/man1/turnserver.1` - Updated man page
This change directly addresses the resource consumption issues in
containerized environments while maintaining full backward
compatibility.
Fixes #1628.
Set new release version to 4.7.0
Updating minor version due to some breaking changes in options to enable
more secure/robust configuration without additional flags (or relying on
recommended conf file which people seem to skip during updates)
TLSv1 and TLSv1.1 can be enabled using the `--tlsv1` and `--tlsv1_1`
arguments respectively.
This assumes the openssl version in use has these protocol versions
enabled (as of openssl-3.5 they are not enabled by default).
Deprecate `--no-stun-backward-compatibility` and set it to true by
default
Add new option `--stun-backward-compatibility`, off by default
Update example/recommended configuration files
This is a breaking change as passing `--no-stun-backward-compatibility`
will be rejected as invalid argument
Invert `--no-rfc5780` option to be true by default
Make it `--rfc5780` to enable it
Update example/recommended configuration files
Passing `--no-rfc5780` will have no effect as this is the default
behavior now
The `#allocation-default-address-family="ipv4"` line is repeated twice
in the example config, changed the second one to be `"ipv6"` which I
assume it was intended to be.
Add a `--prometheus-path` parameter which allows users to specify at
what path the metrics should be exposed.
This simplifies serving metrics on a specific path behind some
restrictive reverse proxies that expect the upstream server to serve
URLs with paths matching the requested path.
Co-authored-by: Pavel Punsky <eakraly@users.noreply.github.com>
Some actions do not build with prometheus - adding prometheus tests
fails the jobs
cmake build tests did not run due to different target folder (while
reporting success) - now the bin folder is detected
Implement a custom prometheus http handler in order to:
1. Support listening on a specified address as opposed to any
2. Remove the requirement on the unmaintained promhttp library
This feature comes with one limitation: if an IPv4 address is used, the
server will not listen on the IPv6-mapped address, even if IPv6 is
available. That is, dual-stacking does not work.
Solves: #1475
---------
Co-authored-by: Pavel Punsky <eakraly@users.noreply.github.com>
I believe that many users, like myself, prefer to reference the
`turn.conf` file when deploying the TURN server with Docker, rather than
the `Readme.turnserver`. Additionally, I think it's important to
synchronize the Prometheus settings from the README into the `turn.conf`
file for better clarity. This way, users won't overlook any essential
options.
Co-authored-by: Ben Chang <ben_chang@htc.com>
Fixes #1533 and #1534
Memset `turn_params.default_users_db` before reading the conf file, not
after: auth is read in the first iteration, so the secret was being
wiped out.
# test plan
Add new test script that uses config file to setup turnserver instead of
cli arguments and confirm it works (fails without the change)
For our deployment, it is useful if coturn returns a valid HTTP response to an HTTP request. To do this on the same port as STUN/TURN and without enabling the admin site, I have extended `read_client_connection()` to return a canned HTTP response, in response to an HTTP request, rather than immediately closing the connection.
Fixes https://github.com/coturn/coturn/issues/1239
HTTPS access to the web UI freezes in the browser if the no_tls option
is used, because nothing TLS-related is initialized.
This PR adds a warning about this and a comment about it in the default
config.
Similar to #989, use a single SSL context for all versions of DTLS
protocol
- Add support for modern API (protocol version independent APIs)
- Add DTLS test to the CI test
- Removing calls to `SSL_CTX_set_read_ahead` in DTLS context (does
nothing as DTLS is datagram protocol - we always get the whole datagram
so this call has no impact)
Fixes #924
openssl allows multiple TLS version support through a single SSL_CTX
object.
This PR replaces 4 per-version SSL_CTX objects with a single object
(DTLS is not yet changed).
SSL context initialization code for openssl with modern API (>=1.1.0)
uses `TLS_server_method` and `SSL_CTX_set_min_proto_version` instead of
enabling specific TLS version. Byproduct of this is TLSv1_3 support when
used with openssl-1.1.1 and above
TLS 1.2 and TLS 1.3 cannot be disabled (as before)
Test plan:
- run_tests.sh script now runs turnserver with SSL certificate (which
enables TLS support)
- run_tests.sh now has one more basic test that uses TLS protocol
Co-authored-by: Pavel Punsky <pavel.punsky@epicgames.com>