feat: optional kernel SYN rate-limiter + per-endpoint DC connect timeout by sleep3r · Pull Request #363 · sleep3r/mtproto.zig

sleep3r · 2026-06-15T13:12:10Z

Two improvements drawn from analysing MTproxy-reanimation (a tuning wrapper for Telemt/MTProxyMax). That project is not a proxy fork — its value is OS/network-level techniques applied around a proxy. Of its five techniques, only two are worth porting; the other three we already do better or they conflict with our invariants (see "Not ported" below). Each kept technique was adversarially verified against our actual code.

1. `mtbuddy setup syn-limit` — optional kernel SYN rate-limiter (T1)

A default-OFF, per-source-IP inbound SYN rate-limiter on the proxy port via iptables -m hashlimit, run as a separate systemd oneshot (mtproto-syn-limit.service).

Why it's not redundant with our in-proxy guards: the handshake flood guard and per-/24 subnet limiter both run after accept() and both default OFF (NAT/VPN false-positives). So today every abusive SYN still costs a kernel socket + an accept() syscall. A kernel hashlimit drops excess SYNs pre-accept — a different layer, complementary not duplicate.

Design choices

- iptables hashlimit, not nftables: matches our existing stack (TCPMSS, nfqws); no new dependency.
- Separate oneshot unit so CAP_NET_ADMIN never has to be granted to mtproto-proxy.
- Default OFF behind a loud CGNAT/VPN warning (same shared-egress-IP problem that keeps the in-proxy guards off). Warns when accept_proxy_protocol is set (kernel sees the LB IP, not clients).
- Presets mirror the source: soft 2/s burst 5 (default when enabling, CGNAT-safer), medium 1/s burst 3, hard 1/s burst 1.
- Drop counter in mtbuddy status; verifies the rule actually landed (xt_hashlimit present) instead of a silent no-op.
- Idempotent apply + full uninstall cleanup (replay-delete INPUT jumps → flush → delete chain, v4 & v6). Generated script is brace-free (so std.fmt renders it) and rate/burst/port are validated before being baked in.
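The preset mapping and the validate-before-baking step can be sketched roughly as below. This is a minimal illustration, not mtbuddy's actual code: the function names are hypothetical, and it assumes only the `N/second` rate form shown in the CLI examples is accepted.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not mtbuddy's real code.

# Map a preset name to "rate burst" (values from the preset list above).
preset_to_limits() {
    case "$1" in
        soft)   echo "2/second 5" ;;
        medium) echo "1/second 3" ;;
        hard)   echo "1/second 1" ;;
        *)      return 1 ;;
    esac
}

# Accept only 1-5 decimal digits with a value of at least 1.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "${#1}" -le 5 ] && [ "$1" -ge 1 ]
}

# Accept only the "N/second" form (assumption), so no shell metacharacter
# or brace can reach the generated script.
is_valid_rate() {
    case "$1" in
        */second) is_positive_int "${1%/second}" ;;
        *) return 1 ;;
    esac
}

is_valid_port() {
    is_positive_int "$1" && [ "$1" -le 65535 ]
}

preset_to_limits soft                           # prints: 2/second 5
is_valid_rate '2/second{' || echo "rejected"    # prints: rejected
```

Rejecting anything that is not plain digits plus a fixed suffix is what makes it safe to paste the values into a generated script.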

mtbuddy setup syn-limit --preset soft        # enable (CGNAT-safer default)
mtbuddy setup syn-limit --rate 1/second --burst 3
mtbuddy setup syn-limit --remove
mtbuddy setup syn-limit --status
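For orientation, the rules such a command might install can be sketched as follows. This is an assumption-laden sketch, not the script mtbuddy generates: the chain name `MTPROTO_SYN_LIMIT`, the hashlimit table name, port 443, and the choice of `--hashlimit-above` with `-j DROP` are all illustrative. The functions only print the commands, so they can be inspected without root.

```shell
#!/bin/sh
# Hypothetical renderer: prints commands instead of running them.
CHAIN="MTPROTO_SYN_LIMIT"   # illustrative chain name

render_syn_limit() {
    port="$1"; rate="$2"; burst="$3"
    for ipt in iptables ip6tables; do
        # A dedicated chain keeps removal simple: unhook, flush, delete.
        echo "$ipt -N $CHAIN"
        # Drop SYNs from any single source IP that exceed rate/burst.
        echo "$ipt -A $CHAIN -m hashlimit --hashlimit-name mtproto_syn --hashlimit-mode srcip --hashlimit-above $rate --hashlimit-burst $burst -j DROP"
        # Only new inbound connection attempts to the proxy port enter the chain.
        echo "$ipt -A INPUT -p tcp --dport $port --syn -j $CHAIN"
    done
}

# Removal order from the description: delete INPUT jump, flush, delete chain.
render_syn_remove() {
    port="$1"
    for ipt in iptables ip6tables; do
        echo "$ipt -D INPUT -p tcp --dport $port --syn -j $CHAIN"
        echo "$ipt -F $CHAIN"
        echo "$ipt -X $CHAIN"
    done
}

render_syn_limit 443 2/second 5   # the "soft" preset on an assumed port 443
```

Because the match is on `--syn` only, packets of already-established connections never enter the chain.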

2. `dc_connect_timeout_sec` — per-endpoint DC connect timeout (T2)

Default 10s. A filtered/black-holed DC endpoint sends no RST, so the kernel leaves the connect in SYN_SENT for ~2 min. handshake_timeout_sec (15s from the client's first byte) already caps the whole handshake, so we are not exposed to an unbounded hang, but that cap is global: a slow first endpoint starves the failover budget for the rest. The per-endpoint deadline fails one dead endpoint fast and lets failover advance to the next, within the overall handshake ceiling.
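The budget arithmetic can be made concrete with a small sketch. It is a simplification: the first connect is assumed to start at t=0 of the handshake window, and the function name is hypothetical.

```shell
#!/bin/sh
# Seconds of handshake budget left after `dead` black-holed endpoints,
# each abandoned after `connect_timeout` seconds (never below zero).
budget_left() {
    handshake="$1"; connect_timeout="$2"; dead="$3"
    left=$(( handshake - connect_timeout * dead ))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

budget_left 15 10 1    # 10s per-endpoint deadline: prints 5
budget_left 15 120 1   # kernel-only timeout (~2 min): prints 0
```

With the 10s default, one dead endpoint still leaves 5s for the next attempt; without it, the first dead endpoint consumes the entire 15s handshake window.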

Mirrors onUpstreamConnectComplete's failure path exactly (cleanupFailedUpstreamConnect → tryNextDcEndpoint for .dc, else closeSlot), driven from the timer tick. The deadline base is stamped per attempt and gated on phase == .connecting_upstream, so it can never touch an established relay; a healthy connect finishes in <1s, so working endpoints are unaffected. Deliberately does not raise handshake_timeout_sec (that would widen the active-probe/slow-loris window our DD-decision + flood guards exist to shrink).
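The gating logic described here can be modelled in a few lines. This is only a model of the decision, not the proxy's Zig code: the `>=` comparison, the `relaying` phase name, and the `other` kind are assumptions, while `connecting_upstream`, `tryNextDcEndpoint`, and `closeSlot` come from the description.

```shell
#!/bin/sh
# Succeeds when the per-attempt connect deadline has passed.
connect_deadline_expired() {
    phase="$1"; started_ms="$2"; now_ms="$3"; timeout_sec="$4"
    # Only slots still connecting upstream are eligible, so an
    # established relay is never touched.
    [ "$phase" = "connecting_upstream" ] || return 1
    [ $(( now_ms - started_ms )) -ge $(( timeout_sec * 1000 )) ]
}

# Same branch as the connect-failure path: DC slots fail over, others close.
on_connect_timeout() {
    case "$1" in
        dc) echo "tryNextDcEndpoint" ;;
        *)  echo "closeSlot" ;;
    esac
}

# Timer tick at t=11000ms for an attempt stamped at t=1000ms, 10s timeout.
if connect_deadline_expired connecting_upstream 1000 11000 10; then
    on_connect_timeout dc    # prints: tryNextDcEndpoint
fi
```

Because the start time is stamped per attempt, each endpoint gets its own full deadline rather than sharing one with earlier attempts.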

Not ported (verified against our code)

- iOS keepalive sysctl (60/15/3): no-op for us. Relay sockets already set SO_KEEPALIVE 60/10/3 + TCP_USER_TIMEOUT=30s (stricter), and a host-wide sysctl can't touch the pre-handshake sockets that lack SO_KEEPALIVE (already reaped by our idle/handshake timeouts).
- iOS MSS=92 + separate port DNAT: a second port = a second tg:// link (our links are immutable once distributed, so this is a hard no); the MSS half overlaps/conflicts with our existing TCPMSS=88; and it's a mechanism-free folk remedy for the iOS resume hang we already root-caused to the client MtProtoKit bad_server_salt bug (shipped client_silence_close_sec, filed Telegram-iOS#2197).
- Deployment autodetection + systemd persistence: already implemented for our own stack.

Testing

- zig build -Doptimize=ReleaseFast -Dtarget=x86_64-linux -Dcpu=x86_64_v3+aes ✅
- zig build test ✅ (new unit tests: SYN-limit preset/rate/number validation + brace-free script render; dc_connect_timeout_sec parse/default).
- Proxy core change is byte-transparent and off the happy path; the firewall feature is opt-in and isolated in its own unit.

Adds an OPTIONAL, default-OFF kernel-level per-source-IP inbound SYN rate-limiter on the proxy port, via iptables `-m hashlimit`. It drops abusive first-SYN bursts in the kernel BEFORE accept(), complementing the in-proxy guards (handshake flood guard / per-/24 subnet limiter), which run AFTER accept() and themselves default OFF, so every attack SYN currently still costs a socket + an accept() syscall.

Idea borrowed from MTproxy-reanimation (which does this with nftables for Telemt/MTProxyMax). We use iptables hashlimit to match our existing iptables stack (TCPMSS, nfqws), with no new nftables dependency, and run it as a SEPARATE systemd oneshot unit (mtproto-syn-limit.service) so CAP_NET_ADMIN never has to be granted to mtproto-proxy itself.

- `mtbuddy setup syn-limit [--preset soft|medium|hard] [--rate N/second] [--burst N] [--remove] [--status]`; interactive menu item too.
- Presets mirror the source: soft 2/s burst 5 (default when enabling, CGNAT-safer), medium 1/s burst 3, hard 1/s burst 1.
- Default OFF behind a loud CGNAT/VPN false-positive warning (the same shared-egress-IP problem that keeps the in-proxy guards off); warns when accept_proxy_protocol is set (kernel sees the LB IP, not real clients).
- Drop counter surfaced in `mtbuddy status`; verifies the rule actually landed (xt_hashlimit present) instead of a silent no-op.
- Idempotent apply (remove-before-add) and full uninstall cleanup (replay-delete INPUT jumps → flush → delete chain, both v4/v6).
- The generated script avoids `{`/`}` so std.fmt can render it; rate/burst/port are validated before being baked in. Pure renderer is unit-tested.

Adds `dc_connect_timeout_sec` (default 10): a per-endpoint deadline for completing the TCP connect to a Telegram DC endpoint. A filtered/black-holed endpoint sends no RST, so the kernel keeps the connect in SYN_SENT for ~2 min. handshake_timeout_sec (15s, measured from the client's first byte) already caps the whole client handshake, so we are NOT exposed to an unbounded hang, but that cap is GLOBAL, so a slow first endpoint starves the failover budget for the remaining endpoints. This fires per endpoint: if connect() hasn't completed within the deadline, the endpoint is abandoned and failover advances to the next one (within the overall handshake_timeout_sec ceiling).

Implementation mirrors onUpstreamConnectComplete's failure path exactly (cleanupFailedUpstreamConnect → tryNextDcEndpoint for .dc kinds, else closeSlot), driven from the timer tick. `upstream_connect_started_ms` is stamped per attempt in startConnectUpstream (reset on every endpoint via the pool's slot.* = .{}), and the check is gated on phase == .connecting_upstream so it can never touch an established relay. A healthy connect completes in well under a second, so this never affects working endpoints.

Inspired by the tg_connect knob in the MTproxy-reanimation analysis; deliberately does NOT raise handshake_timeout_sec (that would widen the active-probe / slow-loris window the DD-decision and flood guards exist to shrink). idle_timeout_sec=120 already equals their client_keepalive=120.

Mirror the two new features across all five READMEs (en/ru/zh/fa/vi) and THREAT_MODEL.md:

- `mtbuddy setup syn-limit` command (presets, status, remove), an entry in the abuse-guards note framing it as the optional kernel-level layer that drops SYN bursts before accept() (separate unit, no CAP_NET_ADMIN on the proxy), and the config-table / example mention.
- `dc_connect_timeout_sec` config example line + table row.

Translations keep technical tokens (commands, flags, config keys, hashlimit, CAP_NET_ADMIN, accept(), SYN, SYN_SENT, RST, CGNAT) verbatim and match each file's existing style.

sleep3r added 3 commits June 15, 2026 16:11

sleep3r merged commit b346b75 into main Jun 15, 2026
8 checks passed

sleep3r deleted the feat/synlimit-and-dc-connect-timeout branch June 15, 2026 13:27

sleep3r mentioned this pull request Jun 15, 2026

chore(main): release 1.9.0 #364

Merged
