session: fix EnqueueInitialData reversing byte order on multi-call writers by Kianmhz · Pull Request #147 · Kianmhz/GooseRelayVPN

Kianmhz · 2026-05-21T05:00:07Z

Summary

`EnqueueInitialData` was implemented to prepend to the tx buffer while `synNeeded` was true. The assumption was "called once, before the SYN drains." But the SOCKS5 adapter calls it on every `Write`, and a fast local writer can fit many calls before a poll worker drains the SYN. Each prepend pushed new bytes in front of the existing buffer, so the on-wire byte order ended up reversed.

For any protocol whose leading bytes carry framing — a TLS record header, an HTTP request line, a length prefix — the upstream parses garbage or rejects the stream. The bug has existed since the connect_data optimization landed in v1.4.

Specific consequence: every upload-throughput bench result has been wrong

The bench's `sized` sink (`bench/sink/main.go`) reads the first 8 bytes as a size header. With prepend, those 8 bytes came from the last chunk written before the SYN drained — usually a body chunk full of zeros, decoded as `size=0`. The sink ACKed instantly without reading the body, and the harness measured the time-to-instant-ACK as upload throughput.

That made v1.5 look like 22 MB/s when actual upload is ~5 MB/s. The whole "v1.5→v1.6 upload regression" narrative behind #143 was a bench artifact. With the prepend fixed and `workersPerEndpoint` tested at 3, 4, and 5:

`workersPerEndpoint`	8 MB single-session upload	8 MB 4-session upload	sessions_per_sec
3 (current main)	4.49 MB/s	~17 MB/s	4.5/s
4 (pre-#143)	4.49 MB/s	~17 MB/s	50/s
5	4.49 MB/s	17.8 MB/s	90/s

The per-session ceiling is maxDrainFramesPerSession × MaxFramePayload / ActiveDrainWindow ≈ 5.85 MB/s. Worker count doesn't change it. 4 concurrent sessions get 4 × the per-session number, as you'd expect.

What does NOT change

The carrier protocol is unchanged. This is purely a local-buffer correctness fix.
Existing single-write callers (the common case for new connections) are unaffected: with one Write before drain, append vs prepend gives the same result.
The connect_data optimization still works — data still bundles into the SYN frame.

What DOES change for real users

Any application that writes a length-prefixed or framed payload immediately after connecting was hitting silent corruption on the first message. TLS handshakes were OK because ClientHello is usually one Write that fits in the SYN frame. HTTP/1.1 was OK because the first Write usually fits the request line + headers. But anything that issued the request line and the body as separate Writes within microseconds of `Dial()` could see the request body arrive before the request line on the upstream.
This may have caused obscure first-request failures users couldn't reproduce.

What's next (separate work)

This fix means #143's premise (v1.5→v1.6 upload regression caused by workersPerEndpoint=4) was wrong — the upload numbers were always bench artifacts. A follow-up should:

Revert #143 (back to workersPerEndpoint=4) — no upload regression to worry about, get back the v1.6 session-open speedup.
Reconsider the spike in #146 on its real merits (unlabeled-multi-endpoint configs went from v1.5's 15 workers to v1.6's 4 — that part of the regression is real).
To genuinely improve per-session upload throughput beyond ~6 MB/s, the per-session in-flight lock and the 350 ms ActiveDrainWindow need design work — out of scope here.

Verification

go test -count=1 -timeout 90s ./... — all green; new test catches the bug.
./bench/bench.sh --baseline v1.6.0 --scenario throughput_up_8MB_1session — 3 consecutive runs all report 4.49 MB/s (vs the previous lying 22.3 MB/s).

🤖 Generated with Claude Code

…iters EnqueueInitialData was implemented to prepend rather than append, on the assumption it would be called once before the SYN drains. But the SOCKS5 adapter calls it on every Write, and a fast local writer (the bench harness, or any application that issues several small Writes immediately after connecting) can fit many calls before a poll worker drains the SYN. Each prepend pushed the new data in front of what was already buffered, so the on-wire byte order ended up reversed. For any protocol whose leading bytes carry framing — a TLS record header, an HTTP request line, a length prefix — the upstream parses garbage or rejects the stream. The bug has existed since the connect_data optimization landed in v1.4. A specific consequence worth flagging: every upload-throughput bench result ever recorded for this project has been wrong. The bench's sized-sink reads the first 8 bytes as a size header. With prepend, those 8 bytes came from the LAST chunk written before the SYN drained — usually a body chunk full of zeros, decoded as size=0. The sink ACKed instantly without reading the body, and the harness measured the time-to-instant-ACK as upload throughput. That made v1.5 look like 22 MB/s when actual upload was ~5 MB/s. The "v1.5→v1.6 upload regression" narrative behind #143 was a bench artifact; the real per-session ceiling is `maxDrainFramesPerSession × MaxFramePayload / ActiveDrainWindow` ≈ 5.85 MB/s in both versions. The fix is one-line (append instead of prepend); the comment is the expensive part. Test added to catch the regression directly.

Kianmhz merged commit 6340afb into main May 21, 2026
6 checks passed

Kianmhz deleted the fix/enqueue-initial-data-ordering branch May 21, 2026 05:02

This was referenced May 21, 2026

[Bug] مقایسه سرعت و کیفیت اتصال بین ۱.۶ و ۱.۵ #113

Closed

[Bug] Extreme instability of speed overall #117

Closed

[Tech] Stable But Very Slow #123

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

session: fix EnqueueInitialData reversing byte order on multi-call writers#147

session: fix EnqueueInitialData reversing byte order on multi-call writers#147
Kianmhz merged 1 commit into
mainfrom
fix/enqueue-initial-data-ordering

Kianmhz commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Kianmhz commented May 21, 2026

Summary

Specific consequence: every upload-throughput bench result has been wrong

What does NOT change

What DOES change for real users

What's next (separate work)

Verification

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant