Surface corrupt WAL frames instead of dropping trailing bytes by LucaCappelletti94 · Pull Request #86 · isdaniel/pg-walstream

LucaCappelletti94 · 2026-06-27T08:55:28Z

parse_wal_message and parse_wal_message_bytes now return a new ReplicationError::TrailingBytes when the input buffer carries bytes beyond a single WAL message, instead of parsing one message and silently dropping the remainder. Each replication frame carries exactly one logical message, so leftover bytes signal a corrupt or misframed input that should surface rather than pass unnoticed.

This is safe for the crate's own pipeline. The stream feeds exactly one message per XLogData frame (the 25-byte header is stripped in stream.rs before the parser runs), so the check never fires during correct operation. The old tolerance also had no legitimate use from the public API, since both methods return only the parsed message and no consumed-length, leaving a caller no way to notice or act on trailing bytes.

Note this changes the behavior of existing public methods rather than adding new ones, so it is not purely additive. A downstream caller that passed an over-long buffer and relied on getting the first message back will now get an error. On 0.x that is a MINOR bump.

gemini-code-assist

Code Review

This pull request introduces a new ReplicationError::TrailingBytes error variant to detect and reject unconsumed trailing bytes in both parse_wal_message and parse_wal_message_bytes, ensuring that each replication frame contains exactly one message. The feedback suggests renaming the expected and actual fields of the new error variant to consumed and total to make the API more self-documenting and intuitive, along with updating the corresponding Display implementation, parser logic, and unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

codecov · 2026-06-27T08:57:35Z

Codecov Report

❌ Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.58%. Comparing base (919f154) to head (26664c2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
src/protocol.rs	97.46%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #86      +/-   ##
==========================================
+ Coverage   95.51%   95.58%   +0.07%     
==========================================
  Files          16       16              
  Lines       16274    16364      +90     
==========================================
+ Hits        15544    15642      +98     
+ Misses        730      722       -8

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

LucaCappelletti94 · 2026-06-27T09:14:15Z

Renamed to consumed/total. Reworded the doc to drop the circular phrasing: consumed is how many bytes the parser read, total is the input length. All comments are roughly about the same renaming.

isdaniel · 2026-06-27T11:39:32Z

Hi @LucaCappelletti94 Thank you for your PR, but my main concern is that parse_wal_message / parse_wal_message_bytes are hot-path code that every message flows through ,and I'd like to keep them as lean as possible.

In normal operation each XLogData frame carries exactly one logical message (the pipeline strips the 25-byte header before parsing), so "no trailing bytes" is effectively an invariant, not a condition we expect to hit.

Could you help us to measure how many perf will downgrade if we add the code? and May I know why would you need those logical in hot path?

LucaCappelletti94 · 2026-06-27T12:38:34Z

I benchmarked cargo bench --bench wal_pipeline against main and you were right: my first version regressed the hot path up to ~50 percent on small messages. The check itself was not the cost. Unwrapping the parser Result and rewrapping it with Ok(...) forced an extra copy of the large StreamingReplicationMessage and defeated RVO. Moving the check inside parse_wal_message_from_reader, so the public methods stay tail calls, restores RVO and leaves only a reader.remaining() read and a compare. The parse benches are now at parity with main.

On why it is worth having: a downstream consumer that parses frames directly gets back only the message with no consumed-length, so today a corrupted, misframed, or injected stream is silently truncated to its first message. The checked error is what lets it notice instead of dropping it.

isdaniel · 2026-06-28T06:05:29Z

I will do some benchmark and load testing to ensure how many perf will be downgrade for from this PR first, that might take few days.

LucaCappelletti94 · 2026-06-28T06:25:44Z

I will do some benchmark and load testing to ensure how many perf will be downgrade for from this PR first, that might take few days.

Here is my measurements ofcargo bench --bench wal_pipeline (Criterion 0.8.2), paired back to back, baseline on main 919f154 then PR head 26664c2. Median values. Negative means the PR is faster.

benchmark	main (ns)	this PR (ns)	PR vs main
wal_header_parse/copy (control)	14.70	14.31	-2.6%
wal_header_parse/zero_copy (control)	3.63	3.65	+0.6%
parse_begin/copy	58.92	59.31	+0.7%
parse_begin/zero_copy	64.47	63.38	-1.7%
parse_commit/copy	76.43	77.95	+2.0%
parse_commit/zero_copy	81.91	81.49	-0.5%
parse_insert/copy	242.01	248.87	+2.8%
parse_insert/zero_copy	226.49	234.41	+3.5%
parse_insert_pipeline/copy	301.81	295.79	-2.0%
parse_insert_pipeline/zero_copy	244.61	243.50	-0.5%
parse_multi_column/5 copy	214.78	222.14	+3.4%
parse_multi_column/5 zero_copy	187.99	183.29	-2.5%
parse_multi_column/10 copy	298.24	294.27	-1.3%
parse_multi_column/10 zero_copy	248.60	241.96	-2.7%
parse_multi_column/20 copy	447.12	441.19	-1.3%
parse_multi_column/20 zero_copy	371.86	364.88	-1.9%
parse_multi_column/50 copy	848.53	854.86	+0.7%
parse_multi_column/50 zero_copy	764.03	733.13	-4.0%

The control moves 2.6 percent on code the PR never touches, so anything within roughly 3 percent is drift. Every parse row is in that band, and where one path leans positive its sibling leans negative, which is layout jitter, not a real cost.

isdaniel · 2026-06-29T04:56:41Z

As I did some benchmark and real load testing to DB, I can see perf downgress from PR86 would be higher than my expected after compared with main branch.

Microbench — cargo bench --bench wal_pipeline x5, parser only, no DB. Reliable signal.
Load test — builds pg-walstream-loadtest (rustls-tls), runs x5 vs the DB. 14 scenarios x ~40s: baseline, batch 100/5000, 4 concurrent, wide 20-col, 2KB payload, mixed DML, stress ramp 16→192 writers. Remote DB = measures pure library overhead.

Perf: main vs PR86 (feat/parse-wal-message-strict)

Medians. bench: main=5, PR86=5. loadtest: main=5, PR86=5.

1. Microbenchmarks (ns — lower better)

Benchmark	main	PR86	Δ%
`buffer_reader_create/copy_from_slice/1024`	26.91	26.78	-0.49%
`buffer_reader_create/copy_from_slice/256`	20.78	21.10	+1.52%
`buffer_reader_create/copy_from_slice/4096`	84.95	86.24	+1.52%
`buffer_reader_create/copy_from_slice/64`	20.56	20.93	+1.81%
`buffer_reader_create/zero_copy_bytes/1024`	14.67	14.66	-0.10%
`buffer_reader_create/zero_copy_bytes/256`	14.69	14.78	+0.66%
`buffer_reader_create/zero_copy_bytes/4096`	14.67	14.71	+0.25%
`buffer_reader_create/zero_copy_bytes/64`	14.67	14.66	-0.04%
`parse_begin/zero_copy_path`	334.89	342.30	+2.21%
`parse_commit/zero_copy_path`	88.23	112.71	+27.75% ⚠️
`parse_insert/zero_copy_path`	310.64	330.14	+6.28% ⚠️
`parse_insert_pipeline/copy_path`	398.97	410.18	+2.81%
`parse_insert_pipeline/zero_copy_path`	330.14	353.93	+7.21% ⚠️
`parse_multi_column_insert/copy_path/10`	397.56	415.90	+4.61% ⚠️
`parse_multi_column_insert/copy_path/20`	625.86	621.40	-0.71%
`parse_multi_column_insert/copy_path/5`	295.66	320.52	+8.41% ⚠️
`parse_multi_column_insert/copy_path/50`	1260.80	1261.60	+0.06%
`parse_multi_column_insert/zero_copy_path/10`	334.44	352.31	+5.34% ⚠️
`parse_multi_column_insert/zero_copy_path/20`	539.47	555.74	+3.02% ⚠️
`parse_multi_column_insert/zero_copy_path/5`	242.67	262.77	+8.28% ⚠️
`parse_multi_column_insert/zero_copy_path/50`	84.01	104.42	+24.30% ⚠️
`wal_header_parse/copy_path`	20.71	20.65	-0.29%
`wal_header_parse/zero_copy_path`	56.56	80.24	+41.88% ⚠️

2. Load test DML TPS (higher better)

Scenario	main	PR86	Δ%
4-Writers	138,780	127,285	-8.28%
Baseline	137,605	95,435	-30.65%
Batch-100	154,256	126,634	-17.91%
Batch-5000	147,694	123,150	-16.62%
Mixed-DML	122,830	126,091	+2.65%
Payload-2KB	120,752	113,821	-5.74%
Stress-128w	70,397	73,927	+5.01%
Stress-16w	121,341	110,506	-8.93%
Stress-192w	68,925	66,840	-3.02%
Stress-32w	97,261	103,082	+5.98%
Stress-48w	91,074	89,481	-1.75%
Stress-64w	77,811	84,187	+8.19%
Stress-96w	79,049	73,008	-7.64%
Wide-20col	135,389	115,527	-14.67%

3. Efficiency — 1% CPU = N TPS

Scenario	main	PR86	Δ%
4-Writers	3,448	2,854	-17.23%
Baseline	3,316	2,472	-25.46%
Batch-100	3,501	2,995	-14.45%
Batch-5000	3,379	2,840	-15.97%
Mixed-DML	2,832	2,673	-5.62%
Payload-2KB	2,830	2,687	-5.05%
Stress-128w	2,187	2,153	-1.55%
Stress-16w	2,923	2,420	-17.23%
Stress-192w	1,998	2,143	+7.27%
Stress-32w	2,694	2,264	-15.96%
Stress-48w	2,256	2,141	-5.11%
Stress-64w	2,104	2,170	+3.12%
Stress-96w	2,114	2,181	+3.18%
Wide-20col	3,044	2,751	-9.63%

4. Summary

TPS median Δ -6.69% (-30.65..+8.19%)
Efficiency median Δ -7.62%
1% CPU buys: main≈2,831 TPS, PR86≈2,446 TPS

isdaniel · 2026-06-29T05:32:26Z

Hi @LucaCappelletti94, I understand some of clients who want to have rigorous checking every packet frame from replication, how about use a Env/flag to determine whether want to have rigorous checking and client will not it will lose some perf, and that flag/evn default is false.

in future other checking on packet frame from replication all can rely on that env/flag to determind whether need to check.

gemini-code-assist Bot reviewed Jun 27, 2026

View reviewed changes

Comment thread src/error.rs Outdated

Comment thread src/error.rs Outdated

Comment thread src/error.rs

Comment thread src/protocol.rs Outdated

Comment thread src/protocol.rs Outdated

Comment thread src/protocol.rs Outdated

Comment thread src/protocol.rs Outdated

Add TrailingBytes error variant for unconsumed WAL input

97ffc76

Reject trailing bytes in parse_wal_message and parse_wal_message_bytes

26664c2

LucaCappelletti94 force-pushed the feat/parse-wal-message-strict branch from 158b594 to 26664c2 Compare June 27, 2026 12:40

isdaniel force-pushed the main branch from 76856ca to ccbab08 Compare June 29, 2026 14:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surface corrupt WAL frames instead of dropping trailing bytes#86

Surface corrupt WAL frames instead of dropping trailing bytes#86
LucaCappelletti94 wants to merge 2 commits into
isdaniel:mainfrom
LucaCappelletti94:feat/parse-wal-message-strict

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 27, 2026 •

edited

Loading

Uh oh!

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

isdaniel commented Jun 27, 2026

Uh oh!

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

isdaniel commented Jun 28, 2026

Uh oh!

LucaCappelletti94 commented Jun 28, 2026

Uh oh!

isdaniel commented Jun 29, 2026 •

edited

Loading

Uh oh!

isdaniel commented Jun 29, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

codecov Bot commented Jun 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

isdaniel commented Jun 27, 2026

Uh oh!

LucaCappelletti94 commented Jun 27, 2026

Uh oh!

isdaniel commented Jun 28, 2026

Uh oh!

LucaCappelletti94 commented Jun 28, 2026

Uh oh!

isdaniel commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Perf: main vs PR86 (feat/parse-wal-message-strict)

1. Microbenchmarks (ns — lower better)

2. Load test DML TPS (higher better)

3. Efficiency — 1% CPU = N TPS

4. Summary

Uh oh!

isdaniel commented Jun 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented Jun 27, 2026 •

edited

Loading

isdaniel commented Jun 29, 2026 •

edited

Loading

isdaniel commented Jun 29, 2026 •

edited

Loading