Skip to content

Surface corrupt WAL frames instead of dropping trailing bytes#86

Open
LucaCappelletti94 wants to merge 2 commits into
isdaniel:mainfrom
LucaCappelletti94:feat/parse-wal-message-strict
Open

Surface corrupt WAL frames instead of dropping trailing bytes#86
LucaCappelletti94 wants to merge 2 commits into
isdaniel:mainfrom
LucaCappelletti94:feat/parse-wal-message-strict

Conversation

@LucaCappelletti94

Copy link
Copy Markdown
Contributor

parse_wal_message and parse_wal_message_bytes now return a new ReplicationError::TrailingBytes when the input buffer carries bytes beyond a single WAL message, instead of parsing one message and silently dropping the remainder. Each replication frame carries exactly one logical message, so leftover bytes signal a corrupt or misframed input that should surface rather than pass unnoticed.

This is safe for the crate's own pipeline. The stream feeds exactly one message per XLogData frame (the 25-byte header is stripped in stream.rs before the parser runs), so the check never fires during correct operation. The old tolerance also had no legitimate use from the public API, since both methods return only the parsed message and no consumed-length, leaving a caller no way to notice or act on trailing bytes.

Note this changes the behavior of existing public methods rather than adding new ones, so it is not purely additive. A downstream caller that passed an over-long buffer and relied on getting the first message back will now get an error. On 0.x that is a MINOR bump.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new ReplicationError::TrailingBytes error variant to detect and reject unconsumed trailing bytes in both parse_wal_message and parse_wal_message_bytes, ensuring that each replication frame contains exactly one message. The feedback suggests renaming the expected and actual fields of the new error variant to consumed and total to make the API more self-documenting and intuitive, along with updating the corresponding Display implementation, parser logic, and unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/error.rs Outdated
Comment thread src/error.rs Outdated
Comment thread src/error.rs
Comment thread src/protocol.rs Outdated
Comment thread src/protocol.rs Outdated
Comment thread src/protocol.rs Outdated
Comment thread src/protocol.rs Outdated
@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 97.82609% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.58%. Comparing base (919f154) to head (26664c2).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/protocol.rs 97.46% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #86      +/-   ##
==========================================
+ Coverage   95.51%   95.58%   +0.07%     
==========================================
  Files          16       16              
  Lines       16274    16364      +90     
==========================================
+ Hits        15544    15642      +98     
+ Misses        730      722       -8     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@LucaCappelletti94

Copy link
Copy Markdown
Contributor Author

Renamed to consumed/total. Reworded the doc to drop the circular phrasing: consumed is how many bytes the parser read, total is the input length. All comments are roughly about the same renaming.

@isdaniel

Copy link
Copy Markdown
Owner

Hi @LucaCappelletti94 Thank you for your PR, but my main concern is that parse_wal_message / parse_wal_message_bytes are hot-path code that every message flows through ,and I'd like to keep them as lean as possible.

In normal operation each XLogData frame carries exactly one logical message (the pipeline strips the 25-byte header before parsing), so "no trailing bytes" is effectively an invariant, not a condition we expect to hit.

Could you help us to measure how many perf will downgrade if we add the code? and May I know why would you need those logical in hot path?

@LucaCappelletti94

Copy link
Copy Markdown
Contributor Author

I benchmarked cargo bench --bench wal_pipeline against main and you were right: my first version regressed the hot path up to ~50 percent on small messages. The check itself was not the cost. Unwrapping the parser Result and rewrapping it with Ok(...) forced an extra copy of the large StreamingReplicationMessage and defeated RVO. Moving the check inside parse_wal_message_from_reader, so the public methods stay tail calls, restores RVO and leaves only a reader.remaining() read and a compare. The parse benches are now at parity with main.

On why it is worth having: a downstream consumer that parses frames directly gets back only the message with no consumed-length, so today a corrupted, misframed, or injected stream is silently truncated to its first message. The checked error is what lets it notice instead of dropping it.

@LucaCappelletti94 LucaCappelletti94 force-pushed the feat/parse-wal-message-strict branch from 158b594 to 26664c2 Compare June 27, 2026 12:40
@isdaniel

Copy link
Copy Markdown
Owner

I will do some benchmark and load testing to ensure how many perf will be downgrade for from this PR first, that might take few days.

@LucaCappelletti94

Copy link
Copy Markdown
Contributor Author

I will do some benchmark and load testing to ensure how many perf will be downgrade for from this PR first, that might take few days.

Here is my measurements ofcargo bench --bench wal_pipeline (Criterion 0.8.2), paired back to back, baseline on main 919f154 then PR head 26664c2. Median values. Negative means the PR is faster.

benchmark main (ns) this PR (ns) PR vs main
wal_header_parse/copy (control) 14.70 14.31 -2.6%
wal_header_parse/zero_copy (control) 3.63 3.65 +0.6%
parse_begin/copy 58.92 59.31 +0.7%
parse_begin/zero_copy 64.47 63.38 -1.7%
parse_commit/copy 76.43 77.95 +2.0%
parse_commit/zero_copy 81.91 81.49 -0.5%
parse_insert/copy 242.01 248.87 +2.8%
parse_insert/zero_copy 226.49 234.41 +3.5%
parse_insert_pipeline/copy 301.81 295.79 -2.0%
parse_insert_pipeline/zero_copy 244.61 243.50 -0.5%
parse_multi_column/5 copy 214.78 222.14 +3.4%
parse_multi_column/5 zero_copy 187.99 183.29 -2.5%
parse_multi_column/10 copy 298.24 294.27 -1.3%
parse_multi_column/10 zero_copy 248.60 241.96 -2.7%
parse_multi_column/20 copy 447.12 441.19 -1.3%
parse_multi_column/20 zero_copy 371.86 364.88 -1.9%
parse_multi_column/50 copy 848.53 854.86 +0.7%
parse_multi_column/50 zero_copy 764.03 733.13 -4.0%

The control moves 2.6 percent on code the PR never touches, so anything within roughly 3 percent is drift. Every parse row is in that band, and where one path leans positive its sibling leans negative, which is layout jitter, not a real cost.

@isdaniel

isdaniel commented Jun 29, 2026

Copy link
Copy Markdown
Owner

As I did some benchmark and real load testing to DB, I can see perf downgress from PR86 would be higher than my expected after compared with main branch.

  • Microbench — cargo bench --bench wal_pipeline x5, parser only, no DB. Reliable signal.
  • Load test — builds pg-walstream-loadtest (rustls-tls), runs x5 vs the DB. 14 scenarios x ~40s: baseline, batch 100/5000, 4 concurrent, wide 20-col, 2KB payload, mixed DML, stress ramp 16→192 writers. Remote DB = measures pure library overhead.

Perf: main vs PR86 (feat/parse-wal-message-strict)

Medians. bench: main=5, PR86=5. loadtest: main=5, PR86=5.

1. Microbenchmarks (ns — lower better)

Benchmark main PR86 Δ%
buffer_reader_create/copy_from_slice/1024 26.91 26.78 -0.49%
buffer_reader_create/copy_from_slice/256 20.78 21.10 +1.52%
buffer_reader_create/copy_from_slice/4096 84.95 86.24 +1.52%
buffer_reader_create/copy_from_slice/64 20.56 20.93 +1.81%
buffer_reader_create/zero_copy_bytes/1024 14.67 14.66 -0.10%
buffer_reader_create/zero_copy_bytes/256 14.69 14.78 +0.66%
buffer_reader_create/zero_copy_bytes/4096 14.67 14.71 +0.25%
buffer_reader_create/zero_copy_bytes/64 14.67 14.66 -0.04%
parse_begin/zero_copy_path 334.89 342.30 +2.21%
parse_commit/zero_copy_path 88.23 112.71 +27.75% ⚠️
parse_insert/zero_copy_path 310.64 330.14 +6.28% ⚠️
parse_insert_pipeline/copy_path 398.97 410.18 +2.81%
parse_insert_pipeline/zero_copy_path 330.14 353.93 +7.21% ⚠️
parse_multi_column_insert/copy_path/10 397.56 415.90 +4.61% ⚠️
parse_multi_column_insert/copy_path/20 625.86 621.40 -0.71%
parse_multi_column_insert/copy_path/5 295.66 320.52 +8.41% ⚠️
parse_multi_column_insert/copy_path/50 1260.80 1261.60 +0.06%
parse_multi_column_insert/zero_copy_path/10 334.44 352.31 +5.34% ⚠️
parse_multi_column_insert/zero_copy_path/20 539.47 555.74 +3.02% ⚠️
parse_multi_column_insert/zero_copy_path/5 242.67 262.77 +8.28% ⚠️
parse_multi_column_insert/zero_copy_path/50 84.01 104.42 +24.30% ⚠️
wal_header_parse/copy_path 20.71 20.65 -0.29%
wal_header_parse/zero_copy_path 56.56 80.24 +41.88% ⚠️

2. Load test DML TPS (higher better)

Scenario main PR86 Δ%
4-Writers 138,780 127,285 -8.28%
Baseline 137,605 95,435 -30.65%
Batch-100 154,256 126,634 -17.91%
Batch-5000 147,694 123,150 -16.62%
Mixed-DML 122,830 126,091 +2.65%
Payload-2KB 120,752 113,821 -5.74%
Stress-128w 70,397 73,927 +5.01%
Stress-16w 121,341 110,506 -8.93%
Stress-192w 68,925 66,840 -3.02%
Stress-32w 97,261 103,082 +5.98%
Stress-48w 91,074 89,481 -1.75%
Stress-64w 77,811 84,187 +8.19%
Stress-96w 79,049 73,008 -7.64%
Wide-20col 135,389 115,527 -14.67%

3. Efficiency — 1% CPU = N TPS

Scenario main PR86 Δ%
4-Writers 3,448 2,854 -17.23%
Baseline 3,316 2,472 -25.46%
Batch-100 3,501 2,995 -14.45%
Batch-5000 3,379 2,840 -15.97%
Mixed-DML 2,832 2,673 -5.62%
Payload-2KB 2,830 2,687 -5.05%
Stress-128w 2,187 2,153 -1.55%
Stress-16w 2,923 2,420 -17.23%
Stress-192w 1,998 2,143 +7.27%
Stress-32w 2,694 2,264 -15.96%
Stress-48w 2,256 2,141 -5.11%
Stress-64w 2,104 2,170 +3.12%
Stress-96w 2,114 2,181 +3.18%
Wide-20col 3,044 2,751 -9.63%

4. Summary

  • TPS median Δ -6.69% (-30.65..+8.19%)
  • Efficiency median Δ -7.62%
  • 1% CPU buys: main≈2,831 TPS, PR86≈2,446 TPS

@isdaniel

isdaniel commented Jun 29, 2026

Copy link
Copy Markdown
Owner

Hi @LucaCappelletti94, I understand some of clients who want to have rigorous checking every packet frame from replication, how about use a Env/flag to determine whether want to have rigorous checking and client will not it will lose some perf, and that flag/evn default is false.

in future other checking on packet frame from replication all can rely on that env/flag to determind whether need to check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants