fix(eth): cap outgoing block bodies under the devp2p message size limit #12059

Merged
AnkushinDaniil merged 5 commits into master from fix/snap-serving-outbound-framing-race
Jun 24, 2026

Conversation

@AnkushinDaniil AnkushinDaniil commented Jun 18, 2026

Contributor

Background

When serving GetBlockBodies, Nethermind could emit a BlockBodies message larger than the 16 MB devp2p message limit (SnappyParameters.MaxSnappyLength). An RLPx frame encodes its size in a 24-bit field (at most 2^24 - 1 bytes, just under 16 MiB). Once snappy is negotiated, framing is disabled, so an oversized message becomes a single oversized frame whose size field wraps. ZeroFrameEncoder then desyncs on the oversized buffer and emits a corrupt frame, and the peer is dropped:

DotNetty.Codecs.EncoderException
 ---> RlpException: Expected a sequence prefix to be in the range of <192, 255> and got <random>
   at Nethermind.Network.Rlpx.FrameHeaderReader.ReadFrameHeader
   at Nethermind.Network.Rlpx.ZeroFrameEncoder.Encode
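
The wrap itself can be illustrated with a minimal sketch. It assumes the size is written as three big-endian bytes and that higher bits are silently dropped; `WriteSize24` and `ReadSize24` are hypothetical helpers, not Nethermind APIs, and 26,000,000 is only a round figure near the low end of the sizes reported below.

```csharp
using System;

// Hypothetical helpers (not Nethermind APIs): model a 3-byte big-endian size field.
static byte[] WriteSize24(int size) =>
    new[] { (byte)((size >> 16) & 0xFF), (byte)((size >> 8) & 0xFF), (byte)(size & 0xFF) };

static int ReadSize24(byte[] field) =>
    (field[0] << 16) | (field[1] << 8) | field[2];

const int MaxFrameField = (1 << 24) - 1;   // 16,777,215: largest value 24 bits can hold
int oversized = 26_000_000;                // illustrative oversized response

// Only the low 24 bits survive the round trip through the size field.
int decoded = ReadSize24(WriteSize24(oversized));

Console.WriteLine($"max encodable size: {MaxFrameField}");
Console.WriteLine($"actual size: {oversized}, size read back: {decoded}"); // read back: 9222784
```

A reader of that header would expect about 9.2 MB of payload while 26 MB follow, which is consistent with the desync described above.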

Root cause: MessageSizeEstimator.EstimateSize(Transaction) returned 100 + tx.Data.Length. It counted only calldata and ignored access lists, authorization lists and other fields. Transactions with large non-calldata fields were under-counted, so the 2 MB soft batching limit in FulfillBlockBodiesRequest admitted far more data than the estimate indicated. When serving a heavily bloated state, a single BlockBodies response was observed to reach 26–47 MB, overflowing the frame and tearing down the connection on every reconnect.
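
As a rough illustration of the under-count, the sketch below compares the old heuristic with a lower bound on the real size for a transaction whose weight sits in its access list. The lower-bound formula, the helper names and the example numbers are assumptions made for illustration; the actual fix uses TxDecoder.GetLength.

```csharp
using System;

// Old heuristic quoted in the description: only calldata is counted.
static long OldEstimate(int calldataLength) => 100 + calldataLength;

// Assumed lower bound on the encoded size, for illustration only:
// calldata plus 20 bytes per access-list address and 32 bytes per storage key,
// ignoring RLP prefixes and every other field.
static long LowerBoundEncodedSize(int calldataLength, int addresses, int keysPerAddress) =>
    calldataLength + (long)addresses * (20 + 32L * keysPerAddress);

int calldata = 0;          // no calldata at all
int addresses = 100;       // hypothetical access list: 100 addresses
int keysPerAddress = 50;   // with 50 storage keys each

long estimated = OldEstimate(calldata);
long actual = LowerBoundEncodedSize(calldata, addresses, keysPerAddress);

Console.WriteLine($"old estimate: {estimated} bytes");
Console.WriteLine($"lower bound on real size: {actual} bytes");
Console.WriteLine($"under-count factor for this transaction: {actual / estimated}x");
```

With such transactions, a batch that the old estimator counted as under 2 MB could carry many times that on the wire.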

Changes

  • MessageSizeEstimator.EstimateSize(Transaction) now returns the actual encoded length (TxDecoder.GetLength) instead of 100 + tx.Data.Length, so access lists, authorization lists and every other field are counted.
  • FulfillBlockBodiesRequest enforces a hard ~15 MB cap (below the 16 MB protocol limit, mirroring the existing receipts hard cap). When the next body would cross the cap, serving stops and returns the bodies gathered so far; the requesting peer re-requests the remainder.
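
The serving rule described above can be sketched as follows. This is a simplified model, not the Nethermind implementation: `CountServedBodies` is a hypothetical stand-in for FulfillBlockBodiesRequest, the exact constants are assumptions, and the soft limit is assumed to be checked after a body has been added.

```csharp
using System;
using System.Collections.Generic;

const long SoftLimit = 2_000_000;   // soft batching limit from the description
const long HardCap = 15_000_000;    // "~15 MB" hard cap; the exact constant is an assumption

// Hypothetical stand-in for FulfillBlockBodiesRequest: takes measured body sizes
// in request order and returns how many leading bodies are served.
static int CountServedBodies(IReadOnlyList<long> bodySizes, long softLimit, long hardCap)
{
    long total = 0;
    int served = 0;
    foreach (long size in bodySizes)
    {
        if (total + size > hardCap) break;   // next body would cross the cap: stop, keep the prefix
        total += size;
        served++;
        if (total > softLimit) break;        // soft limit reached: stop after this body
    }
    return served;
}

long[] sizes = { 1_000_000, 20_000_000, 1_000_000 };
Console.WriteLine(CountServedBodies(sizes, SoftLimit, HardCap));                      // 1
Console.WriteLine(CountServedBodies(new long[] { 20_000_000 }, SoftLimit, HardCap));  // 0
```

In the first call serving stops at the 20 MB body and returns only the first one; in the second the prefix is empty because the leading body alone exceeds the cap.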

Why return a prefix rather than skip the oversized body and continue

A requester matches returned bodies to the requested hashes positionally (BlockDownloader.HandleResponse), so omitting a non-trailing body would misalign every following one and the response would be rejected as invalid. Returning the prefix collected so far keeps the response well-aligned and lets the peer make progress and re-request from where serving stopped. A single body that alone exceeds the cap cannot be served within a devp2p frame at all, so an empty prefix is the only correct response in that case.
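
A minimal model of the positional matching shows why skipping breaks alignment. Bodies are reduced to string labels here and `FirstMismatch` is a hypothetical helper; the real check validates each body against its header rather than comparing labels.

```csharp
using System;

// Hypothetical model of positional matching: response[i] must belong to requested[i].
// Returns the index of the first mismatch, or -1 if every returned body lines up.
static int FirstMismatch(string[] requested, string[] response)
{
    for (int i = 0; i < response.Length; i++)
    {
        if (i >= requested.Length || response[i] != requested[i]) return i;
    }
    return -1;
}

string[] requested = { "A", "B", "C" };   // suppose B is the oversized body

string[] skipOversized = { "A", "C" };    // omit B and keep serving
string[] prefixOnly = { "A" };            // stop at B and return the prefix

Console.WriteLine(FirstMismatch(requested, skipOversized)); // 1: C arrives where B is expected
Console.WriteLine(FirstMismatch(requested, prefixOnly));    // -1: aligned, B and C can be re-requested
```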

On EstimateSize precision vs. cost

EstimateSize(Transaction) now measures rather than estimates. This is intentional and confined to the body-serving path: its only caller is EstimateSize(Block) inside FulfillBlockBodiesRequest, which has already loaded each block from the DB (SyncServer.Find) — the GetLength walk is negligible next to that I/O. Transaction broadcast and other latency-sensitive paths use a different type (Nethermind.Synchronization.FastBlocks.MemorySizeEstimator) and are untouched, so no fast path is slowed. The cheap approximate estimate was the original behaviour and is exactly what under-counted bloated bodies and caused this bug; a correct cap requires the true size.

Types of changes

  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing: Yes — wrote tests: Yes.

  • MessageSizeEstimatorTests updated to assert the estimate equals the encoded length, plus a new Estimate_tx_counts_access_list regression test (a tx with a large access list is no longer under-counted).
  • All Nethermind.Network.Test estimator tests pass locally.
  • Verified end-to-end against a ~5x-mainnet bloated source serving a syncing peer: before, the peer was dropped roughly every 35 s with frame EncoderExceptions and made almost no progress; after, EncoderExceptions and peer drops fall to zero and body/state serving runs continuously (the link went from idle/stalled to fully utilised).

Remarks

This branch keeps its original name (fix/snap-serving-outbound-framing-race) from an earlier, incorrect hypothesis (an RLPx framing race); that change has been reverted and the actual fix is the block-bodies size cap above.

…e corruption

ZeroPacketSplitter.Encode read MaxFrameSize several times per message while
DisableFraming() flips it from another thread (Session.EnableSnappy runs off the
channel event loop). A concurrent flip yields a torn framing configuration and a
malformed frame header, which ZeroFrameEncoder rejects with an EncoderException,
tearing down the peer. Snapshot the value once per Encode and back it with a
volatile field.

claude Bot commented Jun 18, 2026

Contributor

Claude finished @AnkushinDaniil's task in 2m 56s


Code Review

  • Gather context and understand the request
  • Read changed files and relevant rules
  • Analyze the fix for correctness
  • Check test coverage
  • Post review findings

Summary

The fix is correct and minimal. The root cause — a TOCTOU race on MaxFrameSize within a single Encode call — is eliminated cleanly by snapshotting the volatile field into a local variable at entry. The volatile modifier ensures the snapshot sees the most recent write from DisableFraming(), and the local guarantees internal consistency for the rest of the call. No Critical, High, or Medium findings.


Correctness Analysis

The race (old code):
MaxFrameSize was an auto-property with a private setter, backed by a plain field — no visibility guarantees across threads. Three reads per multi-frame message:

  1. framesCount = (... - 1) / MaxFrameSize + 1 — sees DefaultMaxFrameSize = 1024
  2. totalPayloadOffset = MaxFrameSize * i — sees int.MaxValue after the flip
  3. framePayloadSize = Math.Min(MaxFrameSize, ...) — overflow or large negative result

This produces a frame with a fabricated size in the header, which ZeroFrameEncoder rejects. The stack trace in the PR description confirms exactly this path.

The fix (new code):

private volatile int _maxFrameSize = Frame.DefaultMaxFrameSize;
// ...
int maxFrameSize = _maxFrameSize;  // single volatile read; local thereafter

volatile on a 4-byte int guarantees an atomic acquire-fence read — the snapshot will see either the old value or int.MaxValue, never a torn intermediate. The loop uses only the local, so the frame layout is always self-consistent. The fix is the idiomatic correct pattern for this class of race.

Overflow safety when maxFrameSize = int.MaxValue:
framesCount = (totalPayloadSize - 1) / int.MaxValue + 1 = 1 for any message that fits in memory. The loop runs once with i = 0, so totalPayloadOffset = 0 — no multiplication overflow. ✓
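
The arithmetic above can be checked with a short sketch. `FramesCount` is a hypothetical helper that mirrors the quoted formula and takes the frame size as a parameter, which plays the role of the per-call snapshot; it is not the ZeroPacketSplitter code.

```csharp
using System;

const int DefaultMaxFrameSize = 1024;   // value quoted in the review above

// Hypothetical helper mirroring the quoted formula. Passing the frame size in
// models the snapshot: every computation in one call sees the same value.
static int FramesCount(int totalPayloadSize, int maxFrameSize) =>
    (totalPayloadSize - 1) / maxFrameSize + 1;

int fourMiB = 4096 * DefaultMaxFrameSize;

Console.WriteLine(FramesCount(fourMiB, DefaultMaxFrameSize)); // 4096 frames with framing enabled
Console.WriteLine(FramesCount(fourMiB, int.MaxValue));        // 1 frame once framing is disabled

// A torn configuration: the count was computed with 1024, but the offset of
// frame 2 is computed with int.MaxValue and overflows a 32-bit int.
int frameIndex = 2;
int tornOffset = unchecked(int.MaxValue * frameIndex);
Console.WriteLine(tornOffset);                                // -2
```

With a single snapshot the two values can never be mixed within one call, so the torn case in the last lines cannot arise.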


Test Analysis

Concurrent_DisableFraming_does_not_corrupt_frames — correctly exercises the precise race:

  • Encodes a 4096-frame (≈4 MB) payload; the long encode gives the background thread time to fire mid-loop.
  • ManualResetEventSlim releases the flipper and the encoder at the same moment, maximising interleave probability.
  • 16 outer iterations increase coverage.
  • AssertWellFormedFrames verifies both frame boundary alignment and that payload sizes sum to the total — catching the exact corruption (truncated/oversize frames) described in the bug.

Padding equivalence in AssertWellFormedFrames:
The helper calls Frame.CalculatePadding(framePayloadSize) for every frame. Production only pads the last frame using CalculatePadding(totalPayloadSize). These are equivalent because DefaultMaxFrameSize = 16 × 64, so DefaultMaxFrameSize % BlockSize == 0, which means:

  • Non-last frames: CalculatePadding(1024) = 0 — matches production's explicit 0. ✓
  • Last frame: totalPayloadSize % BlockSize == lastFramePayloadSize % BlockSize (multiples of 1024 cancel), so the two CalculatePadding calls yield the same result. ✓
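
The equivalence can also be checked by brute force. The sketch assumes BlockSize is 16 and that padding rounds a size up to the next multiple of the block size; `PaddingFor` is a hypothetical stand-in for Frame.CalculatePadding, not its actual source.

```csharp
using System;

const int BlockSize = 16;               // assumed from "DefaultMaxFrameSize = 16 × 64"
const int DefaultMaxFrameSize = 1024;

// Hypothetical stand-in for Frame.CalculatePadding: bytes needed to reach
// the next multiple of blockSize (0 if already aligned).
static int PaddingFor(int size, int blockSize) =>
    size % blockSize == 0 ? 0 : blockSize - size % blockSize;

bool allEqual = true;
for (int total = 1; total <= 10 * DefaultMaxFrameSize; total++)
{
    int framesCount = (total - 1) / DefaultMaxFrameSize + 1;
    int lastFrameSize = total - (framesCount - 1) * DefaultMaxFrameSize;

    // Production pads by the total size, the test helper by the last frame size.
    if (PaddingFor(total, BlockSize) != PaddingFor(lastFrameSize, BlockSize))
        allEqual = false;
}

Console.WriteLine(PaddingFor(DefaultMaxFrameSize, BlockSize)); // 0: full frames need no padding
Console.WriteLine(allEqual);                                   // True
```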

Low-severity observations (non-blocking)

1. Test race window relies on timing rather than a mid-encode barrier — The test doesn't force DisableFraming() to land at a specific point inside the loop; it relies on the encode taking long enough for the background thread to fire. This is adequate for a regression test (and is how most concurrent-race tests work in practice), but worth noting: on a heavily loaded machine the thread might not schedule until after Encode returns. Increasing iterations or adding a Thread.Yield() at the top of the loop could sharpen detection, but this is not required.

2. Pre-existing: _contextId non-volatile reads after Interlocked.Increment — Lines 80 and 88 read _contextId directly without memory barriers, while Interlocked.Increment at line 27 provides a full fence. If two calls to Encode run concurrently, both may observe the same _contextId. This is pre-existing behaviour, not introduced by this PR, and the contextId is used only for informational sequencing in multi-frame headers — not for correctness of frame boundaries.


Verdict

The fix is correct, well-scoped, and adequately tested. Ready to merge after WIP label is removed.

Comment thread src/Nethermind/Nethermind.Network.Test/Rlpx/ZeroNettyPacketSplitterTests.cs Outdated
MessageSizeEstimator.EstimateSize(tx) counted only "100 + calldata length",
ignoring access lists, authorization lists and other fields. Block bodies were
therefore under-counted and the 2 MB soft batching limit let through responses
far larger than the 16 MB devp2p / 24-bit RLPx frame limit. Such a frame's size
field wraps, the frame encoder desyncs on the oversized buffer and emits a
corrupt frame, dropping the peer (seen serving large bloated-state block bodies).

- Estimate transactions from their actual encoded length so all fields count.
- Add a hard ~15 MB cap to FulfillBlockBodiesRequest; a single body that alone
  exceeds the limit is omitted (responses are already sparse) since it cannot be
  served within the protocol message size.
@AnkushinDaniil AnkushinDaniil changed the title from "fix(rlpx): snapshot MaxFrameSize per Encode to prevent torn-read frame corruption" to "fix(eth): cap outgoing block bodies under the devp2p message size limit" Jun 18, 2026
FulfillBlockBodiesRequest skipped a leading body that alone exceeded the
hard size cap and continued serving later bodies. A requester matches
bodies to the requested hashes positionally (BlockDownloader.HandleResponse:
bodies[i] vs BodiesRequests[i], guarded by ValidateBodyAgainstHeader), so a
non-trailing omission misaligns every following body and is rejected as an
invalid block, triggering a breach-of-protocol disconnect of the serving
peer — the opposite of what the cap intends.

Break at the cap instead, returning the prefix accumulated so far (matching
the receipts path). A single body above the cap cannot fit a 16 MB devp2p
frame on any client, so an empty prefix is the only correct response.
Estimate_tx_with_data_size only asserted EstimateSize == TxDecoder.GetLength,
which restates the implementation and would pass even if the estimator were
wrong. Add an implementation-independent oracle: 7 extra calldata bytes must
increase the estimate by exactly 7, guarding against a regression to a
constant or a heuristic that drops a field.
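
A small sketch of the delta oracle, modelling only the RLP encoding of the calldata byte string. It suggests the "exactly 7" expectation holds as long as the extra bytes do not push the calldata across an RLP length-prefix boundary; enclosing list prefixes are not modelled, the helper names are hypothetical, and the lengths are chosen for illustration rather than taken from the test.

```csharp
using System;

// RLP length of a byte string with the given payload length (payloads of 2+ bytes;
// a single byte below 0x80 encodes as itself and is not modelled here).
static int RlpStringLength(int payloadLength)
{
    if (payloadLength <= 55) return 1 + payloadLength;        // one prefix byte
    int lengthOfLength = 0;
    for (int n = payloadLength; n > 0; n >>= 8) lengthOfLength++;
    return 1 + lengthOfLength + payloadLength;                 // prefix + length bytes + payload
}

// Growth of the encoded calldata item when `extra` bytes are appended.
static int Delta(int calldataLength, int extra) =>
    RlpStringLength(calldataLength + extra) - RlpStringLength(calldataLength);

Console.WriteLine(Delta(100, 7)); // 7: both lengths use the same prefix size
Console.WriteLine(Delta(52, 7));  // 8: crossing 55 bytes adds a length byte
```

Picking a base calldata length away from such boundaries keeps the oracle independent of the implementation while still detecting a constant or a dropped field.
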
@AnkushinDaniil AnkushinDaniil marked this pull request as ready for review June 19, 2026 09:19
Comment on lines +29 to +31
// Use the actual encoded length so large non-calldata fields (access lists,
// authorization lists, etc.) are accounted for and not under-counted.
return (ulong)TxDecoder.Instance.GetLength(tx, RlpBehaviors.None);

Member

A potential issue here is that estimation was supposed to be very fast and is now potentially slow, because we are measuring rather than estimating.

Contributor Author

It's only called from FulfillBlockBodiesRequest, where we've already loaded the block from disk - so the GetLength walk is noise next to that. It's also the same call the mempool already uses to size txs for broadcast and caches in LightTransaction, and it just sums field lengths rather than serializing, so it's not really a new cost

Member

Maybe then remove that estimate method completely and do the length measurement directly in FulfillBlockBodiesRequest. The Estimate name is now misleading.

Move the body-cap positional-matching rationale and the
estimate-vs-measure note out of code comments and into the PR
description; keep one-line comments at each site.
@AnkushinDaniil AnkushinDaniil merged commit 0c84699 into master Jun 24, 2026
562 checks passed
@AnkushinDaniil AnkushinDaniil deleted the fix/snap-serving-outbound-framing-race branch June 24, 2026 10:04
4 participants