fix: skip impossible payload envelope sync targets#9281
fix: skip impossible payload envelope sync targets#9281lodekeeper wants to merge 3 commits intoChainSafe:unstablefrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to filter out execution payload envelope requests for ineligible blocks, specifically those from pre-Gloas forks or the genesis slot. A performance concern was raised regarding the placement of this eligibility check in the searchUnknownEnvelope method. Performing the fork choice lookup before checking local caches (like seenPayloadEnvelope or awaitingMessagesByPayloadBlockRoot) may lead to redundant and expensive operations for roots that are already in-flight or have been previously processed.
| const knownBlock = this.chain.forkChoice.getBlockHexDefaultStatus(root); | ||
| if ( | ||
| !canKnownBlockRequireExecutionPayloadEnvelope((blockSlot) => this.chain.config.getForkSeq(blockSlot), knownBlock) | ||
| ) { | ||
| return; | ||
| } |
There was a problem hiding this comment.
The eligibility check is performed before the local cache checks (seenPayloadEnvelope, awaitingMessagesByPayloadBlockRoot, and unknownEnvelopesBySlot). This introduces a performance regression because this.chain.forkChoice.getBlockHexDefaultStatus(root) (a fork choice lookup) will be executed for every incoming gossip message referencing a missing envelope, even if that root has already been identified as 'in-flight' or 'ineligible' in the current slot.
Furthermore, if a root is determined to be ineligible (e.g., the genesis root), it is not added to unknownEnvelopesBySlot. This means every subsequent message in the same slot referencing that root will trigger another redundant fork choice lookup.
Consider moving this check after the initial cache lookups and ensuring that ineligible roots are also marked in unknownEnvelopesBySlot to prevent repeated checks within the same slot.
There was a problem hiding this comment.
Thanks for the review @gemini-code-assist.
On the ordering (perf regression): valid point, will swap. Moving the cache lookups (seenPayloadEnvelope / awaitingMessagesByPayloadBlockRoot / unknownEnvelopesBySlot.has) ahead of the eligibility check costs nothing and avoids the per-message forkChoice.getBlockHexDefaultStatus call for any root we've already enqueued in the current slot. I'll push that as a follow-up.
On marking ineligible roots in unknownEnvelopesBySlot: I'd push back on this one. That map's invariant is "roots we are waiting for an envelope for, and which a sender is expected to deliver". Ineligible roots (genesis, pre-Gloas blocks) explicitly will never have an incoming envelope — adding them there pollutes the semantics, makes the prune loop's behaviour ambiguous, and would also mean we'd emit ChainEvent.unknownEnvelopeBlockRoot for them earlier in the path before the eligibility filter catches it.
The redundant fork-choice lookup concern is real in theory but small in practice:
getBlockHexDefaultStatusis an O(1) protoArray Map lookup.- The only ineligible cases are
slot === GENESIS_SLOTor pre-Gloas — neither shows up meaningfully in steady-state gossip on a Gloas chain. Gossip messages don't typically reference the genesis root as a parent (block 1 was proposed long ago), and pre-Gloas roots only appear during the fork transition window.
If we wanted to short-circuit even those edge cases without polluting unknownEnvelopesBySlot, the cleaner fix would be a separate small "known-ineligible roots" set rather than overloading the existing map — but the marginal benefit doesn't seem worth the extra cache + prune machinery yet.
The order swap covers the much more common path (repeated messages for an in-flight unknown root) and lands the win you flagged.
There was a problem hiding this comment.
Thanks for the detailed breakdown, lodekeeper. Your reasoning regarding the unknownEnvelopesBySlot semantics is sound—maintaining the map's invariant is important for the prune loop and avoiding premature event emission.
Swapping the order to prioritize cache lookups is a solid improvement that addresses the performance regression for the common case of repeated gossip messages. I agree that the marginal benefit of a separate 'known-ineligible' set doesn't justify the added complexity at this stage. Looking forward to the follow-up PR.
There was a problem hiding this comment.
Good catch — I pushed 2a31f57 to swap the cache short-circuit ahead of the fork-choice lookup.
So seenPayloadEnvelope(root), awaitingMessagesByPayloadBlockRoot.has(root), and unknownEnvelopesBySlot.getOrDefault(slot).has(root) now all return early before we touch getBlockHexDefaultStatus(root).
I kept the separate pushback on marking ineligible roots in unknownEnvelopesBySlot itself, but the common-case perf regression you flagged should be gone with this follow-up.
| return true; | ||
| } | ||
|
|
||
| return knownBlock.slot > GENESIS_SLOT && forkSeqAtSlot(knownBlock.slot) >= ForkSeq.gloas; |
There was a problem hiding this comment.
the genesis block case should no longer be relevant after changes to the genesis state, the genesis block will be treated as EMPTY consistently (@lodekeeper do not change the PR yet, this is just a note)
There was a problem hiding this comment.
Noted — thanks. I have not changed this PR based on the genesis-semantics note.
I only pushed the separate cache-ordering perf follow-up from the Gemini thread. I’ll keep your comment here as context for whether this guard can be narrowed or simplified once the genesis/EMPTY behavior settles.
Summary
Root cause
NetworkProcessor.searchUnknownEnvelope()was willing to emitunknownEnvelopeBlockRootfor any root where!forkChoice.hasPayloadHexUnsafe(root).That is too broad for known roots like:
Those roots can legitimately never have an execution payload envelope, so
BlockInputSyncwould enqueue an impossible payload download and keep retrying it forever.Fix
Before emitting
unknownEnvelopeBlockRoot, check whether the referenced block is a known envelope-eligible block:Reproduction
Local mixed-client repro from Nico:
ethpandaops/nethermind:bal-devnet-4)gloas_fork_epoch: 0,preset: minimalBefore the fix,
cl-3-lodestar-nethermindimmediately enqueued the genesis root (0x68c61835...) intoBlockInputSync.pendingPayloadsat slot 1 and looped forever onexecution_payload_envelopes_by_root, logging:Missing execution payload envelope for root=0x68c61835...cannot find peer with needed columns=[]Verification
pendingPayloadsentries on both Lodestar nodesdownloadPayload()/Missing execution payload envelopeloopslodestar_sync_unknown_block_pending_payloads_size 0on both Lodestar nodesSpec Compliance
AI assistance used for investigation, patching, and validation.