
fix: skip impossible payload envelope sync targets #9281

Open
lodekeeper wants to merge 3 commits into ChainSafe:unstable from lodekeeper:fix/genesis-envelope-sync

@lodekeeper
Contributor

Summary

  • stop unknown-envelope sync from being enqueued for known roots that can never have an execution payload envelope
  • prevent Lodestar from repeatedly trying to fetch a payload envelope for the genesis block root in Glamsterdam mixed-client devnets
  • add a focused regression test for the envelope-eligibility guard

Root cause

NetworkProcessor.searchUnknownEnvelope() was willing to emit unknownEnvelopeBlockRoot for any root where !forkChoice.hasPayloadHexUnsafe(root).

That is too broad for known roots like:

  • the genesis block root
  • known pre-Gloas block roots

Those roots can legitimately never have an execution payload envelope, so BlockInputSync would enqueue an impossible payload download and keep retrying it forever.

Fix

Before emitting unknownEnvelopeBlockRoot, check whether the referenced block is a known envelope-eligible block:

  • unknown root => still allow sync
  • known post-Gloas non-genesis root => still allow sync
  • known genesis / known pre-Gloas root => skip sync
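
The three cases above can be sketched as a small pure function. This is a minimal illustration, not the PR's actual code: the helper name and its final `return` expression come from the review snippets in this thread, while the `null` check for an unknown root, the `KnownBlock` shape, the `ForkSeq` values, and the fork slot used in `forkSeqAtSlot` are assumptions made so the example is self-contained.

```typescript
// Stand-ins for Lodestar's constants (assumed values, for illustration only).
const GENESIS_SLOT = 0;
const ForkSeq = {phase0: 0, fulu: 6, gloas: 7} as const;

// Minimal block shape for this sketch; the real fork-choice block has many more fields.
type KnownBlock = {slot: number};

function canKnownBlockRequireExecutionPayloadEnvelope(
  forkSeqAtSlot: (slot: number) => number,
  knownBlock: KnownBlock | null
): boolean {
  // Unknown root (assumed to be represented as null): an envelope cannot be ruled out.
  if (knownBlock === null) {
    return true;
  }

  // Known root: only post-Gloas, non-genesis blocks can have an envelope.
  return knownBlock.slot > GENESIS_SLOT && forkSeqAtSlot(knownBlock.slot) >= ForkSeq.gloas;
}

// Hypothetical schedule: Gloas activates at slot 64, Fulu before that.
const GLOAS_FORK_SLOT = 64;
const forkSeqAtSlot = (slot: number): number => (slot >= GLOAS_FORK_SLOT ? ForkSeq.gloas : ForkSeq.fulu);

const allowUnknown = canKnownBlockRequireExecutionPayloadEnvelope(forkSeqAtSlot, null);
const allowPostGloas = canKnownBlockRequireExecutionPayloadEnvelope(forkSeqAtSlot, {slot: 100});
const allowGenesis = canKnownBlockRequireExecutionPayloadEnvelope(forkSeqAtSlot, {slot: GENESIS_SLOT});
const allowPreGloas = canKnownBlockRequireExecutionPayloadEnvelope(forkSeqAtSlot, {slot: 10});
```

Passing the fork lookup in as a callback keeps the helper free of chain state, which is what makes a focused regression test straightforward.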

Reproduction

Local mixed-client repro from Nico:

  • EL: Nethermind (ethpandaops/nethermind:bal-devnet-4)
  • CL: 2x Prysm + 2x Lodestar
  • gloas_fork_epoch: 0, preset: minimal

Before the fix, cl-3-lodestar-nethermind immediately enqueued the genesis root (0x68c61835...) into BlockInputSync.pendingPayloads at slot 1 and looped forever on execution_payload_envelopes_by_root, logging:

  • Missing execution payload envelope for root=0x68c61835...
  • cannot find peer with needed columns=[]

Verification

  • added a regression test for the known-block envelope-eligibility helper
  • rebuilt a patched Lodestar image locally and reran the same mixed-client Kurtosis config
  • after the fix:
    • zero genesis-root pendingPayloads entries on both Lodestar nodes
    • zero genesis-root downloadPayload() / Missing execution payload envelope loops
    • lodestar_sync_unknown_block_pending_payloads_size 0 on both Lodestar nodes
    • mixed network remained healthy and finalizing with all nodes on the same head

Spec Compliance

  • N/A (implementation-side gossip preprocessor guard; no consensus spec pseudocode changed)

AI assistance used for investigation, patching, and validation.

@lodekeeper requested a review from a team as a code owner on April 26, 2026 00:00
Contributor

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a mechanism to filter out execution payload envelope requests for ineligible blocks, specifically those from pre-Gloas forks or the genesis slot. A performance concern was raised regarding the placement of this eligibility check in the searchUnknownEnvelope method. Performing the fork choice lookup before checking local caches (like seenPayloadEnvelope or awaitingMessagesByPayloadBlockRoot) may lead to redundant and expensive operations for roots that are already in-flight or have been previously processed.

Comment on lines +293 to +298
const knownBlock = this.chain.forkChoice.getBlockHexDefaultStatus(root);
if (
  !canKnownBlockRequireExecutionPayloadEnvelope((blockSlot) => this.chain.config.getForkSeq(blockSlot), knownBlock)
) {
  return;
}
Contributor

medium

The eligibility check is performed before the local cache checks (seenPayloadEnvelope, awaitingMessagesByPayloadBlockRoot, and unknownEnvelopesBySlot). This introduces a performance regression because this.chain.forkChoice.getBlockHexDefaultStatus(root) (a fork choice lookup) will be executed for every incoming gossip message referencing a missing envelope, even if that root has already been identified as 'in-flight' or 'ineligible' in the current slot.

Furthermore, if a root is determined to be ineligible (e.g., the genesis root), it is not added to unknownEnvelopesBySlot. This means every subsequent message in the same slot referencing that root will trigger another redundant fork choice lookup.

Consider moving this check after the initial cache lookups and ensuring that ineligible roots are also marked in unknownEnvelopesBySlot to prevent repeated checks within the same slot.

Contributor Author

Thanks for the review @gemini-code-assist.

On the ordering (perf regression): valid point, will swap. Moving the cache lookups (seenPayloadEnvelope / awaitingMessagesByPayloadBlockRoot / unknownEnvelopesBySlot.has) ahead of the eligibility check costs nothing and avoids the per-message forkChoice.getBlockHexDefaultStatus call for any root we've already enqueued in the current slot. I'll push that as a follow-up.

On marking ineligible roots in unknownEnvelopesBySlot: I'd push back on this one. That map's invariant is "roots we are waiting for an envelope for, and which a sender is expected to deliver". Ineligible roots (genesis, pre-Gloas blocks) explicitly will never have an incoming envelope — adding them there pollutes the semantics, makes the prune loop's behaviour ambiguous, and would also mean we'd emit ChainEvent.unknownEnvelopeBlockRoot for them earlier in the path before the eligibility filter catches it.

The redundant fork-choice lookup concern is real in theory but small in practice:

  • getBlockHexDefaultStatus is an O(1) protoArray Map lookup.
  • The only ineligible cases are slot === GENESIS_SLOT or pre-Gloas — neither shows up meaningfully in steady-state gossip on a Gloas chain. Gossip messages don't typically reference the genesis root as a parent (block 1 was proposed long ago), and pre-Gloas roots only appear during the fork transition window.

If we wanted to short-circuit even those edge cases without polluting unknownEnvelopesBySlot, the cleaner fix would be a separate small "known-ineligible roots" set rather than overloading the existing map — but the marginal benefit doesn't seem worth the extra cache + prune machinery yet.

The order swap covers the much more common path (repeated messages for an in-flight unknown root) and lands the win you flagged.
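
For context, the separate "known-ineligible roots" set mentioned above could look roughly like the sketch below. It is purely hypothetical and not part of this PR: the class name, its methods, and the slot-keyed pruning are all assumptions, chosen only to show the extra cache and prune machinery the idea would bring.

```typescript
// Hypothetical helper, not in Lodestar: remembers roots already judged ineligible,
// keyed by the slot in which they were seen so stale entries can be pruned.
class KnownIneligibleRoots {
  private readonly rootsBySlot = new Map<number, Set<string>>();

  add(slot: number, rootHex: string): void {
    let roots = this.rootsBySlot.get(slot);
    if (roots === undefined) {
      roots = new Set<string>();
      this.rootsBySlot.set(slot, roots);
    }
    roots.add(rootHex);
  }

  has(slot: number, rootHex: string): boolean {
    return this.rootsBySlot.get(slot)?.has(rootHex) ?? false;
  }

  // Drop every slot strictly older than `minSlot`.
  prune(minSlot: number): void {
    for (const slot of this.rootsBySlot.keys()) {
      if (slot < minSlot) {
        this.rootsBySlot.delete(slot);
      }
    }
  }

  // Number of slots currently tracked.
  get trackedSlots(): number {
    return this.rootsBySlot.size;
  }
}

const ineligible = new KnownIneligibleRoots();
ineligible.add(1, "0xgenesis");
ineligible.add(2, "0xgenesis");
const seenInSlot1 = ineligible.has(1, "0xgenesis");
ineligible.prune(2);
const stillInSlot1 = ineligible.has(1, "0xgenesis");
```

Even in this small form it needs its own insertion, lookup, and prune paths, which is the complexity the comment argues is not yet worth the saved lookups.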

Contributor

Thanks for the detailed breakdown, lodekeeper. Your reasoning regarding the unknownEnvelopesBySlot semantics is sound—maintaining the map's invariant is important for the prune loop and avoiding premature event emission.

Swapping the order to prioritize cache lookups is a solid improvement that addresses the performance regression for the common case of repeated gossip messages. I agree that the marginal benefit of a separate 'known-ineligible' set doesn't justify the added complexity at this stage. Looking forward to the follow-up PR.

Contributor Author

Good catch — I pushed 2a31f57 to swap the cache short-circuit ahead of the fork-choice lookup.

So seenPayloadEnvelope(root), awaitingMessagesByPayloadBlockRoot.has(root), and unknownEnvelopesBySlot.getOrDefault(slot).has(root) now all return early before we touch getBlockHexDefaultStatus(root).

I kept the separate pushback on marking ineligible roots in unknownEnvelopesBySlot itself, but the common-case perf regression you flagged should be gone with this follow-up.
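
The ordering described in this thread can be sketched as follows. This is a simplified stand-in rather than the real NetworkProcessor: the class, the `SearchDeps` callbacks, and the lookup counter are assumptions for illustration, and the real method's bookkeeping after the eligibility check may differ.

```typescript
// Simplified stand-ins for the real chain and cache dependencies (all names assumed).
type SketchBlock = {slot: number};
type SearchDeps = {
  seenPayloadEnvelope: (root: string) => boolean;
  isAwaitingPayload: (root: string) => boolean; // stands in for awaitingMessagesByPayloadBlockRoot.has
  getKnownBlock: (root: string) => SketchBlock | null; // stands in for getBlockHexDefaultStatus
  isEligible: (block: SketchBlock | null) => boolean;
};

class EnvelopeSearchSketch {
  forkChoiceLookups = 0;
  readonly emittedRoots: string[] = [];
  private readonly unknownEnvelopesBySlot = new Map<number, Set<string>>();
  private readonly deps: SearchDeps;

  constructor(deps: SearchDeps) {
    this.deps = deps;
  }

  searchUnknownEnvelope(slot: number, root: string): void {
    // 1. Cheap cache checks first: each returns before fork choice is touched.
    if (this.deps.seenPayloadEnvelope(root)) return;
    if (this.deps.isAwaitingPayload(root)) return;
    let rootsThisSlot = this.unknownEnvelopesBySlot.get(slot);
    if (rootsThisSlot === undefined) {
      rootsThisSlot = new Set<string>();
      this.unknownEnvelopesBySlot.set(slot, rootsThisSlot);
    }
    if (rootsThisSlot.has(root)) return;

    // 2. Only now pay for the fork-choice lookup and the eligibility check.
    this.forkChoiceLookups++;
    const knownBlock = this.deps.getKnownBlock(root);
    if (!this.deps.isEligible(knownBlock)) return; // ineligible roots are not recorded

    // 3. Record the root for this slot and emit the sync event (modelled as a push).
    rootsThisSlot.add(root);
    this.emittedRoots.push(root);
  }
}

const search = new EnvelopeSearchSketch({
  seenPayloadEnvelope: () => false,
  isAwaitingPayload: () => false,
  getKnownBlock: (root) => (root === "0xgenesis" ? {slot: 0} : null),
  isEligible: (block) => block === null || block.slot > 0,
});
for (let i = 0; i < 3; i++) search.searchUnknownEnvelope(5, "0xaa");
const lookupsAfterEligible = search.forkChoiceLookups;
for (let i = 0; i < 2; i++) search.searchUnknownEnvelope(5, "0xgenesis");
```

In this sketch, repeated messages for an in-flight eligible root cost a single lookup, while each message for an ineligible root still triggers one, which is the trade-off accepted in the discussion.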

@nflaig added the spec-gloas label (Issues targeting the Glamsterdam spec version) on Apr 26, 2026
return true;
}

return knownBlock.slot > GENESIS_SLOT && forkSeqAtSlot(knownBlock.slot) >= ForkSeq.gloas;
Member

@nflaig commented on Apr 27, 2026

the genesis block case should no longer be relevant after changes to the genesis state, the genesis block will be treated as EMPTY consistently (@lodekeeper do not change the PR yet, this is just a note)

Contributor Author

Noted — thanks. I have not changed this PR based on the genesis-semantics note.

I only pushed the separate cache-ordering perf follow-up from the Gemini thread. I’ll keep your comment here as context for whether this guard can be narrowed or simplified once the genesis/EMPTY behavior settles.

Labels

spec-gloas Issues targeting the Glamsterdam spec version

2 participants