Skip to content

Gloas lookup sync#9155

Open
dapplion wants to merge 1 commit intosigp:unstablefrom
dapplion:gloas-lookup-sync-fixes
Open

Gloas lookup sync#9155
dapplion wants to merge 1 commit intosigp:unstablefrom
dapplion:gloas-lookup-sync-fixes

Conversation

@dapplion
Copy link
Copy Markdown
Collaborator

@dapplion dapplion commented Apr 21, 2026

What this PR does

Extends the single-block lookup to work with Gloas, where a block, its blob/column data, and its execution payload envelope are separate things that can arrive independently and out of order. The existing trait-based request system was starting to creak under that, so it's replaced with three small state machines — one per component — driven by a single continue_requests loop. No component ever mutates another; streams are only added to.

A parent's "full" vs "empty" classification needs a child's bid to tell them apart, so AwaitingParent now carries the child's parent_block_hash alongside the parent root. When a child classifies its parent as full, it contributes its peers to the parent's payload/data streams.

The PayloadEnvelopesByRoot RPC is wired through SyncNetworkContext — request dispatch, response routing, and hooking into the new PayloadRequest state machine. Envelope processing (beacon-processor wiring, import path) is intentionally left as a TODO and should land in a follow-up PR. Envelopes imported from gossip are untouched by this PR.

Scope notes

  • Pre-Gloas: the data stream now falls back to chain.get_block_process_status() if the block hasn't yet been downloaded by the lookup but is in the availability checker — matching the existing sigp/unstable behaviour where block and blob/column requests proceed in parallel.
  • Test rig: Gloas envelopes are served out of a network_envelopes_by_root cache populated from the external harness, and the test harness uses 8 validators (Gloas's proposer lookahead doesn't populate with 1).
  • Miscellaneous fixes from devnet runs: genesis-parent check, infinite retry loop on repeated download failure, on_completed_request no-op, peer sets not cleared on disconnect.

TEST_NETWORK_FORKS unchanged — enabling Gloas in that matrix is blocked on a separate clean-up pass (several existing Gloas tests still panic against this rewrite).

Review notes

Pre-upstream review on dapplion#68 (@pawanjay176 — thanks!). The first-round comments are addressed in this push; the remaining ones from 2026-04-22 I'll work through inline. Payload-processing wiring is explicitly deferred per that review thread.

Test plan

  • make lint clean
  • FORK_NAME=phase0 cargo nextest run --features fork_from_env,fake_crypto -p network sync::tests:: — 79/79
  • FORK_NAME=electra cargo nextest run --features fork_from_env,fake_crypto -p network sync::tests::lookups:: — 59/59
  • FORK_NAME=fulu cargo nextest run --features fork_from_env,fake_crypto -p network sync::tests::lookups:: — 59/59
  • Local Kurtosis ePBS devnet run
  • Verify envelope downloads during single block lookups at head

@dapplion dapplion requested a review from jxs as a code owner April 21, 2026 22:39
.contains_block(root)
}

// TODO(gloas): implement this once issue #8956 is resolved
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is resolved now. We can use canonical head snapshot to check this

"Processing payload attestation message"
);

// Trigger lookup sync by beacon block root. Treat payload attestations as unknown block
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would this also act as a trigger for the payload itself depending on the value of payload_present?
can be done in a future PR too

let parent_in_fork_choice = cx
.chain
.canonical_head
.fork_choice_read_lock()
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think checking here in fork choice might lead to weird situations when the block is partially verified and in the da checker, but not yet in fork choice.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little uncomfortable with a direct check on fork choice here. Might lead to weird edge cases because we check for a block's existence in different places. For e.g. we check the chain directly here

https://github.com/dapplion/lighthouse/blob/c23b1af567fcae78d7a0bbe4d32fb823c7f52b42/beacon_node/network/src/sync/network_context.rs#L862

future refactors would need to be aware of this. Is handling ParentUnknown failures and making them trigger parent requests like before safer?

state.make_request(|| cx.block_lookup_request(id, peers, block_root))?;

if state.is_completed() {
// Block already imported (e.g. execution validated). Nothing further
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is accurate. A block being execution validated doesn't mean we have the data blobs/column/payload. We may need to pull the block from get_block_process_status and then just continue the request for the other components.

state: SingleLookupRequestState<Arc<SignedExecutionPayloadEnvelope<E>>>,
},
Downloaded {
peer_group: PeerGroup,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this have the actual downloaded data too if we want gloas to work too?

},
/// Block processing complete
Complete {
#[educe(Debug(ignore))]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth keeping the peer_id in the Complete state for later penalization? is this because we are assuming block peer is always right if the processing passed?

self.data_request = DataRequest::Complete;
self.continue_requests(cx)
} else {
// Data processing failed — retry data download only
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we get stuck because of not incrementing failed_processing here?

self.payload_request = PayloadRequest::Complete;
self.continue_requests(cx)
} else {
self.payload_request = PayloadRequest::Downloading {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above


// Peer sets.
//
// `Arc<RwLock<..>>` is used so the same peer set can be shared between the lookup and
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only true for columns right? so the data_peers and payload peers are similar just for consistency?

@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch 3 times, most recently from eb40ff0 to 3ff8491 Compare April 22, 2026 06:56
Rewrites the single block lookup state machine for Gloas, where block, data
(blobs/columns), and execution payload envelope are independent components
that can arrive and import out of order.

- Three additive-only sub-state-machines for block / data / payload streams.
  Peer sets start empty for data/payload and grow as children arrive — the
  parent lookup's completion requirement can widen over time without
  mutating any state machine.
- `AwaitingParent` becomes a struct carrying the child's `parent_block_hash`
  so the parent can be classified empty/full from the child's bid reference.
- Wires `PayloadEnvelopesByRoot` RPC end-to-end through `SyncNetworkContext`:
  request sending, response routing (`SingleLookupReqId::SinglePayloadEnvelope`),
  and integration into `PayloadRequest`. Envelope *processing* is still a TODO;
  only the download path is wired.
- Test rig: serves envelopes from a `network_envelopes_by_root` cache
  populated from the external harness; bumps test validator count to 8 so
  `proposer_lookahead` can populate at the Fulu → Gloas upgrade.
- Enables gloas in `TEST_NETWORK_FORKS`.
- Fixes: genesis parent check, infinite retry loop on repeated download
  failure, no-op in `on_completed_request`, and peer sets not being cleared
  on disconnect.
@dapplion dapplion force-pushed the gloas-lookup-sync-fixes branch from 3ff8491 to ebe9fe2 Compare April 22, 2026 07:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants