Skip to content

fix: anchor stream from_block at creation to close startup polling gap#577

Open
agentotto[bot] wants to merge 16 commits intomainfrom
fix/stream-from-block-anchor
Open

fix: anchor stream from_block at creation to close startup polling gap#577
agentotto[bot] wants to merge 16 commits intomainfrom
fix/stream-from-block-anchor

Conversation

@agentotto
Copy link
Copy Markdown
Contributor

@agentotto agentotto bot commented Mar 18, 2026

Problem

poll_events initialised from_block lazily on the first poll cycle by calling eth_blockNumber() at that moment. In engine.rs, registry_stream() is called before backfill_commitments(), so the stream exists but hasn't fired its first poll when backfill runs.

Gap window

t0  registry_stream() called — stream created, from_block = None
t1  backfill_commitments() fetches logs [0 .. chain-head-at-t1]
t2  backfill completes, mark_ready() called
t3  POLL_INTERVAL expires → first poll fires → eth_blockNumber() = N

Any ChainCommitted event emitted in blocks (t1-head .. N] is never fetched: backfill stopped at t1-head and the poll loop starts querying from N+1.

Why this is fatal

The keccak chain is a hash chain: each commit's keccakChain field must equal keccak256(prev_head || new_commits). Missing even one ChainCommitted means the relay's in-memory local_chain diverges from the on-chain chain. Every subsequent commit will produce the wrong keccakChain and be rejected by satellites, permanently stalling the log.

Fix

Fetch the current chain head in registry_stream() (before backfill hands off) and pass it as anchor_block to poll_events(). The unfold state is initialised with that concrete u64 instead of None, so the first poll queries [anchor_block+1 .. latest].

Backfill and live-poll overlap at anchor_block (both may process that exact block), but commit_chained deduplicates by chain head — no double-counting.

Test

poll_stream_captures_event_emitted_during_backfill_window — uses alloy's MockTransport and tokio's paused time to simulate a ChainCommitted event landing at block 6 during the backfill window (anchor = 5), advances the mock clock past POLL_INTERVAL, and asserts the event is yielded by the stream.

karankurbur and others added 15 commits March 16, 2026 14:01
Remove corrupt commit hash fields from staging.json.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace has_pending() with should_propagate() that compares
  WorldIDSource.LATEST_ROOT vs WorldIDRegistry.getLatestRoot()
- Remove pending_roots (roots propagate via ChainCommitted)
- Remove dead code: reset_chain, with_chain, kind()
- Remove first flag in satellite, use mark_changed() instead
- Flatten satellite None branch into resync_head() helper
- Start live event stream before backfill to avoid missed events
- Backfill only fetches ChainCommitted (skip 3 wasted RPC calls)
- Downgrade per-tick polling log to debug
- Remove unused reqwest/tokio-stream deps
- Remove NothingChanged string matching hack

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix notify race condition: Add AtomicBool flag to prevent satellites from hanging when mark_ready() is called before they start waiting
- Fix partial get_logs failure: Only advance from_block when all filters succeed to prevent permanent event loss
- Fix notify race condition: Add AtomicBool flag to prevent satellites from hanging when mark_ready() is called before they start waiting
- Fix partial get_logs failure: Only advance from_block when all filters succeed to prevent permanent event loss

Applied via @cursor push command
The polling stream in services/relay previously initialised from_block
lazily on the first poll cycle by calling eth_blockNumber at that
moment.  engine.rs calls registry_stream() *before* backfill, so the
stream exists but has not yet fired a poll when backfill_commitments()
runs.

Gap window:
  t0: registry_stream() called — stream created, from_block = None
  t1: backfill_commitments() fetches logs from [0 .. chain-head-at-t1]
  t2: backfill completes, mark_ready() signals satellites
  t3: POLL_INTERVAL expires, first poll fires → eth_blockNumber() = N

Any ChainCommitted event emitted in blocks (t1-chain-head .. N] is
never fetched: backfill stopped at t1-chain-head and the poll loop
starts querying from N+1.  Missing even a single ChainCommitted breaks
keccak-chain continuity — every subsequent commit references the wrong
predecessor hash, causing satellites to reject or stall permanently.

Fix: fetch the current chain head in registry_stream() (before
backfill) and pass it as anchor_block to poll_events().  The unfold
state is now initialised with a concrete u64 rather than None, so the
first poll queries [anchor_block+1 .. latest], guaranteeing gapless
coverage from the moment the stream is created.

Backfill and live-poll now overlap at anchor_block (both may process
that block), but commit_chained deduplicates by chain head so this is
harmless.

Test: poll_stream_captures_event_emitted_during_backfill_window —
simulates a ChainCommitted at block 6 landing during a backfill window
(anchor = 5), advances mock time past POLL_INTERVAL, and asserts the
event is yielded by the stream.
@agentotto agentotto bot requested a review from a team as a code owner March 18, 2026 21:56
- Clarify that backfill covers [0, latest_at_rpc_time] (not strictly
  [0, anchor]) and that the overlap is deduplicated by commit_chained;
  previous wording would have misled a reader into adding an artificial
  to_block(anchor) cap that could re-introduce the gap.
- Add a note that get_block_number failure propagates immediately because
  anchoring safely requires a concrete block number.
- Derive CHAIN_COMMITTED_TOPIC in the test from the generated ABI
  binding instead of a hardcoded literal, so the test breaks visibly if
  the event signature is ever renamed.
@agentotto
Copy link
Copy Markdown
Contributor Author

agentotto bot commented Mar 18, 2026

Code Review Summary

Fix correctness ✅

  1. Gap is closed: anchor_block = get_block_number() is called in registry_stream() before backfill_commitments() runs. Backfill covers [0, latest_at_backfill_rpc] (no explicit to_block); the live poll covers [anchor_block+1, ∞). Together they are gapless.

  2. Overlap is safe: Both backfill and the first poll may process events at anchor_block. commit_chained deduplicates by chain_head, so double-processing is harmless.

  3. anchor_block=0: Safe. The guard latest <= from_block fires, preventing a u64 underflow in from_block + 1.

  4. get_block_number failure: Changed from silent retry to a hard error propagated to engine::run(). Correct trade-off — cannot anchor without a block number.

Test quality ✅

poll_stream_captures_event_emitted_during_backfill_window faithfully reproduces the exact failure mode (anchor=5, block 6 mined during backfill, first poll must capture it). Uses alloy MockTransport + tokio paused time — no network, no flakiness. Topic constant is derived from the ABI binding so it breaks loudly if the event signature changes.

One remaining gap (non-blocking)

The wiring in registry_stream() that threads anchor_block into poll_events() is not directly tested — would need a full WorldChain mock. The unit test covers the critical logic; the wiring is a compiler-checked one-liner.

Base automatically changed from karan-relay-local to main March 25, 2026 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants