Skip to content

op-supernode: shared interop start from safe head#20392

Merged
jelias2 merged 9 commits intodevelopfrom
feature/supernode-safe-start
Apr 28, 2026
Merged

op-supernode: shared interop start from safe head#20392
jelias2 merged 9 commits intodevelopfrom
feature/supernode-safe-start

Conversation

@axelKingsley
Copy link
Copy Markdown
Contributor

@axelKingsley axelKingsley commented Apr 28, 2026

Tool Use Notice

Generated. In self-review.

Summary

This PR unifies interop startup around the firstVerifiableTimestamp path. Both backfill and non-backfill startup now consult the same safe-head-aware readiness gate before verification begins.

The shared rule is:

  • If backfill ran, first verifiable is backfillEndTimestamp + 1.
  • If backfill did not run and cross-safe is before activation, first verifiable is activationTimestamp.
  • If backfill did not run and cross-safe is at/after activation, first verifiable is minCrossSafeTime + 1.
  • If sync status is unavailable or local-safe is zero, startup waits/retries.
  • Startup emits an INFO log with the resolved first-verifiable timestamp.

Core Functional Changes

Area Change Code
Shared first-verifiable path firstVerifiableTimestamp(ctx) now returns (uint64, error) and centralizes backfill handoff, cached startup handoff, and safe-head resolution. op-supernode/supernode/activity/interop/log_backfill.go:13-32
Safe-head readiness gate resolveFirstVerifiableTimestamp(ctx) reads every chain SyncStatus, requires non-zero LocalSafeL2, clamps pre-activation cross-safe to activation, and otherwise returns minCrossSafeTime + 1. op-supernode/supernode/activity/interop/log_backfill.go:35-60
Backfill parity Backfill calls the same resolver, no-ops if first-verifiable is activation, and otherwise derives its inclusive backfill end as firstVerifiable - 1. op-supernode/supernode/activity/interop/log_backfill.go:70-77
No-backfill startup When there is no prior verified state and no backfill, startup resolves and caches the first-verifiable timestamp before entering verification. op-supernode/supernode/activity/interop/interop.go:277-292
Startup visibility Startup always logs the resolved firstVerifiableTimestamp at INFO. op-supernode/supernode/activity/interop/interop.go:293-302
Main loop handoff observeRound asks the shared path for the next timestamp when there is no existing verified state. op-supernode/supernode/activity/interop/interop.go:455-470
Empty logs DB validation The first non-backfill log seal is accepted only at the shared first-verifiable timestamp. op-supernode/supernode/activity/interop/logdb.go:172-180

Spec And Test Coverage

Spec Claim Test Coverage Lines
firstVerifiableTimestamp is the common startup readiness gate. TestFirstVerifiableTimestamp directly exercises the shared API. op-supernode/supernode/activity/interop/interop_test.go:191-306
Sync status errors block startup. Test case "sync status error blocks startup" returns a sync status error and expects wantErr. op-supernode/supernode/activity/interop/interop_test.go:215-227
Zero local-safe blocks startup. Test case "zero local-safe blocks startup" sets empty LocalSafeL2 and expects wantErr. op-supernode/supernode/activity/interop/interop_test.go:228-240
Cross-safe before activation returns activation. Test case "cross-safe before activation returns activation" sets SafeL2.Time=99, activation 100, and expects 100. op-supernode/supernode/activity/interop/interop_test.go:242-254
No chains returns activation. Test case "no chains returns activation" expects 100. op-supernode/supernode/activity/interop/interop_test.go:208-213
Cross-safe at activation returns first unverified timestamp. Test case "cross-safe at activation returns timestamp after activation" expects 101. op-supernode/supernode/activity/interop/interop_test.go:256-268
Multi-chain startup uses one past the minimum cross-safe timestamp. Test case with safe times 140 and 125 expects 126. op-supernode/supernode/activity/interop/interop_test.go:270-288
No-backfill startup uses the same first-verifiable timestamp for verification. TestStartWithoutBackfillUsesFirstVerifiableTimestamp captures verifyFn timestamp and asserts safe+1. op-supernode/supernode/activity/interop/interop_test.go:308-345
Startup logs the resolved first-verifiable timestamp. Same test captures logs and asserts the INFO line includes firstVerifiableTimestamp=126. op-supernode/supernode/activity/interop/interop_test.go:346-350
Backfill no-ops when activation is ahead of cross-safe. TestLogBackfill_ActivationInFuture expects runLogBackfill to return end=0, fetch no blocks, leave the logs DB empty, and keep first-verifiable at activation. op-supernode/supernode/activity/interop/log_backfill_test.go:507-554

Test Plan

  • go test ./supernode/activity/interop
  • go test ./supernode/activity/...

Use the minimum cross-safe head as the shared readiness gate so interop startup begins verification after already-safe data, with backfill using the same boundary for its inclusive range end.

Made-with: Cursor
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 66.3%. Comparing base (79a83f3) to head (8ed4b80).
⚠️ Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop   #20392      +/-   ##
===========================================
+ Coverage     11.6%    66.3%   +54.7%     
===========================================
  Files          673       55     -618     
  Lines        73151     4035   -69116     
===========================================
- Hits          8487     2677    -5810     
+ Misses       64520     1214   -63306     
  Partials       144      144              
Flag Coverage Δ
cannon-go-tests-64 66.3% <ø> (ø)
contracts-bedrock-tests ?
unit ?

Flags with carried forward coverage won't be shown. Click here to find out more.
see 618 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Keep the shared first-verifiable calculation anchored at activation when cross-safe is still pre-activation, so backfill no-ops instead of treating the chain as unready.

Made-with: Cursor
Allow timestamps before the resolved first-verifiable boundary to satisfy verifier checks so super-root queries can resolve data already covered by startup safe-head handoff.

Made-with: Cursor
Resolve and log the first verifiable timestamp before running log backfill so startup has a single readiness gate and backfill reuses the recorded handoff point.

Made-with: Cursor
Keep the startup test focused on first-verifiable behavior without asserting informational log output.

Made-with: Cursor
Keep the main-loop startup boundary helper with the interop activity instead of the log backfill implementation.

Made-with: Cursor
@axelKingsley axelKingsley marked this pull request as ready for review April 28, 2026 18:33
@axelKingsley axelKingsley requested a review from a team as a code owner April 28, 2026 18:33
@karlfloersch
Copy link
Copy Markdown
Contributor

I think this has a restart-safety bug around the new safe-head handoff boundary.

Concrete scenario:

  1. activationTimestamp = 100
  2. On first startup, every chain's cross-safe head is already at timestamp 125
  3. The PR correctly resolves firstVerifiableTimestamp = 126, so timestamps 100..125 are treated as already covered by the safe-head handoff
  4. The main loop commits the first verified result at 126
  5. The process restarts

After restart, verifiedDB remembers the committed result at 126, but the safe-head handoff boundary was only stored in memory. In firstVerifiableTimestamp, this path currently does:

if _, initialized := i.verifiedDB.LastTimestamp(); initialized {
    return i.activationTimestamp, nil
}

That reconstructs the first-verifiable boundary as 100 instead of 126. From that point on, VerifiedAtTimestamp(125) checks verifiedDB.Has(125) and returns false, even though 125 was intentionally skipped because it was covered by the startup safe head. Similarly, rewind planning at the first committed timestamp can try to read verifiedDB.Get(125), but that timestamp was never written.

I think the durable source of truth should be the first entry in verifiedDB, not activation, once the DB has been initialized. If the first committed verified timestamp is 126, then timestamps <126 and >= activationTimestamp are the safe-head/backfill handoff range. If the old behavior starts at activation, the first committed entry is activationTimestamp, so this remains compatible.

Suggested shape:

  1. Extend VerifiedDB to track both first and last committed timestamps.
    • During DB init, use the bbolt cursor's First() key for firstTimestamp and Last() key for lastTimestamp.
    • Add a FirstTimestamp() (uint64, bool) accessor.
  2. Change the initialized-DB path in firstVerifiableTimestamp to use the first committed timestamp:
if first, initialized := i.verifiedDB.FirstTimestamp(); initialized {
    return first, nil
}
  1. Add a regression test that commits the first verified result at safeHead+1, reconstructs/reopens interop from the same DB, and asserts:
    • firstVerifiableTimestamp() still returns safeHead+1
    • VerifiedAtTimestamp(safeHead) returns true
    • rewind planning at the first committed timestamp does not try to read the skipped safeHead timestamp from verifiedDB

In short: starting from the first entry in verifiedDB is how this should work after restart. If the current code falls back to activation when the DB is initialized, that loses the handoff boundary and is the bug.

Reconstruct the first verifiable timestamp from the first committed verifiedDB entry so safe-head handoff ranges remain verified after restart.

Made-with: Cursor
@axelKingsley
Copy link
Copy Markdown
Contributor Author

@karlfloersch

Fixed in 987cc37b0c.

The restart path now reconstructs the handoff boundary from durable state instead of falling back to activation. VerifiedDB tracks the first committed timestamp as well as the last, and firstVerifiableTimestamp() uses that first committed timestamp after restart. So in the activation=100, safe=125, first committed 126 case, timestamps <126 remain treated as covered by the safe-head handoff after restart.

I also added regression coverage for the exact failure mode:

  • VerifiedAtTimestamp(125) remains true after restart.
  • buildRewindPlan(126) stops at the first verifiable timestamp instead of trying to read missing 125.
  • VerifiedDB restores first/last timestamp bounds across reopen.

Keep the persisted first verified timestamp ahead of any restart backfill boundary so backfill cannot widen the implicit safe-head handoff range.

Made-with: Cursor
@axelKingsley axelKingsley enabled auto-merge April 28, 2026 19:23
@axelKingsley axelKingsley disabled auto-merge April 28, 2026 20:49
Use the post-backfill handoff timestamp when checking log coverage so restart-safe first verifier bounds are not confused with the rebuilt logs range.

Made-with: Cursor
@axelKingsley axelKingsley enabled auto-merge April 28, 2026 21:16
@axelKingsley axelKingsley added this pull request to the merge queue Apr 28, 2026
@github-merge-queue github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 28, 2026
@jelias2 jelias2 added this pull request to the merge queue Apr 28, 2026
Merged via the queue into develop with commit 2601ae9 Apr 28, 2026
72 checks passed
@jelias2 jelias2 deleted the feature/supernode-safe-start branch April 28, 2026 22:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants