Skip to content

feat(beacon-node): run consensus-specs fork-choice compliance tests#9290

Open
parithosh wants to merge 3 commits intoChainSafe:unstablefrom
parithosh:compliance-fork-choice-tests
Open

feat(beacon-node): run consensus-specs fork-choice compliance tests#9290
parithosh wants to merge 3 commits intoChainSafe:unstablefrom
parithosh:compliance-fork-choice-tests

Conversation

@parithosh
Copy link
Copy Markdown

Summary

  • Adds a compliance_fork_choice spec test runner that exercises the consensus-specs Fork Choice Compliance suite against Lodestar's fork-choice driver, alongside the existing fork_choice runner.
  • Extracts the forkChoiceTest factory into test/spec/utils/forkChoiceRunner.ts so both runners share it; fork_choice.test.ts is now a thin entrypoint.
  • Two test-only accommodations for compliance fixture conventions (no production code changed).
  • Out-of-band tooling to download the artifact and produce a Prysm-style per-suite report.

Why a separate runner

The compliance generator (consensus-specs/tests/generators/compliance_runners/fork_choice) ships independently of the standard spec-test bundle. Its scheduled "Compliance Tests" workflow publishes a small.tar.gz artifact (~80 MB), produced from consensus-specs master. The on-disk layout matches the standard spec-test layout (tests/<preset>/<fork>/fork_choice_compliance/<handler>/<suite>/<case>/), so the new runner reuses specTestIterator with the runner name registered as fork_choice_compliance and the test ID prefixed by config (small/tiny/standard) so multiple configs can coexist.

Test-only accommodations in the runner

  1. bls_setting: 2 — every compliance fixture uses placeholder signatures. The runner now passes validSignatures: testcase.meta?.bls_setting === BigInt(2) to chain.processBlock, which short-circuits verifyBlocksSignatures. Standard fork_choice fixtures use bls_setting: 1 so behavior there is unchanged.
  2. BLOCK_ERROR_ALREADY_KNOWN — compliance fixtures intentionally re-import the same block (see dup_shift mutations in their meta.yaml). Spec semantics for on_block(store, block) on a duplicate is a no-op success. The runner now treats this BlockError as success when the step has valid: true. Production block import is untouched (rejecting duplicates is correct outside the spec test loop).

Tooling

  • scripts/download-compliance-fc-tests.sh — Prysm-style resolver supporting --dir, --tarball, --url, --run-id, and auto-fetch of the latest successful workflow run via gh. Magic-byte sniffing handles both zip-wrapped and raw tar.gz artifacts (the workflow currently publishes the latter, which gh run download can't unwrap on its own).
  • scripts/compliance-fc-report.sh — runs the suite under vitest with --reporter=json and prints a per-suite total/pass/fail/skip table plus an error-grouped failure breakdown. Bumps the worker heap to 8 GB so 2946 sequential BeaconChain instances don't OOM.

Current pass rate

Against the published small.tar.gz:

Total:   2946
Passed:  506   (~17%)
Failed:  2438
Skipped: 2

Per-fork: fulu 21%, gloas 13% — both above Prysm's reported ~12% baseline. Both block_cover suites pass fully (192/192 each fork).

Known follow-ups (not in this PR)

Top remaining failure classes from the report's grouped output:

Count Class Notes
1280 SSZ: First offset must equal to fixedEnd 80 != 48 Every gloas suite except block_cover. The compliance artifact is built from consensus-specs master, while Lodestar's spec-tests-version.json pin is v1.7.0-alpha.5. A Gloas container shape on master is ahead of our @lodestar/types definitions (suspected: SignedExecutionPayloadEnvelope or an adjacent payload-attestation container). Fix belongs in a separate PR updating types.
~600 Invalid proposer boost root at step N Real fork-choice compliance gaps in fulu suites — multiple step counts, but probably a small number of underlying bugs.
145 EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE All in block_weight_test. Cache/lookup-ahead range.
~80 FORKCHOICE_ERROR_INVALID_ATTESTATION Mostly in attester_slashing_test.

CI

Intentionally not wired. The runner cleanly skips when packages/beacon-node/spec-tests-compliance/<config>/ is absent, so existing test:spec:* jobs are unaffected. Devs and reviewers run it on demand:

./scripts/compliance-fc-report.sh                            # auto-fetch small via gh, run minimal preset, print summary
./scripts/compliance-fc-report.sh --tarball ~/Downloads/small.tar.gz
./scripts/compliance-fc-report.sh --no-download              # data already extracted

Test plan

  • pnpm check-types clean
  • pnpm lint clean
  • pnpm vitest run --project spec-minimal test/spec/presets/compliance_fork_choice.test.ts cleanly skips with no data extracted (3 skip rows, one per config, with download-command messages)
  • Full report run after download-compliance-fc-tests.sh completes without OOM (pending: 0) and produces 506/2944 = 17.2% pass rate
  • Existing fork_choice runner unchanged in behavior (pure factory extraction; only diff vs. unstable is the relocated factory + runtime conflict resolution from fix: correct DA status for payload #9278's DataAvailabilityStatus parameter)

AI Assistance Disclosure

Used Claude Code.

Adds a new `compliance_fork_choice` spec test runner alongside the
existing `fork_choice` runner. Tests are downloaded out-of-band from
the consensus-specs "Compliance Tests" workflow artifact and exercised
via the same fork-choice driver, with two test-only accommodations for
fixture conventions:

- Honor `meta.bls_setting == 2` (placeholder signatures) by passing
  `validSignatures: true` to `chain.processBlock`.
- Treat `BLOCK_ERROR_ALREADY_KNOWN` as a no-op success when a step
  expects validity, matching spec semantics for `on_block(store, block)`
  on a duplicate (compliance fixtures intentionally re-import blocks
  via `dup_shift` mutations).

The existing `forkChoiceTest` factory was extracted from
`fork_choice.test.ts` into `test/spec/utils/forkChoiceRunner.ts` so
both runners share it; the original test file is now a thin entrypoint.

Tooling:
- `scripts/download-compliance-fc-tests.sh` resolves the artifact via
  `--dir`, `--tarball`, `--url`, `--run-id`, or auto-fetches the latest
  successful workflow run via `gh`. Mirrors Prysm's
  `hack/compliance-fc-report.sh` pattern. Magic-byte sniffing handles
  both zip-wrapped and raw `tar.gz` artifacts (the consensus-specs
  workflow currently publishes the latter).
- `scripts/compliance-fc-report.sh` runs the suite under vitest with
  `--reporter=json` and prints a per-suite total/pass/fail/skip table
  plus an error-grouped failure breakdown.

Current pass rate against `small.tar.gz`: 506 / 2944 (~17%), ahead of
the comparable Prysm baseline (~12%). Both `block_cover` suites pass
fully (192/192 each fork). Remaining failure classes are tracked as
follow-ups:

- 1280x SSZ deserialize mismatch on every gloas suite except
  block_cover ("First offset must equal to fixedEnd 80 != 48"). The
  artifact was generated from consensus-specs master while Lodestar's
  spec-tests pin is `v1.7.0-alpha.5`; a Gloas container shape on
  master is ahead of our `@lodestar/types` definitions. Fix belongs
  in a separate PR updating types.
- ~600x `Invalid proposer boost root at step N` — real fork-choice
  compliance gaps in fulu suites.
- 145x `EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE`.
- ~80x `FORKCHOICE_ERROR_INVALID_ATTESTATION`.

CI is intentionally not wired; the runner cleanly skips when the data
directory is absent so the existing `test:spec:*` jobs are unaffected.

AI Assistance Disclosure: Used Claude Code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@parithosh parithosh requested a review from a team as a code owner April 27, 2026 15:32
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a compliance fork-choice test suite for the beacon node. It includes new scripts for downloading test artifacts from the consensus-specs repository and generating detailed execution reports. The core fork-choice test logic has been refactored into a shared utility, forkChoiceRunner.ts, which is now used by both standard and compliance tests. Feedback suggests optimizing performance by reusing key pairs across test cases and correcting a potential typo in the database name configuration within the shared runner.

proposerBoostReorg: true,
},
{
privateKey: await generateKeyPair("secp256k1"),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Generating a new secp256k1 key pair for every test case is computationally expensive, especially when running the full compliance suite of nearly 3,000 tests. Since the specific key does not matter for these spec tests, consider pre-generating a single key pair once outside the testFunction and reusing it across all test cases to improve performance.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2b319fd — hoisted to module scope via top-level await with a comment explaining why the key is reused.

dbName: ",",
logger,
processShutdownCallback: () => {},
clock,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

dbName: "," appears to be a typo, likely intended to be an empty string "" or a standard mock identifier like ":memory:". While this was present in the original test file, correcting it now that the logic has been moved to a shared utility would improve code clarity.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 2b319fd — changed to "" since the value is unused with getMockedBeaconDb().

parithosh and others added 2 commits April 28, 2026 10:43
Address review feedback on ChainSafe#9290:

- Generate the libp2p secp256k1 keypair once at module load instead of
  per test case. The key is only used to derive a PeerId for the
  BeaconChain instance and is never validated in spec tests, so reuse
  is safe and saves ~3000 keygens on a full compliance run.
- Replace stray `dbName: ","` (predates this PR) with `""`, since the
  value is functionally unused with the mocked `getMockedBeaconDb()`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant