feat(beacon-node): run consensus-specs fork-choice compliance tests #9290
parithosh wants to merge 3 commits into ChainSafe:unstable
Adds a new `compliance_fork_choice` spec test runner alongside the
existing `fork_choice` runner. Tests are downloaded out-of-band from
the consensus-specs "Compliance Tests" workflow artifact and exercised
via the same fork-choice driver, with two test-only accommodations for
fixture conventions:
- Honor `meta.bls_setting == 2` (placeholder signatures) by passing
`validSignatures: true` to `chain.processBlock`.
- Treat `BLOCK_ERROR_ALREADY_KNOWN` as a no-op success when a step
expects validity, matching spec semantics for `on_block(store, block)`
on a duplicate (compliance fixtures intentionally re-import blocks
via `dup_shift` mutations).
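As a rough illustration of the two accommodations above, here is a minimal sketch. The type names and both helper functions are hypothetical stand-ins, not the actual runner code; only `bls_setting`, `BLOCK_ERROR_ALREADY_KNOWN`, and the `valid` flag come from the description.

```typescript
// Hypothetical, simplified stand-ins for the fixture metadata and error shape.
type TestMeta = {bls_setting?: bigint};
type CodedError = {code: string};

// bls_setting == 2 marks placeholder signatures, so signature
// verification should be treated as already done.
function signaturesPreverified(meta?: TestMeta): boolean {
  return meta?.bls_setting === BigInt(2);
}

function isAlreadyKnown(error: unknown): boolean {
  return (
    typeof error === "object" &&
    error !== null &&
    (error as Partial<CodedError>).code === "BLOCK_ERROR_ALREADY_KNOWN"
  );
}

// `error` is undefined when the import succeeded.
// Returns true when the outcome matches the step's expectation.
function stepOutcomeMatches(expectValid: boolean, error?: unknown): boolean {
  if (error === undefined) return expectValid;
  // A duplicate import is a no-op success only when validity is expected.
  if (expectValid && isAlreadyKnown(error)) return true;
  return !expectValid;
}
```

Keeping the duplicate check inside the test-side outcome classifier, rather than in the import path, mirrors the stated intent that production block import keeps rejecting duplicates.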
The existing `forkChoiceTest` factory was extracted from
`fork_choice.test.ts` into `test/spec/utils/forkChoiceRunner.ts` so
both runners share it; the original test file is now a thin entrypoint.
Tooling:
- `scripts/download-compliance-fc-tests.sh` resolves the artifact via
`--dir`, `--tarball`, `--url`, `--run-id`, or auto-fetches the latest
successful workflow run via `gh`. Mirrors Prysm's
`hack/compliance-fc-report.sh` pattern. Magic-byte sniffing handles
both zip-wrapped and raw `tar.gz` artifacts (the consensus-specs
workflow currently publishes the latter).
- `scripts/compliance-fc-report.sh` runs the suite under vitest with
`--reporter=json` and prints a per-suite total/pass/fail/skip table
plus an error-grouped failure breakdown.
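The magic-byte sniffing mentioned above can be sketched as follows. The download script itself is a shell script; this TypeScript version only illustrates the idea, and the function and type names are hypothetical. The signatures are the standard ones: ZIP local file headers start with `PK\x03\x04`, and gzip streams start with `0x1f 0x8b`.

```typescript
type ArchiveKind = "zip" | "gzip" | "unknown";

// Classify an archive from its first few bytes instead of trusting
// the file name or extension.
function sniffArchiveKind(header: Uint8Array): ArchiveKind {
  // ZIP local file header signature: 0x50 0x4b 0x03 0x04 ("PK\x03\x04").
  if (
    header.length >= 4 &&
    header[0] === 0x50 &&
    header[1] === 0x4b &&
    header[2] === 0x03 &&
    header[3] === 0x04
  ) {
    return "zip";
  }
  // gzip member header starts with 0x1f 0x8b.
  if (header.length >= 2 && header[0] === 0x1f && header[1] === 0x8b) {
    return "gzip";
  }
  return "unknown";
}
```

A caller would read the first four bytes of the downloaded file, unzip first when the result is `"zip"`, and pass the file straight to tar extraction when it is `"gzip"`.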
Current pass rate against `small.tar.gz`: 506 / 2944 (~17%), ahead of
the comparable Prysm baseline (~12%). Both `block_cover` suites pass
fully (192/192 each fork). Remaining failure classes are tracked as
follow-ups:
- 1280x SSZ deserialize mismatch on every gloas suite except
block_cover ("First offset must equal to fixedEnd 80 != 48"). The
artifact was generated from consensus-specs master while Lodestar's
spec-tests pin is `v1.7.0-alpha.5`; a Gloas container shape on
master is ahead of our `@lodestar/types` definitions. Fix belongs
in a separate PR updating types.
- ~600x `Invalid proposer boost root at step N` — real fork-choice
compliance gaps in fulu suites.
- 145x `EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE`.
- ~80x `FORKCHOICE_ERROR_INVALID_ATTESTATION`.
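One plausible way to arrive at grouped counts like those above is to normalize each failure message before counting, so that `at step 3` and `at step 17` land in the same bucket. This is a sketch under that assumption, not the report script's actual logic; both function names are hypothetical.

```typescript
// Collapse run-specific details so equivalent failures share one key.
function normalizeFailure(message: string): string {
  // Keep only the first line; stack traces differ per test case.
  const [firstLine = ""] = message.split("\n");
  return firstLine.trim().replace(/\bat step \d+\b/g, "at step N");
}

// Count failures per normalized message, most frequent first.
function groupFailures(messages: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const message of messages) {
    const key = normalizeFailure(message);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1]);
}
```

The input would be the failure messages collected from the vitest JSON report; how they are extracted from that report is left out here.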
CI is intentionally not wired; the runner cleanly skips when the data
directory is absent so the existing `test:spec:*` jobs are unaffected.
AI Assistance Disclosure: Used Claude Code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces a compliance fork-choice test suite for the beacon node. It includes new scripts for downloading test artifacts from the consensus-specs repository and generating detailed execution reports. The core fork-choice test logic has been refactored into a shared utility, forkChoiceRunner.ts, which is now used by both standard and compliance tests. Feedback suggests optimizing performance by reusing key pairs across test cases and correcting a potential typo in the database name configuration within the shared runner.
      proposerBoostReorg: true,
    },
    {
      privateKey: await generateKeyPair("secp256k1"),
Generating a new secp256k1 key pair for every test case is computationally expensive, especially when running the full compliance suite of nearly 3,000 tests. Since the specific key does not matter for these spec tests, consider pre-generating a single key pair once outside the testFunction and reusing it across all test cases to improve performance.
Done in 2b319fd — hoisted to module scope via top-level await with a comment explaining why the key is reused.
    dbName: ",",
    logger,
    processShutdownCallback: () => {},
    clock,
Done in 2b319fd — changed to "" since the value is unused with getMockedBeaconDb().
Address review feedback on ChainSafe#9290:
- Generate the libp2p secp256k1 keypair once at module load instead of
  per test case. The key is only used to derive a PeerId for the
  BeaconChain instance and is never validated in spec tests, so reuse
  is safe and saves ~3000 keygens on a full compliance run.
- Replace stray `dbName: ","` (predates this PR) with `""`, since the
  value is functionally unused with the mocked `getMockedBeaconDb()`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Adds a new `compliance_fork_choice` spec test runner that exercises the consensus-specs Fork Choice Compliance suite against Lodestar's fork-choice driver, alongside the existing `fork_choice` runner.
- Extracts the `forkChoiceTest` factory into `test/spec/utils/forkChoiceRunner.ts` so both runners share it; `fork_choice.test.ts` is now a thin entrypoint.
Why a separate runner
The compliance generator (`consensus-specs/tests/generators/compliance_runners/fork_choice`) ships independently of the standard spec-test bundle. Its scheduled "Compliance Tests" workflow publishes a `small.tar.gz` artifact (~80 MB), produced from `consensus-specs` `master`. The on-disk layout matches the standard spec-test layout (`tests/<preset>/<fork>/fork_choice_compliance/<handler>/<suite>/<case>/`), so the new runner reuses `specTestIterator` with the runner name registered as `fork_choice_compliance` and the test ID prefixed by config (`small` / `tiny` / `standard`) so multiple configs can coexist.
Test-only accommodations in the runner
- `bls_setting: 2`: every compliance fixture uses placeholder signatures. The runner now passes `validSignatures: testcase.meta?.bls_setting === BigInt(2)` to `chain.processBlock`, which short-circuits `verifyBlocksSignatures`. Standard fork_choice fixtures use `bls_setting: 1`, so behavior there is unchanged.
- `BLOCK_ERROR_ALREADY_KNOWN`: compliance fixtures intentionally re-import the same block (see `dup_shift` mutations in their `meta.yaml`). Spec semantics for `on_block(store, block)` on a duplicate is a no-op success. The runner now treats this `BlockError` as success when the step has `valid: true`. Production block import is untouched (rejecting duplicates is correct outside the spec test loop).
Tooling
- `scripts/download-compliance-fc-tests.sh`: Prysm-style resolver supporting `--dir`, `--tarball`, `--url`, `--run-id`, and auto-fetch of the latest successful workflow run via `gh`. Magic-byte sniffing handles both zip-wrapped and raw `tar.gz` artifacts (the workflow currently publishes the latter, which `gh run download` can't unwrap on its own).
- `scripts/compliance-fc-report.sh`: runs the suite under vitest with `--reporter=json` and prints a per-suite total/pass/fail/skip table plus an error-grouped failure breakdown. Bumps the worker heap to 8 GB so 2946 sequential `BeaconChain` instances don't OOM.
Current pass rate
Against the published `small.tar.gz`: 506 / 2944 passing (17.2%). Per-fork: fulu 21%, gloas 13%, both above Prysm's reported ~12% baseline. Both `block_cover` suites pass fully (192/192 each fork).
Known follow-ups (not in this PR)
Top remaining failure classes from the report's grouped output:
- `SSZ: First offset must equal to fixedEnd 80 != 48` (1280x): every gloas suite except `block_cover`. The compliance artifact is built from consensus-specs `master`, while Lodestar's `spec-tests-version.json` pin is `v1.7.0-alpha.5`. A Gloas container shape on `master` is ahead of our `@lodestar/types` definitions (suspected: `SignedExecutionPayloadEnvelope` or an adjacent payload-attestation container). Fix belongs in a separate PR updating types.
- `Invalid proposer boost root at step N` (~600x): real fork-choice compliance gaps in fulu suites.
- `EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE` (145x): `block_weight_test`. Cache/lookup-ahead range.
- `FORKCHOICE_ERROR_INVALID_ATTESTATION` (~80x): `attester_slashing_test`.
CI
Intentionally not wired. The runner cleanly skips when `packages/beacon-node/spec-tests-compliance/<config>/` is absent, so existing `test:spec:*` jobs are unaffected. Devs and reviewers run it on demand.
Test plan
- `pnpm check-types` clean
- `pnpm lint` clean
- `pnpm vitest run --project spec-minimal test/spec/presets/compliance_fork_choice.test.ts` cleanly skips with no data extracted (3 skip rows, one per config, with download-command messages)
- Full run after `download-compliance-fc-tests.sh` completes without OOM (pending: 0) and produces 506/2944 = 17.2% pass rate
- `fork_choice` runner unchanged in behavior (pure factory extraction; only diff vs. unstable is the relocated factory + runtime conflict resolution from "fix: correct DA status for payload" #9278's `DataAvailabilityStatus` parameter)
AI Assistance Disclosure
Used Claude Code.