feat(beacon-node): run consensus-specs fork-choice compliance tests by parithosh · Pull Request #9290 · ChainSafe/lodestar

parithosh · 2026-04-27T15:32:21Z

Summary

Adds a compliance_fork_choice spec test runner that exercises the consensus-specs Fork Choice Compliance suite against Lodestar's fork-choice driver, alongside the existing fork_choice runner.
Extracts the forkChoiceTest factory into test/spec/utils/forkChoiceRunner.ts so both runners share it; fork_choice.test.ts is now a thin entrypoint.
Two test-only accommodations for compliance fixture conventions (no production code changed).
Out-of-band tooling to download the artifact and produce a Prysm-style per-suite report.

Why a separate runner

The compliance generator (consensus-specs/tests/generators/compliance_runners/fork_choice) ships independently of the standard spec-test bundle. Its scheduled "Compliance Tests" workflow publishes a small.tar.gz artifact (~80 MB), produced from consensus-specs master. The on-disk layout matches the standard spec-test layout (tests/<preset>/<fork>/fork_choice_compliance/<handler>/<suite>/<case>/), so the new runner reuses specTestIterator with the runner name registered as fork_choice_compliance and the test ID prefixed by config (small/tiny/standard) so multiple configs can coexist.

Test-only accommodations in the runner

bls_setting: 2 — every compliance fixture uses placeholder signatures. The runner now passes validSignatures: testcase.meta?.bls_setting === BigInt(2) to chain.processBlock, which short-circuits verifyBlocksSignatures. Standard fork_choice fixtures use bls_setting: 1 so behavior there is unchanged.
BLOCK_ERROR_ALREADY_KNOWN — compliance fixtures intentionally re-import the same block (see dup_shift mutations in their meta.yaml). Spec semantics for on_block(store, block) on a duplicate is a no-op success. The runner now treats this BlockError as success when the step has valid: true. Production block import is untouched (rejecting duplicates is correct outside the spec test loop).

Tooling

scripts/download-compliance-fc-tests.sh — Prysm-style resolver supporting --dir, --tarball, --url, --run-id, and auto-fetch of the latest successful workflow run via gh. Magic-byte sniffing handles both zip-wrapped and raw tar.gz artifacts (the workflow currently publishes the latter, which gh run download can't unwrap on its own).
scripts/compliance-fc-report.sh — runs the suite under vitest with --reporter=json and prints a per-suite total/pass/fail/skip table plus an error-grouped failure breakdown. Bumps the worker heap to 8 GB so 2946 sequential BeaconChain instances don't OOM.

Current pass rate

Against the published small.tar.gz:

Total:   2946
Passed:  506   (~17%)
Failed:  2438
Skipped: 2

Per-fork: fulu 21%, gloas 13% — both above Prysm's reported ~12% baseline. Both block_cover suites pass fully (192/192 each fork).

Known follow-ups (not in this PR)

Top remaining failure classes from the report's grouped output:

Count	Class	Notes
1280	`SSZ: First offset must equal to fixedEnd 80 != 48`	Every gloas suite except `block_cover`. The compliance artifact is built from consensus-specs `master`, while Lodestar's `spec-tests-version.json` pin is `v1.7.0-alpha.5`. A Gloas container shape on `master` is ahead of our `@lodestar/types` definitions (suspected: `SignedExecutionPayloadEnvelope` or an adjacent payload-attestation container). Fix belongs in a separate PR updating types.
~600	`Invalid proposer boost root at step N`	Real fork-choice compliance gaps in fulu suites — multiple step counts, but probably a small number of underlying bugs.
145	`EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE`	All in `block_weight_test`. Cache/lookup-ahead range.
~80	`FORKCHOICE_ERROR_INVALID_ATTESTATION`	Mostly in `attester_slashing_test`.

CI

Intentionally not wired. The runner cleanly skips when packages/beacon-node/spec-tests-compliance/<config>/ is absent, so existing test:spec:* jobs are unaffected. Devs and reviewers run it on demand:

./scripts/compliance-fc-report.sh                            # auto-fetch small via gh, run minimal preset, print summary
./scripts/compliance-fc-report.sh --tarball ~/Downloads/small.tar.gz
./scripts/compliance-fc-report.sh --no-download              # data already extracted

Test plan

pnpm check-types clean
pnpm lint clean
pnpm vitest run --project spec-minimal test/spec/presets/compliance_fork_choice.test.ts cleanly skips with no data extracted (3 skip rows, one per config, with download-command messages)
Full report run after download-compliance-fc-tests.sh completes without OOM (pending: 0) and produces 506/2944 = 17.2% pass rate
Existing fork_choice runner unchanged in behavior (pure factory extraction; only diff vs. unstable is the relocated factory + runtime conflict resolution from fix: correct DA status for payload #9278's DataAvailabilityStatus parameter)

AI Assistance Disclosure

Used Claude Code.

Adds a new `compliance_fork_choice` spec test runner alongside the existing `fork_choice` runner. Tests are downloaded out-of-band from the consensus-specs "Compliance Tests" workflow artifact and exercised via the same fork-choice driver, with two test-only accommodations for fixture conventions: - Honor `meta.bls_setting == 2` (placeholder signatures) by passing `validSignatures: true` to `chain.processBlock`. - Treat `BLOCK_ERROR_ALREADY_KNOWN` as a no-op success when a step expects validity, matching spec semantics for `on_block(store, block)` on a duplicate (compliance fixtures intentionally re-import blocks via `dup_shift` mutations). The existing `forkChoiceTest` factory was extracted from `fork_choice.test.ts` into `test/spec/utils/forkChoiceRunner.ts` so both runners share it; the original test file is now a thin entrypoint. Tooling: - `scripts/download-compliance-fc-tests.sh` resolves the artifact via `--dir`, `--tarball`, `--url`, `--run-id`, or auto-fetches the latest successful workflow run via `gh`. Mirrors Prysm's `hack/compliance-fc-report.sh` pattern. Magic-byte sniffing handles both zip-wrapped and raw `tar.gz` artifacts (the consensus-specs workflow currently publishes the latter). - `scripts/compliance-fc-report.sh` runs the suite under vitest with `--reporter=json` and prints a per-suite total/pass/fail/skip table plus an error-grouped failure breakdown. Current pass rate against `small.tar.gz`: 506 / 2944 (~17%), ahead of the comparable Prysm baseline (~12%). Both `block_cover` suites pass fully (192/192 each fork). Remaining failure classes are tracked as follow-ups: - 1280x SSZ deserialize mismatch on every gloas suite except block_cover ("First offset must equal to fixedEnd 80 != 48"). The artifact was generated from consensus-specs master while Lodestar's spec-tests pin is `v1.7.0-alpha.5`; a Gloas container shape on master is ahead of our `@lodestar/types` definitions. Fix belongs in a separate PR updating types. - ~600x `Invalid proposer boost root at step N` — real fork-choice compliance gaps in fulu suites. - 145x `EPOCH_CONTEXT_ERROR_COMMITTEE_EPOCH_OUT_OF_RANGE`. - ~80x `FORKCHOICE_ERROR_INVALID_ATTESTATION`. CI is intentionally not wired; the runner cleanly skips when the data directory is absent so the existing `test:spec:*` jobs are unaffected. AI Assistance Disclosure: Used Claude Code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

gemini-code-assist

Code Review

This pull request introduces a compliance fork-choice test suite for the beacon node. It includes new scripts for downloading test artifacts from the consensus-specs repository and generating detailed execution reports. The core fork-choice test logic has been refactored into a shared utility, forkChoiceRunner.ts, which is now used by both standard and compliance tests. Feedback suggests optimizing performance by reusing key pairs across test cases and correcting a potential typo in the database name configuration within the shared runner.

gemini-code-assist · 2026-04-27T15:37:48Z

+            proposerBoostReorg: true,
+          },
+          {
+            privateKey: await generateKeyPair("secp256k1"),


Generating a new secp256k1 key pair for every test case is computationally expensive, especially when running the full compliance suite of nearly 3,000 tests. Since the specific key does not matter for these spec tests, consider pre-generating a single key pair once outside the testFunction and reusing it across all test cases to improve performance.

Done in 2b319fd — hoisted to module scope via top-level await with a comment explaining why the key is reused.

gemini-code-assist · 2026-04-27T15:37:48Z

+            dbName: ",",
+            logger,
+            processShutdownCallback: () => {},
+            clock,


dbName: "," appears to be a typo, likely intended to be an empty string "" or a standard mock identifier like ":memory:". While this was present in the original test file, correcting it now that the logic has been moved to a shared utility would improve code clarity.

Done in 2b319fd — changed to "" since the value is unused with getMockedBeaconDb().

Address review feedback on ChainSafe#9290: - Generate the libp2p secp256k1 keypair once at module load instead of per test case. The key is only used to derive a PeerId for the BeaconChain instance and is never validated in spec tests, so reuse is safe and saves ~3000 keygens on a full compliance run. - Replace stray `dbName: ","` (predates this PR) with `""`, since the value is functionally unused with the mocked `getMockedBeaconDb()`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

parithosh requested a review from a team as a code owner April 27, 2026 15:32

gemini-code-assist Bot reviewed Apr 27, 2026

View reviewed changes

parithosh and others added 2 commits April 28, 2026 10:43

Merge branch 'unstable' into compliance-fork-choice-tests

ba6987f

parithosh mentioned this pull request Apr 28, 2026

compliance fork-choice tests: gloas SSZ types diverge from consensus-specs master #9295

Open

GrapeBaBa mentioned this pull request May 1, 2026

feat: integrate fork-choice compliance spec test suite #9314

Draft

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(beacon-node): run consensus-specs fork-choice compliance tests#9290

feat(beacon-node): run consensus-specs fork-choice compliance tests#9290
parithosh wants to merge 3 commits intoChainSafe:unstablefrom
parithosh:compliance-fork-choice-tests

parithosh commented Apr 27, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Uh oh!

parithosh Apr 28, 2026

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Uh oh!

parithosh Apr 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

parithosh commented Apr 27, 2026

Summary

Why a separate runner

Test-only accommodations in the runner

Tooling

Current pass rate

Known follow-ups (not in this PR)

CI

Test plan

AI Assistance Disclosure

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

parithosh Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Apr 27, 2026

Choose a reason for hiding this comment

Uh oh!

parithosh Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant