Skip to content

ef_tests: wire fork-choice compliance suites#9185

Open
parithosh wants to merge 1 commit intosigp:glamsterdam-devnet-0from
parithosh:parithosh/fc-compliance-tests
Open

ef_tests: wire fork-choice compliance suites#9185
parithosh wants to merge 1 commit intosigp:glamsterdam-devnet-0from
parithosh:parithosh/fc-compliance-tests

Conversation

@parithosh
Copy link
Copy Markdown

@parithosh parithosh commented Apr 27, 2026

Summary

Wire the consensus-specs Compliance Tests fork-choice scenarios into testing/ef_tests so they can be run against this branch with cargo test. Test-only — no production behaviour change.

The 12 compliance test functions are gated behind #[ignore] so they don't run in normal cargo test invocations and don't block CI; they're invoked via scripts/compliance-fc-report.sh (which passes --include-ignored).

What's added

  • ForkChoiceComplianceHandler (runner = fork_choice_compliance, in testing/ef_tests/src/handler.rs). Gated to cfg!(feature = "fake_crypto") because the generator emits placeholder BLS signatures (bls_setting: 2 in meta.yaml); SSZ-decoding them with real BLS yields BLST_BAD_ENCODING before fork-choice runs. Has an only_fork(...) filter so we get per-fork test functions.
  • 12 per-suite × per-fork test fns in testing/ef_tests/tests/tests.rs, each #[test] #[ignore]. Gloas is currently a no-op (is_enabled_for_fork skips it) until the recent payload-envelope DB changes (Update database and block replayer to handle payload envelopes #8886 etc.) settle — gloas anchor states currently fail to initialise the test harness with "Head block not found in store".
  • Step::PayloadAttestation variant + Tester::process_payload_attestation for the new step kind that ships in Gloas cases.
  • Step::{Attestation,AttesterSlashing,PayloadAttestation} learn an optional valid field; valid: false is treated as expected-rejection. Existing test corpora that don't set the field are unaffected (#[serde(default)]).
  • Checks.viable_for_head_roots_and_weights — new check parsed by the runner and validated against a new public ProtoArrayForkChoice::filtered_block_tree_leaves_and_weights<E>(...) that mirrors the spec helper:
    filtered_block_roots = spec.get_filtered_block_tree(store).keys()
    leaves_viable_for_head = [r for r in filtered_block_roots
                              if not any(c for c in filtered_block_roots if store.blocks[c].parent_root == r)]
    i.e. nodes in get_filtered_block_tree with no descendant that is also in the filtered tree, paired with their stored weights. Results are sorted by (root, weight) before comparison so order differences don't matter. find_head is called first to flush pending vote deltas.
  • process_block_and_blobs / process_block_and_columns map BlockError::DuplicateFullyImported to success, since spec on_block is idempotent and the compliance corpus re-feeds blocks repeatedly.
  • Meta relaxed to allow the compliance generator's extra fields (seed, model_params, bls_setting); description is now Option<String>.

Drive-by

  • cargo fmt reordered two use lines in beacon_node/http_api/src/beacon/execution_payload_envelope.rs (pre-existing fmt drift on this branch — unblocks the local pre-commit hook and the check-code CI job).

Helper script

scripts/compliance-fc-report.sh (mirroring Prysm PR #16724). Resolves the corpus from a tarball / URL / extracted dir / GH artifact, stages the fork_choice_compliance/ subtree under the ef_tests crate (the corpus path is hardcoded at compile time via env!("CARGO_MANIFEST_DIR")), runs the 12 cargo tests with --include-ignored, and prints a per-suite pass/fail/skip table.

# Use a manually-downloaded tarball — no token, no gh
scripts/compliance-fc-report.sh --tarball ~/Downloads/small.tar.gz

# Pull through a public mirror via curl — no token
scripts/compliance-fc-report.sh --url https://example.org/small.tar.gz

# Re-use an already-extracted tree
scripts/compliance-fc-report.sh --dir /var/tmp/compliance_fc_root

# Auto-fetch latest run from the consensus-specs Compliance Tests workflow
GITHUB_TOKEN=... scripts/compliance-fc-report.sh

# Run only one suite
scripts/compliance-fc-report.sh --tarball ./small.tar.gz --suite block_tree_test

To bypass the helper:

mkdir -p testing/ef_tests/consensus-spec-tests/tests/minimal
tar -xzf small.tar.gz --strip-components=1 -C testing/ef_tests/consensus-spec-tests/

RUST_MIN_STACK=8388608 cargo test --release --features "ef_tests,fake_crypto" \
  -p ef_tests --test tests fork_choice_compliance_ -- --include-ignored --nocapture

Notes

  • CI is intentionally not wired up — leaving that decision to maintainers.
  • Tests are #[ignore] so they don't pollute normal CI runs; opt-in via the helper script (or --include-ignored).
  • Gloas suites are skipped at handler level pending the harness-init fix; fulu compliance still runs.
  • Targeted at glamsterdam-devnet-0 since the corpus is fulu/gloas-only and this branch already has those forks plumbed in.

Current results on this branch — fulu only

Suite Total Pass Fail
attester_slashing_test 128 44 84
block_cover_test 192 180 12
block_tree_test 512 128 384
block_weight_test 256 48 208
invalid_message_test 128 38 90
shuffling_test 256 102 154

Top failure modes are real consensus-spec deltas, not infrastructure:

  • proposer_boost_root mismatches — at end-of-slot tick boundaries the spec expects the boost root cleared (0x0…) but lighthouse retains the previous slot's value. on_tick_per_slot does reset on current_slot > previous_slot (fork_choice.rs:1437), so the divergence looks like a (re-)apply ordering issue between block import / tick advancement.
  • viable_for_head_roots_and_weights weight mismatches (block_cover_test fulu, 12 cases). Roots match; weights diverge — lighthouse reports 0 where the spec expects ~102_400_000_000. Likely attestation-queue timing: votes for slot N aren't applied to weights until the chain has crossed slot N.

Test plan

  • bash -n scripts/compliance-fc-report.sh syntax OK
  • scripts/compliance-fc-report.sh --help prints full usage
  • cargo fmt --check clean
  • cargo check -p ef_tests --tests --features "ef_tests,fake_crypto" clean (with RUSTFLAGS="-D warnings")
  • cargo check -p proto_array clean
  • cargo clippy -p ef_tests --tests --features "ef_tests,fake_crypto" — no new warnings on touched files
  • cargo test -p ef_tests --test tests fork_choice_compliance_ (no --include-ignored) → all 12 marked as ignored, no failures
  • End-to-end via scripts/compliance-fc-report.sh --dir <extracted> runs all 12 fns and prints the report shown above
  • End-to-end with --suite block_cover_test runs only that suite

🤖 Generated with Claude Code

@cla-assistant
Copy link
Copy Markdown

cla-assistant Bot commented Apr 27, 2026

CLA assistant check
All committers have signed the CLA.

@eserilev eserilev force-pushed the glamsterdam-devnet-0 branch from e1d4b28 to 2f6f9b5 Compare April 27, 2026 22:14
@eserilev eserilev requested a review from jxs as a code owner April 27, 2026 22:14
Add a runner + helper script for the consensus-specs compliance fork-choice
test suites (https://github.com/ethereum/consensus-specs/tree/master/tests/generators/compliance_runners/fork_choice).
Test-only — no production behaviour change.

What's added
- `ForkChoiceComplianceHandler` (runner = `fork_choice_compliance`, gated on
  `feature = "fake_crypto"` because the generator emits placeholder BLS
  signatures with `bls_setting: 2`) plus 12 per-suite × per-fork test fns.
- `Step::PayloadAttestation` variant + `Tester::process_payload_attestation`
  for the new step kind that ships in Gloas cases.
- `Step::Attestation` / `AttesterSlashing` / `PayloadAttestation` learn an
  optional `valid` field; `valid: false` is treated as expected-rejection.
- `Checks.viable_for_head_roots_and_weights` parsed and validated against
  a new `ProtoArrayForkChoice::filtered_block_tree_leaves_and_weights` —
  mirrors the spec helper (filtered block tree leaves with their weights).
- `process_block_and_blobs` / `process_block_and_columns` map
  `BlockError::DuplicateFullyImported` to success, since spec `on_block`
  is idempotent and the compliance corpus re-feeds blocks.
- `Meta` relaxed to allow the compliance generator's extra fields
  (`seed`, `model_params`, `bls_setting`).

Helper script
- `scripts/compliance-fc-report.sh` resolves the corpus from a tarball
  / URL / extracted dir / GH artifact, stages the `fork_choice_compliance/`
  subtree under the ef_tests crate, runs the 12 cargo tests, and prints a
  per-suite pass/fail/skip report. See `--help`.

Run
  scripts/compliance-fc-report.sh --tarball ~/Downloads/small.tar.gz
  scripts/compliance-fc-report.sh --suite block_tree_test
  GITHUB_TOKEN=... scripts/compliance-fc-report.sh

Current pass rate against this branch: 1024/2944 ≈ 35%. Remaining failures
are real fork-choice deltas (proposer_boost_root timing, viable-tree
weight timing) — see the PR description for the breakdown.

CI is intentionally not wired up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants