ef_tests: wire fork-choice compliance suites #9185
Open
parithosh wants to merge 1 commit into sigp:glamsterdam-devnet-0
Add a runner + helper script for the consensus-specs compliance fork-choice test suites (https://github.com/ethereum/consensus-specs/tree/master/tests/generators/compliance_runners/fork_choice). Test-only; no production behaviour change.

What's added
- `ForkChoiceComplianceHandler` (runner = `fork_choice_compliance`, gated on `feature = "fake_crypto"` because the generator emits placeholder BLS signatures with `bls_setting: 2`) plus 12 per-suite × per-fork test fns.
- `Step::PayloadAttestation` variant + `Tester::process_payload_attestation` for the new step kind that ships in Gloas cases.
- `Step::Attestation` / `AttesterSlashing` / `PayloadAttestation` learn an optional `valid` field; `valid: false` is treated as expected-rejection.
- `Checks.viable_for_head_roots_and_weights` parsed and validated against a new `ProtoArrayForkChoice::filtered_block_tree_leaves_and_weights`, which mirrors the spec helper (filtered block tree leaves with their weights).
- `process_block_and_blobs` / `process_block_and_columns` map `BlockError::DuplicateFullyImported` to success, since spec `on_block` is idempotent and the compliance corpus re-feeds blocks.
- `Meta` relaxed to allow the compliance generator's extra fields (`seed`, `model_params`, `bls_setting`).

Helper script
- `scripts/compliance-fc-report.sh` resolves the corpus from a tarball / URL / extracted dir / GH artifact, stages the `fork_choice_compliance/` subtree under the ef_tests crate, runs the 12 cargo tests, and prints a per-suite pass/fail/skip report. See `--help`.

Run
    scripts/compliance-fc-report.sh --tarball ~/Downloads/small.tar.gz
    scripts/compliance-fc-report.sh --suite block_tree_test
    GITHUB_TOKEN=... scripts/compliance-fc-report.sh

Current pass rate against this branch: 1024/2944 ≈ 35%. Remaining failures are real fork-choice deltas (proposer_boost_root timing, viable-tree weight timing); see the PR description for the breakdown. CI is intentionally not wired up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
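As a rough illustration of two behaviours named in the commit message (mapping a duplicate block import to success, and treating `valid: false` as an expected rejection), here is a minimal sketch. `SketchBlockError`, `normalise_import`, and `check_step` are hypothetical names invented for this example; they are not the types or functions used in `ef_tests`, and the real runner may structure this differently.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for a block-import error type (not Lighthouse's `BlockError`).
#[derive(Debug, PartialEq)]
enum SketchBlockError {
    DuplicateFullyImported,
    Other(String),
}

/// Treat a re-fed, already-imported block as success, on the assumption
/// that the spec's `on_block` is idempotent. All other errors pass through.
fn normalise_import(result: Result<(), SketchBlockError>) -> Result<(), SketchBlockError> {
    match result {
        Err(SketchBlockError::DuplicateFullyImported) => Ok(()),
        other => other,
    }
}

/// Compare a step's outcome with its optional `valid` flag.
/// A missing flag is assumed to mean `true`, so older corpora are unaffected.
fn check_step<E: Debug>(result: Result<(), E>, valid: Option<bool>) -> Result<(), String> {
    let expect_ok = valid.unwrap_or(true);
    match (result, expect_ok) {
        // Outcome agrees with the expectation.
        (Ok(()), true) | (Err(_), false) => Ok(()),
        // The step was accepted although the corpus marked it invalid.
        (Ok(()), false) => Err("step succeeded but was expected to be rejected".to_string()),
        // The step was rejected although the corpus expected it to pass.
        (Err(e), true) => Err(format!("step failed unexpectedly: {:?}", e)),
    }
}
```

With this shape, a step whose corpus entry carries `valid: false` passes only when processing returns an error, while a step with no `valid` field behaves exactly as before.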
Summary
Wire the consensus-specs Compliance Tests fork-choice scenarios into `testing/ef_tests` so they can be run against this branch with `cargo test`. Test-only; no production behaviour change.

The 12 compliance test functions are gated behind `#[ignore]` so they don't run in normal `cargo test` invocations and don't block CI; they're invoked via `scripts/compliance-fc-report.sh` (which passes `--include-ignored`).

What's added
- `ForkChoiceComplianceHandler` (runner = `fork_choice_compliance`, in `testing/ef_tests/src/handler.rs`). Gated on `cfg!(feature = "fake_crypto")` because the generator emits placeholder BLS signatures (`bls_setting: 2` in `meta.yaml`); SSZ-decoding them with real BLS yields `BLST_BAD_ENCODING` before fork choice runs. Has an `only_fork(...)` filter so we get per-fork test functions.
- 12 per-suite × per-fork test functions in `testing/ef_tests/tests/tests.rs`, each `#[test] #[ignore]`. Gloas is currently a no-op (`is_enabled_for_fork` skips it) until the recent payload-envelope DB changes ("Update database and block replayer to handle payload envelopes" #8886 etc.) settle; Gloas anchor states currently fail to initialise the test harness with `"Head block not found in store"`.
- `Step::PayloadAttestation` variant + `Tester::process_payload_attestation` for the new step kind that ships in Gloas cases.
- `Step::{Attestation,AttesterSlashing,PayloadAttestation}` learn an optional `valid` field; `valid: false` is treated as expected-rejection. Existing test corpora that don't set the field are unaffected (`#[serde(default)]`).
- `Checks.viable_for_head_roots_and_weights`: new check parsed by the runner and validated against a new public `ProtoArrayForkChoice::filtered_block_tree_leaves_and_weights<E>(...)` that mirrors the spec helper: the leaves of `get_filtered_block_tree` (blocks with no descendant that is also in the filtered tree), paired with their stored weights. Results are sorted by `(root, weight)` before comparison so order differences don't matter. `find_head` is called first to flush pending vote deltas.
- `process_block_and_blobs` / `process_block_and_columns` map `BlockError::DuplicateFullyImported` to success, since spec `on_block` is idempotent and the compliance corpus re-feeds blocks repeatedly.
- `Meta` relaxed to allow the compliance generator's extra fields (`seed`, `model_params`, `bls_setting`); `description` is now `Option<String>`.

Drive-by
`cargo fmt` reordered two `use` lines in `beacon_node/http_api/src/beacon/execution_payload_envelope.rs` (pre-existing fmt drift on this branch; unblocks the local pre-commit hook and the `check-code` CI job).

Helper script
`scripts/compliance-fc-report.sh` (mirroring Prysm PR #16724). Resolves the corpus from a tarball / URL / extracted dir / GH artifact, stages the `fork_choice_compliance/` subtree under the ef_tests crate (the corpus path is hardcoded at compile time via `env!("CARGO_MANIFEST_DIR")`), runs the 12 cargo tests with `--include-ignored`, and prints a per-suite pass/fail/skip table.

To bypass the helper, run the ignored tests directly with `cargo test` and `--include-ignored`.
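To make the `viable_for_head_roots_and_weights` check under "What's added" more concrete, the following is a minimal sketch of the leaf-and-weight comparison. It is a toy model, not the `ProtoArrayForkChoice` implementation: `Node`, `leaves_and_weights`, and `matches_expected` are hypothetical names, roots are plain `u64` values instead of 32-byte hashes, and membership in the filtered tree is assumed to be precomputed and ancestor-closed (so "no descendant in the tree" reduces to "no child in the tree").

```rust
use std::collections::HashSet;

/// Hypothetical, simplified block record for illustration only.
struct Node {
    root: u64,
    parent: Option<u64>,
    weight: u64,
    /// Assumed to be precomputed: whether the block is in the filtered tree.
    in_filtered_tree: bool,
}

/// Return `(root, weight)` for every filtered-tree block that has no child
/// in the filtered tree, sorted so that ordering cannot affect comparison.
fn leaves_and_weights(nodes: &[Node]) -> Vec<(u64, u64)> {
    // Roots that are the parent of at least one filtered-tree member.
    let parents_of_members: HashSet<u64> = nodes
        .iter()
        .filter(|n| n.in_filtered_tree)
        .filter_map(|n| n.parent)
        .collect();

    let mut leaves: Vec<(u64, u64)> = nodes
        .iter()
        .filter(|n| n.in_filtered_tree && !parents_of_members.contains(&n.root))
        .map(|n| (n.root, n.weight))
        .collect();
    leaves.sort();
    leaves
}

/// Compare computed leaves with the corpus expectation, ignoring order.
fn matches_expected(nodes: &[Node], expected: &[(u64, u64)]) -> bool {
    let mut expected = expected.to_vec();
    expected.sort();
    leaves_and_weights(nodes) == expected
}
```

Sorting both sides by `(root, weight)` means a case with matching roots but different weights still fails, which is the shape of the `block_cover_test` mismatches reported in this PR.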
Notes
- Tests are `#[ignore]` so they don't pollute normal CI runs; opt-in via the helper script (or `--include-ignored`).
- Targets `glamsterdam-devnet-0` since the corpus is fulu/gloas-only and this branch already has those forks plumbed in.

Current results on this branch (fulu only)
Top failure modes are real consensus-spec deltas, not infrastructure:
- `proposer_boost_root` mismatches: at end-of-slot tick boundaries the spec expects the boost root cleared (`0x0…`) but lighthouse retains the previous slot's value. `on_tick_per_slot` does reset on `current_slot > previous_slot` (`fork_choice.rs:1437`), so the divergence looks like a (re-)apply ordering issue between block import and tick advancement.
- `viable_for_head_roots_and_weights` weight mismatches (`block_cover_test` fulu, 12 cases). Roots match; weights diverge: lighthouse reports `0` where the spec expects ~102_400_000_000. Likely attestation-queue timing: votes for slot N aren't applied to weights until the chain has crossed slot N.

Test plan
- `bash -n scripts/compliance-fc-report.sh` syntax OK
- `scripts/compliance-fc-report.sh --help` prints full usage
- `cargo fmt --check` clean
- `cargo check -p ef_tests --tests --features "ef_tests,fake_crypto"` clean (with `RUSTFLAGS="-D warnings"`)
- `cargo check -p proto_array` clean
- `cargo clippy -p ef_tests --tests --features "ef_tests,fake_crypto"`: no new warnings on touched files
- `cargo test -p ef_tests --test tests fork_choice_compliance_` (no `--include-ignored`) → all 12 marked as `ignored`, no failures
- `scripts/compliance-fc-report.sh --dir <extracted>` runs all 12 fns and prints the report shown above
- `--suite block_cover_test` runs only that suite

🤖 Generated with Claude Code
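For readers unfamiliar with the `proposer_boost_root` failure mode listed under the current results, here is a toy model of the reset rule the spec applies on a slot tick. `ToyStore`, `on_tick`, and `on_timely_block` are hypothetical names, roots are `u64` with `0` standing in for the zero root, and timeliness checks are omitted. It only shows why the order of tick and block-import events matters; it does not reproduce or diagnose the Lighthouse divergence.

```rust
/// Hypothetical, heavily simplified fork-choice store.
#[derive(Debug, Default)]
struct ToyStore {
    current_slot: u64,
    /// `0` stands in for the zero root (no boost).
    proposer_boost_root: u64,
}

impl ToyStore {
    /// Advance to `slot`. The boost is cleared only when the slot number
    /// actually increases, mirroring `current_slot > previous_slot`.
    fn on_tick(&mut self, slot: u64) {
        if slot > self.current_slot {
            self.proposer_boost_root = 0;
            self.current_slot = slot;
        }
    }

    /// Record a block as boosted if it belongs to the current slot.
    /// (The real rule also requires the block to arrive early in the slot.)
    fn on_timely_block(&mut self, block_slot: u64, root: u64) {
        if block_slot == self.current_slot {
            self.proposer_boost_root = root;
        }
    }
}
```

In this model a tick into the next slot always clears the boost, while a second tick within the same slot leaves it in place. A harness that delivers the boundary tick before the boost is (re-)applied would therefore observe a different root than one that delivers it afterwards.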