feat(bls12_381): skip infinity pairs in pairing check#657
Draft
0xVolosnikov wants to merge 9 commits into
Draft
Conversation
…markers Add two cycle-marker labels that fire only on EE-driven invocations: - `keccak_execution_environment` around the EVM SHA3 opcode dispatch (including the `len == 0` empty-slice shortcut), so per-execution cycles can be joined 1:1 with the opcode-level SHA3 sample stream. The inner `"keccak"` marker (from `Keccak256Impl::execute`) still fires, so bootloader/intrinsic keccak invocations remain attributable to the existing `"keccak"` label only. - `ecrecover_execution_environment` around the EE-precompile ecrecover dispatch via a new `EcRecoverEEInvocation` wrapper. Intrinsic signature-recovery calls from the bootloader do not go through this path, so the marker fires only for EE-triggered calls — no positional intrinsic-filter heuristic needed in joiner scripts. Both markers are pure observability: the underlying system-function calls are unchanged. Required as STF-side instrumentation for the follow-up benchmarking-pipeline PR (CI workflow, joiner scripts, and docs land separately). Also extract `install_precompile_hook` from `add_precompile_ext` so the `PRECOMPILE_ADDRESSES_LOWS` sanity check stays centralized for any future custom invocation type. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…→ state_commitment_update Wrap the per-block `write_pubdata` call inside the three proving post-tx-op handlers (single-block, multi-block, sequencing) in a new `da_commitment` cycle marker. For keccak DA this is where the bulk of keccak delegations fire (bytes are absorbed into the `Keccak256CommitmentGenerator` state); for blob DA the call just appends to a buffer and the KZG work shows up under the pre-existing `blob_versioned_hash` marker — both cases are now observable as a single labelled phase. Rename `verify_and_apply_batch` → `state_commitment_update` to reflect what the marker actually wraps (`IOTeardown::update_commitment` — the state-tree merkle commit, which is Blake-heavy and distinct from the DA commit and the per-blob KZG commit). Applies to all four post-tx-op variants so the label string is consistent across them. No state-transition behavior change — pure instrumentation rename. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- `post_tx_op_sequencing.rs`: complete the rename
`verify_and_apply_batch` → `state_commitment_update` for the
Ethereum-sequencing handler (the 5th post-tx-op site, missed in the
prior sweep) and clean up the leftover `// // 3.` double-slash
comment artifact, restoring cross-STF parity for the
`state_commitment_update` label.
- `evm_interpreter/src/instructions/system.rs::sha3`: move
`cycle_marker::wrap!("keccak_execution_environment", ...)` to envelop
the whole opcode dispatch — including the `pop_2()? / cast_to_usize?
/ spend_gas_and_native?` prelude — so the marker fires on every
dispatch (incl. stack-underflow / invalid-operand / OOG-on-base-cost
short-circuits), matching `EvmOpcodeStatsTracer`'s per-dispatch
sample count. This was the invariant the inline comment claimed but
the code didn't actually uphold; positional pairing in
`cycles_per_native_report.py` / `join_precompile_samples.py` now
truly matches 1:1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ghting Adds per-execution cycle-sample dumps for both opcode-level and non-opcode (label) markers, plus an effective-cycles formula that weights raw RISC-V cycles by delegation counts so dump consumers see the true prover cost rather than raw cycles alone. - New `OPCODE_CYCLE_SAMPLES_DIR` env var: writes one `<OPCODE>.cycles` file per opcode with raw cycles per execution (append-mode). - New `LABEL_CYCLE_SAMPLES_DIR` env var: writes both `<label>.cycles` (raw) and `<label>.effective.cycles` (raw + delegation weights) per non-opcode marker. Effective is required to attribute prover cost to delegation-heavy phases (precompiles, keccak SF call). - Shared `effective_of` closure: `raw + 16×Blake + 4×BigInt + 4×Keccak` delegation counts, matching the existing `process_block` headline. - Aggregate `.bench` table semantics (total/min/max/median over raw) unchanged; effective is only consumed via the per-execution dump. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-precompile gas/native observability for the forward (sequencer) runtime, mirroring what `EvmOpcodeStatsTracer` does at opcode level. - `PrecompileStatsTracer`: tracks count + min/median/avg/max gas and native for each precompile invocation routed through the EE precompile dispatch path. Skips the bootloader's intrinsic ecrecover call (signature verification) since it doesn't go through the same hook surface. `dump_samples` writes `<precompile>.samples` files (`gas,native` per line) parallel to `EvmOpcodeStatsTracer` output. - `Pair<A, B>` tracer combinator: forwards each tracer hook to both inner tracers so a single run can collect both opcode and precompile stats. Lets `eth_runner` (and bench callers) compose tracers without duplicating run logic. Both tracers are pure observability — no impact on state-transition or proving paths. Consumed by the benchmark pipeline that follows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ompile sweeps
Test-side scaffolding so the new tracers and DA schemes can be exercised:
- `tests/rig/src/chain.rs`: snapshot/revert LABELS around the sequencing
forward run (was around the prover-input run). Under
`BlobsZKsyncOS` DA the `blob_versioned_hash` marker fires only in
proving, so without flipping the snapshot the post-RISC-V count
match trips. Implemented as a Drop guard so an early-return from
`run_forward_no_panic` still reverts and the next run on this thread
starts with clean LABELS.
- `tests/instances/eth_runner/src/single_run.rs`: compose
`EvmOpcodeStatsTracer` + `PrecompileStatsTracer` via `Pair` when both
are requested via env vars; thread `PRECOMPILE_STATS_PATH` and
`PRECOMPILE_SAMPLES_DIR` through the dump path; honor
`BENCH_DA_SCHEME` to pick `BlobsAndPubdataKeccak256` (default) or
`BlobsZKsyncOS` for dual-DA-scheme bench runs.
- `tests/instances/eth_runner/Cargo.toml`: `bench-fast` profile mirror
(excluded from root workspace so the root `[profile.bench-fast]`
doesn't cover it).
- `tests/instances/precompiles/{Cargo.toml,src/lib.rs}`: install
`PrecompileStatsTracer` inside `run_precompile_inner` so the
precompile test crate emits per-call gas/native stats during the
bench pipeline. BLS12-381 / KZG sweeps added under `pectra`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end Python tooling consumed by the bench CI to produce the PR comment from the artifacts emitted by `cycle_marker`, the new forward-system tracers, and the eth_runner / precompile test binaries. - `benchlib.py`: shared helpers (`median_int`, `pct`, `fmt_pct`, delegation IDs/coefficients, sample loaders, label-file listing). Adopted by all comparison + join scripts to keep the math consistent across the pipeline. - `compare_bench.py`: base/head `.bench` diff, alias `verify_and_apply_batch` to `state_commitment_update` and strip `_execution_environment` suffix so the diff joins cleanly across the boundary commit; `--no-title` and `--sort-by-symbol` flags for the PR-comment "Block sub-phases" table. - `compare_opcode_stats.py` / `compare_opcode_cycles.py` / `compare_precompile_stats.py`: per-opcode and per-precompile diff tables. - `join_samples.py` / `join_precompile_samples.py`: per-execution joins of gas, native, and cycles. `join_precompile_samples.py` prefers `.effective.cycles` and falls back to raw; handles synthetic precompile labels (keccak from SHA3 opcode samples); strips the first ecrecover marker per `process_transaction` boundary (intrinsic sig-verify) when consuming `--bench-file`. - `cycles_per_native_report.py`: local-only ad-hoc tool that pools one or more `(samples_dir, cycles_dir)` pairs into a Markdown report of per-execution `cycles / native` ratios per opcode and per precompile (median / p95 / max). - `bench.sh`: convenience wrapper with `baseline`, `quick`, `run`, `compare`, `flamegraph` subcommands; threads `PRECOMPILE_STATS_PATH` and the new sample-dir env vars through. - `docs/benchmarking.md`: rewrite as an agent-targeted reference describing the effective-cycles formula, env-var-gated dump pipeline, comparison script index, and ecrecover intrinsic filter caveat. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI plumbing that runs the bench pipeline on each PR and composes the PR comment from the artifacts produced by the bench scripts. - `.github/workflows/bench.yml`: full bench job. Builds base + head with `for-tests-benchmarking-pectra` proving binary; runs eth_runner under both keccak and blob DA schemes; runs `tests/instances/precompiles` with `--test-threads=1` (avoids `MARKER_PATH` truncate-race when multiple precompile tests write the same path concurrently); composes the PR comment as headline + collapsible sections (Block sub-phases via `compare_bench.py --sort-by-symbol`, Precompiles test-crate, per-opcode / per-precompile gas/native, per-execution cycles/native ratios). Has merge-base fallbacks so it gracefully degrades when the base commit predates new dump_bin.sh types / Cargo profiles / env vars / script flags. - `Cargo.toml` (workspace root): `[profile.bench-fast]` (inherits release, `lto = false`, `codegen-units = 16`) for the in-workspace `-p precompiles` build path; covers everything except `eth_runner` which has its own mirror profile. - `zksync_os/Cargo.toml`: add `pectra` feature gating to `proof_running_system/pectra` so the new precompiles can be enabled for benchmarking-binary builds without forcing them into the default feature set. - `zksync_os/dump_bin.sh`: new `--type for-tests-benchmarking-pectra` builds the proving binary with `pectra` feature for KZG / BLS12-381 precompile coverage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
e(O, Q) = e(P, O) = 1 in the target field, so degenerate pairs do not affect the multi-pairing product. Filtering them out after subgroup validation saves the per-pair Miller-loop precomputation that dominates the cost on the Pectra degenerate inputs. Empty filtered input is short-circuited to true to avoid relying on the upstream multi_pairing contract for the empty case. Unit tests cover: single infinity on either side, all-infinity batches, non-trivial pair masked with infinity (still false), balanced pair with interleaved infinity padding (still true), and a malformed encoding that must keep being rejected (filter happens after subgroup check, not before). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Block-level effective cycles
Block-level sub-phases
Precompiles test-crate bench (synthetic workload, all labels)
Per-opcodePer-precompilePer-precompile per-execution ratios (head) |
6 tasks
f58aaf1 to
8b5affe
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What ❔
Skips degenerate pairs (
G1 == OorG2 == O) in the BLS12-381 pairing-check precompile after subgroup validation. Filtering happens in the parse loop, so malformed encodings still hit the existing subgroup-check rejection path. When every pair turns out to be degenerate, the result is short-circuited totrue(empty product = identity) instead of relying on the upstreammulti_pairingcontract for empty input.Gas / native accounting is unchanged (still per-pair as in EIP-2537). State-transition output is bit-identical to the pre-change path on every input.
Unit tests added in
basic_system/src/system_functions/bls12_381/pairing.rs:(O, Q)/(P, O)/(O, O)→truetruee(G1, G2) ≠ 1, and prefixing / suffixing degenerate pairs must keep itfalsee(G1, G2) · e(-G1, G2) = 1with interleaved degenerate pairs → stilltrue(x≠0, y=0)G1 encoding stays rejected (filter cannot bypass subgroup check)Why ❔
Picked up from review feedback on
vv/bench-precompiles. The Pectra precompile bench samples (seebench_report.mdon the base branch) show that the pairing precompile is disproportionately expensive when called with degenerate inputs, because every encoded pair still flows through the Miller-loop precomputation. Skipping degenerate pairs at the affine level avoids theG1Affine → G1Prepared/G2Affine → G2Preparedconversion for those pairs, which is the dominant cost.Mathematically safe: in the BLS12-381 optimal-Ate pairing,
e(O, Q) = e(P, O) = 1_F_T, so dropping any pair whose G1 or G2 is the point at infinity does not change the multi-pairing product.Is this a breaking change?
Checklist
Stacked on
This PR is based on
vv/bench-precompiles. Land that first.Follow-ups not in this PR
bench_scripts/compare_precompile_stats.pyagainst the base branch.basic_system/src/system_functions/bn254_pairing_check.rs(also unconditionally feeds every pair tomulti_pairing); kept as a separate PR since this branch is BLS-only.🤖 Generated with Claude Code