refactor: defer cache materialization for access-list touches#601
refactor: defer cache materialization for access-list touches#601antoniolocascio wants to merge 11 commits into
Conversation
## What ❔ Reimplement the hints for ecrecover from ethproofs. Instead of having a new full implementation of ecrecover, this PR introduces "hooks" for 3 secp256k1 field operations. These hooks have two implmentations: default, where the implementation is straightforward (same implementation as before this PR) and oracle-based, where we use the new callable oracles. This way, most of the logic for ecrecover is reused. Given that this is a critical part of the system, the PR includes: * PBT for "good" case (showing equivalence when using the right oracle) * Tests for the "bad" case (showing panics if the oracle lies) * Modifies the fuzz target for ecrecover to compare the forward and oracle runs. I've ran the fuzzer for over 1h without finding issues. <!-- What are the changes this PR brings about? --> <!-- Example: This PR adds a PR template to the repo. --> <!-- (For bigger PRs adding more context is appreciated) --> ## Why ❔ <!-- Why are these changes done? What goal do they contribute to? What are the principles behind them? --> <!-- The `Why` has to be clear to non-Matter Labs entities running their own ZK Chain --> <!-- Example: PR templates ensure PR reviewers, observers, and future iterators are in context about the evolution of repos. --> ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted.
…coding (#572) ## What ❔ Remove the `artifacts_len` field and artifact bytes from the pubdata diff compression encoding: - **Format 0** (publish bytecode): remove `artifacts_len` field (4 bytes), publish only the raw code bytes (`unpadded_code_len`) instead of the full bytecode blob (code + padding + artifacts) - **Format 4** (not_publish_bytecode): remove `artifacts_len` from the length calculation — fixes a pre-existing bug where `diff_compression_length()` counted +4 bytes for `artifacts_len` but `diff_compression()` never actually wrote it - Bump `PUBDATA_ENCODING_VERSION` from 1 to 2 - Update format documentation comments and unit test ## Why ❔ Bytecode artifacts (jump table, etc.) are fully deterministic from the raw bytecode and do not need to be published in pubdata. Removing them reduces pubdata size for contract deployments without losing any information — the receiver can recompute artifacts from the code. ## Is this a breaking change? - [x] Yes - [ ] No This changes the pubdata encoding format (version 1 → 2). The settlement layer / receiver must be updated to parse the new format. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Code <claude-code@anthropic.com>
## What ❔ Per-opcode EVM benchmarking infrastructure that collects gas, native resource, and RISC-V cycle stats for every EVM opcode execution. Enables computing cycles/gas and native/gas ratios to guide proving cost optimizations. **Components:** - `EvmOpcodeStatsTracer` — forward-mode tracer collecting per-opcode gas, native, count with min/max/median - Per-opcode cycle markers in the interpreter loop (`opcode_start!/opcode_end!`) — aggregated in `print_cycle_markers()` with min/max/median - Per-execution sample dump mode — write `(gas, native)` and `cycles` per execution for detailed ratio analysis - `join_samples.py` — joins tracer + cycle samples, computes per-execution cycles/gas with p50/p95/p99/max - `compare_opcode_stats.py` — CI integration: compact diff table when per-opcode stats change - `visualize_opcode_stats.py` — total cycle bar chart + sorted cycles/gas ratio curves - Integration into `eth_runner` single-run and CI bench workflow **Usage:** ```bash # Basic run (prints per-opcode stats table) bash bench_scripts/bench.sh quick # Full per-execution analysis OPCODE_SAMPLES_DIR=samples OPCODE_CYCLE_SAMPLES_DIR=cycle_samples bash bench_scripts/bench.sh quick python3 bench_scripts/join_samples.py samples/ cycle_samples/ --summary --out-dir joined/ python3 bench_scripts/visualize_opcode_stats.py joined/ --out-dir charts/ ``` **Benchmark impact:** +0.34% effective cycles with benchmarking features enabled. Zero overhead in production builds (all behind `#[cfg(feature = "cycle_marker")]`). ## Why ❔ To compute per-opcode `cycles/gas` and `native/gas` ratios for verifying the native resource model against actual RISC-V proving costs. ## Is this a breaking change? - [ ] Yes - [x] No All code is behind feature flags or in benchmarking-only paths. Production proving builds are unaffected — verified by audit of all changed files and benchmark comparison. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Antonio Locascio <antonio.locascio1@gmail.com> Co-authored-by: vv-dev-ai <vv@matterlabs.dev> Co-authored-by: Claude Code <claude-code@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## What Add `compare_opcode_cycles.py` script and wire it into the bench CI workflow so that per-opcode RISC-V cycle diffs are posted on PRs. The bench CI already collected per-opcode cycle stats in `.bench` files on both the base and head sides (via `opcode_start!/opcode_end!` cycle markers), but had no script to compare them. The existing `compare_opcode_stats.py` only compared forward-mode gas/native stats, which don't change when the execution implementation changes without modifying resource accounting (e.g. the custom U256 PR). ## Why Without this, PRs that affect RISC-V execution performance (like the custom U256 migration) show overall effective cycle improvements in the benchmark report but lack visibility into which specific EVM opcodes improved or regressed. This makes it harder to review and reason about performance changes. ## Is this a breaking change? - [ ] Yes - [x] No ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
3428e30 to
e0db1f1
Compare
e0db1f1 to
573688a
Compare
## What ❔ This PR changes the way we compute prover input. Before, we needed to run the slow risc-v simulator to do so. Now, we can do on native architecture. This run is also responsible for computing the pubdata. <!-- What are the changes this PR brings about? --> <!-- Example: This PR adds a PR template to the repo. --> <!-- (For bigger PRs adding more context is appreciated) --> ## Why ❔ <!-- Why are these changes done? What goal do they contribute to? What are the principles behind them? --> <!-- The `Why` has to be clear to non-Matter Labs entities running their own ZK Chain --> <!-- Example: PR templates ensure PR reviewers, observers, and future iterators are in context about the evolution of repos. --> ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted.
Code Review SummaryReviewed by: Claude Opus 4.6 (manual deep review of entire 4000-line diff + 6 automated review passes in progress) Critical
Major
Minor1. The All current call sites maintain the invariant correctly (verified: Suggestion: Consider promoting to a runtime 2. The storage diff iterators now use: initial_value: initial_record.value().copied().unwrap_or_default(),
current_value: current_record.value().copied().unwrap_or_default(),If an unobserved entry somehow reaches the diff iterator, this would silently produce Suggestion: Consider using 3. Benchmark regression — Informational The benchmark bot shows: SLOAD +2.8% median cycles, real blocks +0.18–0.38%, Verified as CorrectThe following areas were specifically reviewed and found to be correct:
Automated review by Claude Code |
Consolidated Review — All 6 Review Rounds CompleteReviewed by: Claude Opus 4.6 (3 rounds) + Codex (3 rounds), manually validated
Critical
Major1. 🔺 The Suggestion: Promote to runtime 2. 🔺 Diff iterators use Suggestion: Use Minor3. 🔺 Touch/materialization pattern duplicated across 3 locations The same "charge → get_or_insert placeholder → check warmth → branch on is_value_observed → update vs mutate_current_in_place" logic exists in 4. The 5. 🔺 Rollback test misses warmth-survives-revert assertion —
6. Benchmark regression — Multiple files SLOAD +2.8%, EXTCODESIZE +11.4% (small absolute), real blocks +0.18-0.38%. Caused by 7. Pre-existing issue (just moved into 8. Multiple Nits9. After 10. TestOracle duplication — 6 copies across test modules Near-identical oracle boilerplate in 6 test files. Consider extracting to a shared test utility. False Positives Eliminated
Verified Correct (Consensus Across All Reviewers)All 6 reviewers independently verified:
Automated review by Claude Code (6 parallel review passes) |
|
Fixed the major ones, the minors aren't valid/worth fixing |
## What ❔ We had an invalid worst case calculation + small cleanup ## Why ❔ <!-- Why are these changes done? What goal do they contribute to? What are the principles behind them? --> <!-- The `Why` has to be clear to non-Matter Labs entities running their own ZK Chain --> <!-- Example: PR templates ensure PR reviewers, observers, and future iterators are in context about the evolution of repos. --> ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist <!-- Check your PR fulfills the following items. --> <!-- For draft PRs check the boxes as you complete them. --> - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
a54efb2 to
04642be
Compare
|
I see here that worst case for CALLDATACOPY is "15462.7" - it is strange, compare with #588 (comment) Maybe I missed some fixes for benchmarks, you can ask AI to investiagate |
e40e03d to
b756e16
Compare
Benchmark report
Per-opcode cycle diff
|
a788be8 to
eef48f3
Compare
What ❔
This PR defers cache materialization for EIP-2930 access-list touches in both storage and account caches.
It changes the touch paths to:
It also adds focused tests for:
CacheRecord/HistoryMapinvariants used by the lazy materialization flowWhy ❔
The goal is to make access-list warming actually lazy.
Before this change, access-list touches could force unnecessary materialization and oracle IO even when execution never observed the touched slot/account value. That was inconsistent with geth execution witness, blocking ethproofs integration.
This change keeps the warmness/accounting semantics, avoids unnecessary reads, preserves the persistence invariant that observed state is materialized before it is consumed, and keeps the flat-model native charging path independent of cache status.
Is this a breaking change?
Checklist