fix(modexp): skip leading zero bits in first exponent byte by antoniolocascio-bot · Pull Request #618 · matter-labs/zksync-os

antoniolocascio-bot · 2026-04-13T11:26:17Z

What

Skip leading zero bits in the first nonzero exponent byte before entering per-bit iteration in modpow_montgomery and big_wrapping_pow.

Why

The forward modexp exponentiation paths always iterated all 8 bits of the first nonzero exponent byte (mask = 1 << 7), while gas pricing uses the exponent's adjusted bit length. For exponents like 0x01, pricing charges for 1 bit of work, but execution performed 8 bit-iterations (7 redundant squarings of the identity). This created a bounded metering mismatch.

The fix initializes the first-byte mask at the highest set bit (1 << (7 - leading_zeros)) instead of always at bit 7, so execution work aligns with what's charged.

The mismatch is bounded (at most 7 extra no-op squarings per call) and is an optimization, not a correctness issue.

Is this a breaking change?

Yes
No

Checklist

PR title corresponds to the body of PR (we generate changelog entries from PRs).
Tests for the changes have been added / updated.
Documentation comments have been added / updated.
Code has been formatted.

## What ❔ Reimplement the hints for ecrecover from ethproofs. Instead of having a new full implementation of ecrecover, this PR introduces "hooks" for 3 secp256k1 field operations. These hooks have two implmentations: default, where the implementation is straightforward (same implementation as before this PR) and oracle-based, where we use the new callable oracles. This way, most of the logic for ecrecover is reused. Given that this is a critical part of the system, the PR includes: * PBT for "good" case (showing equivalence when using the right oracle) * Tests for the "bad" case (showing panics if the oracle lies) * Modifies the fuzz target for ecrecover to compare the forward and oracle runs. I've ran the fuzzer for over 1h without finding issues.    ## Why ❔    ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist   - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted.

…coding (matter-labs#572) ## What ❔ Remove the `artifacts_len` field and artifact bytes from the pubdata diff compression encoding: - **Format 0** (publish bytecode): remove `artifacts_len` field (4 bytes), publish only the raw code bytes (`unpadded_code_len`) instead of the full bytecode blob (code + padding + artifacts) - **Format 4** (not_publish_bytecode): remove `artifacts_len` from the length calculation — fixes a pre-existing bug where `diff_compression_length()` counted +4 bytes for `artifacts_len` but `diff_compression()` never actually wrote it - Bump `PUBDATA_ENCODING_VERSION` from 1 to 2 - Update format documentation comments and unit test ## Why ❔ Bytecode artifacts (jump table, etc.) are fully deterministic from the raw bytecode and do not need to be published in pubdata. Removing them reduces pubdata size for contract deployments without losing any information — the receiver can recompute artifacts from the code. ## Is this a breaking change? - [x] Yes - [ ] No This changes the pubdata encoding format (version 1 → 2). The settlement layer / receiver must be updated to parse the new format. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Code <claude-code@anthropic.com>

## What ❔ Per-opcode EVM benchmarking infrastructure that collects gas, native resource, and RISC-V cycle stats for every EVM opcode execution. Enables computing cycles/gas and native/gas ratios to guide proving cost optimizations. **Components:** - `EvmOpcodeStatsTracer` — forward-mode tracer collecting per-opcode gas, native, count with min/max/median - Per-opcode cycle markers in the interpreter loop (`opcode_start!/opcode_end!`) — aggregated in `print_cycle_markers()` with min/max/median - Per-execution sample dump mode — write `(gas, native)` and `cycles` per execution for detailed ratio analysis - `join_samples.py` — joins tracer + cycle samples, computes per-execution cycles/gas with p50/p95/p99/max - `compare_opcode_stats.py` — CI integration: compact diff table when per-opcode stats change - `visualize_opcode_stats.py` — total cycle bar chart + sorted cycles/gas ratio curves - Integration into `eth_runner` single-run and CI bench workflow **Usage:** ```bash # Basic run (prints per-opcode stats table) bash bench_scripts/bench.sh quick # Full per-execution analysis OPCODE_SAMPLES_DIR=samples OPCODE_CYCLE_SAMPLES_DIR=cycle_samples bash bench_scripts/bench.sh quick python3 bench_scripts/join_samples.py samples/ cycle_samples/ --summary --out-dir joined/ python3 bench_scripts/visualize_opcode_stats.py joined/ --out-dir charts/ ``` **Benchmark impact:** +0.34% effective cycles with benchmarking features enabled. Zero overhead in production builds (all behind `#[cfg(feature = "cycle_marker")]`). ## Why ❔ To compute per-opcode `cycles/gas` and `native/gas` ratios for verifying the native resource model against actual RISC-V proving costs. ## Is this a breaking change? - [ ] Yes - [x] No All code is behind feature flags or in benchmarking-only paths. Production proving builds are unaffected — verified by audit of all changed files and benchmark comparison. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Antonio Locascio <antonio.locascio1@gmail.com> Co-authored-by: vv-dev-ai <vv@matterlabs.dev> Co-authored-by: Claude Code <claude-code@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…er-labs#596) ## What Add `compare_opcode_cycles.py` script and wire it into the bench CI workflow so that per-opcode RISC-V cycle diffs are posted on PRs. The bench CI already collected per-opcode cycle stats in `.bench` files on both the base and head sides (via `opcode_start!/opcode_end!` cycle markers), but had no script to compare them. The existing `compare_opcode_stats.py` only compared forward-mode gas/native stats, which don't change when the execution implementation changes without modifying resource accounting (e.g. the custom U256 PR). ## Why Without this, PRs that affect RISC-V execution performance (like the custom U256 migration) show overall effective cycle improvements in the benchmark report but lack visibility into which specific EVM opcodes improved or regressed. This makes it harder to review and reason about performance changes. ## Is this a breaking change? - [ ] Yes - [x] No ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## What ❔ This PR changes the way we compute prover input. Before, we needed to run the slow risc-v simulator to do so. Now, we can do on native architecture. This run is also responsible for computing the pubdata.    ## Why ❔    ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist   - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted.

## What ❔ We had an invalid worst case calculation + small cleanup ## Why ❔    ## Is this a breaking change? - [ ] Yes - [ ] No ## Checklist   - [ ] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [ ] Code has been formatted. --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(ci): parallelize bench workflow

## What ❔ Replace the removed `risc_v_simulator` crate with `riscv_transpiler` from zksync-airbender dev branch (v2). This migrates the entire RISC-V execution and oracle infrastructure to the new VM API. ### Key changes **Dependencies & toolchain:** - `risc_v_simulator` removed from workspace, replaced by `riscv_transpiler` + `common_constants` (git deps pointing to airbender `dev` branch) - Toolchain bumped to `nightly-2026-02-10` in both root and `zksync_os/` (required by airbender v2's `blake2s_u32`) - `zksync_os/reproduce/Dockerfile` updated to match **oracle_provider:** - `ZkEENonDeterminismSource`, `OracleQueryProcessor`, and `ReadWitnessSource` are **no longer generic** over memory type - Processors take `&dyn RamPeek` instead of `&M` where `M: MemorySource` - `DummyMemorySource` now implements `RamPeek` - Implements new `NonDeterminismCSRSource` trait from `riscv_transpiler::vm` (with `write_with_memory_access`, `write_with_memory_access_dyn`) **callable_oracles:** - All query processors (`ArithmeticQuery`, `BlobCommitmentAndProofQuery`, `FieldOpsQuery`, `HashToPrimeSource`, and their `Native*` variants) simplified to non-generic structs - Memory reads use `peek_word(addr)` instead of `memory.get(addr, AccessType, &mut trap)` - Fixed pre-existing overflow check in `read_memory_as_u64` (now uses byte length for bounds check) **zksync_os_runner:** - Rewritten to use `VM::run_basic_unrolled` with `SimpleTape` + `RamWithRomRegion` - Requires `.bin` + `.text` file pair (already produced by `dump_bin.sh` via `objcopy --only-section=.text`) - `DiagnosticsConfig` parameter removed; flamegraph support via feature-gated `run_basic_unrolled_with_flamegraph` - `simulate_witness_tracing` rewritten using `SimpleSnapshotter` **cycle_marker:** - Fully wired to new `CycleMarkerHooks` API from [zksync-airbender#237](matter-labs/zksync-airbender#237) - `print_cycle_markers` now takes a `CycleMarker` argument (scoped via `CycleMarkerHooks::with()`) - Runner uses `VM::<DelegationsCounters, CycleMarkerHooks>` when feature is enabled - Re-exports `CycleMarkerHooks`, `CycleMarker`, `Mark` from `riscv_transpiler::cycle` - Inline asm now uses hardcoded `x0` to match transpiler's strict `csrrw x0, 0x7ff, x0` decoder **forward_system:** - All 12 query processors and system type aliases updated to non-generic oracle types **tests/rig:** - `flamegraph_output` wired through to `run_default_with_flamegraph_path` (replaces old `ProfilerConfig`) - Oracle factory trait methods return non-generic `ZkEENonDeterminismSource` - `run_prover` rewritten to use `execution_utils::unrolled::prove_unrolled_for_machine_configuration_into_program_proof` **Other:** - Removed stabilized `ptr_as_ref_unchecked` feature gate from `evm_interpreter` - Updated `Cargo.lock` for latest `zksync_os_interface` (adds `pubdata_used` to `BlockOutput`) - Rebuilt blake2s test binaries and committed `.text` files - Fixed pre-existing clippy lints (`sort_by_key`, `default_constructed_unit_structs`, `manual_checked_ops`, conditional `P256VerifyErrors` import) ## Why ❔ The `risc_v_simulator` crate was removed from zksync-airbender in [zksync-airbender#233](matter-labs/zksync-airbender#233) and [zksync-airbender#235](matter-labs/zksync-airbender#235). The new `riscv_transpiler` provides a more efficient VM with batch execution, optional JIT compilation (x86_64), and integrated witness snapshotting. This migration is required to stay on the airbender dev branch. ## Is this a breaking change? - [x] Yes - [ ] No **Breaking API changes:** - `zksync_os_runner::run()` and `run_and_get_effective_cycles()` no longer take a `DiagnosticsConfig` parameter - `run_and_get_effective_cycles_from_bytes()` now requires both `img_bytes` and `text_bytes` - `ZkEENonDeterminismSource`, `ReadWitnessSource`, `OracleQueryProcessor` are no longer generic over memory type - `cycle_marker::print_cycle_markers()` now takes a `CycleMarker` argument instead of reading from a global - `MemorySource` trait replaced by `RamPeek` (re-exported from `riscv_transpiler::vm`) ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: antoniolocascio <antonio.locascio1@gmail.com>

## What ❔ Move RISC-V binary artifacts from flat files in `zksync_os/` (e.g. `for_tests.bin`) to a structured `dist/<app>/app.{bin,elf,text}` layout (e.g. `dist/for_tests/app.bin`). ## Why ❔ This prepares for the airbender-platform migration where `cargo airbender build` outputs to this layout natively. Splitting this out as a separate PR keeps the platform migration PR focused on the actual API changes. ## Changes - `dump_bin.sh`: output to `dist/<app>/` instead of flat files - `chain.rs`: add `get_zksync_os_dist_dir()`, update path resolution - `binary_checker`: update paths to `dist/<app>/app.text` - `reproduce.sh`: copy from `dist/<app>/app.bin` - `.gitignore`: add `/dist` ## Is this a breaking change? - [ ] Yes - [x] No The build script output changes but all consumers are updated in this PR. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. 🤖 Generated with [Claude Code](https://claude.com/claude-code) --------- Co-authored-by: vibelyova <265594496+vibelyova@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## What ❔ - Removed unused storage-type markers/bitflags. - Split `storage_types` into focused modules. - Moved `MAX_EVENT_TOPICS` from storage types into system constants and updated imports. ## Why ❔ - Reduces dead code and improves module structure. - Places constants with the correct domain ownership for easier maintenance. ## Is this a breaking change? - [ ] Yes - [x] No ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [ ] Documentation comments have been added / updated. - [x] Code has been formatted.

## What ❔ Add tests that verify the system correctly validates and rejects invalid/malicious oracle responses. Covers two oracle validation areas: **Block metadata gas limit validation (4 tests):** - `test_block_rejects_excessive_gas_limit` — gas limit just above `MAX_BLOCK_GAS_LIMIT` is rejected - `test_block_accepts_max_gas_limit` — exact boundary value is accepted - `test_block_rejects_u64_max_gas_limit` — extreme overflow value is rejected - `test_empty_block_rejects_excessive_gas_limit` — validation fires before any transactions are processed **DA commitment scheme validation (2 tests):** - `test_da_commitment_scheme_accepts_all_valid_ids` — all valid IDs 0–4 are accepted - `test_da_commitment_scheme_rejects_invalid_ids` — IDs 5, 128, 255 are rejected ## Why ❔ Oracle responses come from an untrusted host and must be validated. These tests confirm that existing validation logic correctly rejects malicious metadata. Gas limit validation prevents resource accounting overflow (ergs = gas × 256). DA commitment scheme validation prevents use of undefined commitment modes. ## Is this a breaking change? - [ ] Yes - [x] No ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [x] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Code <claude-code@anthropic.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The forward modexp exponentiation paths (modpow_montgomery and big_wrapping_pow) always iterated all 8 bits of the first nonzero exponent byte, while pricing uses the exponent's adjusted bit length. This caused a bounded metering mismatch where sparse exponents (e.g. 0x01) were charged for fewer bits than actually iterated. Initialize the first-byte mask at the highest set bit instead of always starting at bit 7, aligning execution with pricing. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

antoniolocascio and others added 13 commits March 25, 2026 14:25

feat(ci): parallelize bench workflow

c665e0c

Merge pull request matter-labs#608 from matter-labs/lyova/parallel-bench

e3146d4

feat(ci): parallelize bench workflow

AntonD3 requested a review from shamatar April 21, 2026 14:00

antoniolocascio force-pushed the draft-0.4.0 branch from a788be8 to eef48f3 Compare April 23, 2026 08:57

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(modexp): skip leading zero bits in first exponent byte#618

fix(modexp): skip leading zero bits in first exponent byte#618
antoniolocascio-bot wants to merge 13 commits into
matter-labs:draft-0.4.0from
antoniolocascio-bot:fix/modexp-leading-zero-bits

antoniolocascio-bot commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Conversation

antoniolocascio-bot commented Apr 13, 2026

What

Why

Is this a breaking change?

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants