feat(evm_interpreter): skip JUMPDEST dispatch when reached via JUMP/JUMPI #649

Closed
0xVolosnikov wants to merge 1 commit into vv-push-small-specializations from vv-jump-skip-jumpdest

@0xVolosnikov 0xVolosnikov commented May 13, 2026

What ❔

JUMP and JUMPI already validate that the target byte is a JUMPDEST opcode. The standalone JUMPDEST handler that the dispatcher runs on the next iteration is then redundant — its only effect is charging 1 gas + JUMPDEST_NATIVE_COST and returning Ok(()).

This PR inlines that work into JUMP / JUMPI only when building for the RISC-V proving target (cfg!(target_arch = "riscv32")).

  • On host builds (forward / RPC / opcode-stats path), the dispatcher continues to run JUMPDEST normally, so EvmOpcodeStatsTracer, EvmOpcodesLogger, and any other forward-mode tracer that keys on per-opcode events keeps observing real JUMPDEST events with accurate gas/native deltas.
  • On RISC-V builds (proving), JUMP/JUMPI fold the JUMPDEST gas + native charge into their own body and advance IP past the JUMPDEST byte. There is no live tracer on RISC-V that keys on JUMPDEST, so dropping the iteration is safe.

Mechanism

const INLINE_JUMPDEST: bool = cfg!(target_arch = "riscv32");

A compile-time const gates the optimization. The non-inlined branches are dead code on RISC-V (and vice versa), so the hot path stays lean on both targets.
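A minimal sketch of how such a gate folds away at compile time. Only `INLINE_JUMPDEST` comes from this PR; `landing_ip` is a hypothetical helper used purely for illustration.

```rust
// Sketch only: `landing_ip` is a hypothetical helper, not part of evm_interpreter.
const INLINE_JUMPDEST: bool = cfg!(target_arch = "riscv32");

/// Instruction pointer at which execution resumes after a valid jump to `dest`.
pub fn landing_ip(dest: usize) -> usize {
    // The condition is a compile-time constant, so the compiler can drop the
    // branch that does not apply to the current target.
    if INLINE_JUMPDEST {
        dest + 1 // proving build: the JUMPDEST byte is skipped
    } else {
        dest // host build: the dispatcher still runs JUMPDEST
    }
}
```

Because `cfg!` expands to a plain `true` or `false`, both branches are still type-checked on every target, which is presumably why a const is used here rather than `#[cfg]` attributes.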

  • JUMP: when INLINE_JUMPDEST, fuses MID + JUMPDEST gas and JUMP_NATIVE + JUMPDEST_NATIVE native into a single spend_gas_and_native call (InvalidJump consumes all remaining gas anyway, so pre-charging is observationally identical). On host, charges only MID / JUMP_NATIVE as before and lands on dest.
  • JUMPI: charges HIGH / JUMPI_NATIVE as before. When INLINE_JUMPDEST is set and the jump is taken to a valid destination, a second spend_gas_and_native(JUMPDEST, JUMPDEST_NATIVE) charge is added. It is kept separate so that the not-taken path does not pay JUMPDEST gas; the OOG boundary for marginal-gas frames must not shift. IP then lands on dest + 1.
  • JUMPDEST handler is preserved and runs as before on host. On RISC-V it only runs for fall-through cases (e.g. JUMPI condition false landing on a JUMPDEST byte).
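The charging rules above can be sketched as a toy gas model. Everything here (`Frame`, `Outcome`, `run_jump`, `run_jumpi`, the single valid destination at offset 5) is a hypothetical stand-in rather than the evm_interpreter API, and native-resource accounting is left out. The gas values assume the standard EVM costs: 8 for JUMP (MID), 10 for JUMPI (HIGH), 1 for JUMPDEST.

```rust
// Toy model, not the real interpreter. Gas only; native costs are omitted.
const MID: u64 = 8; // JUMP
const HIGH: u64 = 10; // JUMPI
const JUMPDEST: u64 = 1;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Continue { ip: usize },
    OutOfGas,
    InvalidJump,
}

pub struct Frame {
    pub gas: u64,
    pub valid_dests: Vec<usize>,
}

impl Frame {
    /// Charge `amount`; a failed charge consumes all remaining gas.
    fn spend(&mut self, amount: u64) -> bool {
        if self.gas < amount {
            self.gas = 0;
            return false;
        }
        self.gas -= amount;
        true
    }

    pub fn jump(&mut self, dest: usize, inline: bool) -> Outcome {
        // Inlined build: JUMP and JUMPDEST gas are fused into one charge.
        let cost = if inline { MID + JUMPDEST } else { MID };
        if !self.spend(cost) {
            return Outcome::OutOfGas;
        }
        if !self.valid_dests.contains(&dest) {
            self.gas = 0; // InvalidJump consumes all remaining gas
            return Outcome::InvalidJump;
        }
        Outcome::Continue { ip: if inline { dest + 1 } else { dest } }
    }

    pub fn jumpi(&mut self, ip: usize, dest: usize, cond: bool, inline: bool) -> Outcome {
        if !self.spend(HIGH) {
            return Outcome::OutOfGas;
        }
        if !cond {
            return Outcome::Continue { ip: ip + 1 }; // not taken: no JUMPDEST gas
        }
        if !self.valid_dests.contains(&dest) {
            self.gas = 0;
            return Outcome::InvalidJump;
        }
        if !inline {
            return Outcome::Continue { ip: dest };
        }
        // Taken and valid: separate JUMPDEST charge, then skip the byte.
        if !self.spend(JUMPDEST) {
            return Outcome::OutOfGas;
        }
        Outcome::Continue { ip: dest + 1 }
    }

    /// Standalone JUMPDEST handler (host builds and fall-through cases).
    pub fn jumpdest(&mut self, ip: usize) -> Outcome {
        if !self.spend(JUMPDEST) {
            return Outcome::OutOfGas;
        }
        Outcome::Continue { ip: ip + 1 }
    }
}

/// Run JUMP and, on the non-inlined path, the JUMPDEST dispatch that follows.
pub fn run_jump(gas: u64, dest: usize, inline: bool) -> (Outcome, u64) {
    let mut f = Frame { gas, valid_dests: vec![5] };
    let mut out = f.jump(dest, inline);
    if !inline {
        if let Outcome::Continue { ip } = out {
            out = f.jumpdest(ip);
        }
    }
    (out, f.gas)
}

/// Same for a JUMPI at offset 0 targeting offset 5.
pub fn run_jumpi(gas: u64, cond: bool, inline: bool) -> (Outcome, u64) {
    let mut f = Frame { gas, valid_dests: vec![5] };
    let mut out = f.jumpi(0, 5, cond, inline);
    if !inline && cond {
        if let Outcome::Continue { ip } = out {
            out = f.jumpdest(ip);
        }
    }
    (out, f.gas)
}
```

In this model both paths end at the same IP with the same remaining gas, a JUMP with exactly 8 gas runs out of gas either way (only the point of failure moves), and a not-taken JUMPI with exactly 10 gas still succeeds.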

Cycle-marker balance

cycle_marker keeps a host-side LABELS Vec and pairs it with marker CSR writes from the RISC-V binary; the two counts must match. Because host still dispatches JUMPDEST while RISC-V skips it, JUMP / JUMPI emit a synthetic cycle_marker::opcode_start!() / opcode_end!("JUMPDEST") pair on RISC-V to keep counts balanced.

Per-opcode cycle attribution between JUMP/JUMPI and JUMPDEST in the bench output gets scrambled because the synthetic pair nests inside the dispatcher's JUMP/JUMPI bracket — but the block-level process_block effective-cycle total is unaffected. That's the metric we measure for proving cost.
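To illustrate the balance argument, here is a toy model of the two marker streams. It assumes one host label and one RISC-V marker per `opcode_end!`; `Op`, `host_labels`, and `riscv_markers` are hypothetical names, and the real `cycle_marker` bookkeeping may differ.

```rust
// Toy model of marker bookkeeping; not the cycle_marker implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Jump,
    JumpDest,
    Other,
}

fn name(op: Op) -> &'static str {
    match op {
        Op::Jump => "JUMP",
        Op::JumpDest => "JUMPDEST",
        Op::Other => "OTHER",
    }
}

/// Host build: every executed opcode is dispatched and labelled.
pub fn host_labels(trace: &[Op]) -> Vec<&'static str> {
    trace.iter().map(|&op| name(op)).collect()
}

/// RISC-V build: a JUMPDEST reached via JUMP is not dispatched. With
/// `synthetic` set, JUMP emits a nested JUMPDEST marker pair, which closes
/// before JUMP's own bracket does.
pub fn riscv_markers(trace: &[Op], synthetic: bool) -> Vec<&'static str> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < trace.len() {
        let op = trace[i];
        let fused = op == Op::Jump && trace.get(i + 1) == Some(&Op::JumpDest);
        if fused {
            if synthetic {
                out.push("JUMPDEST");
            }
            out.push("JUMP");
            i += 2; // the JUMPDEST byte is skipped
        } else {
            out.push(name(op));
            i += 1;
        }
    }
    out
}
```

With the synthetic pair the two streams have the same length, but JUMPDEST closes before JUMP on RISC-V, which is the nesting that scrambles per-opcode attribution. Without it, RISC-V would emit one marker fewer per fused jump.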

Why ❔

After the PUSH2..PUSH8 work, JUMPDEST showed as a top-tier total-cycle contributor in benchmark blocks despite costing only ~42 cycles per call, because of its sheer call count. Most of those JUMPDESTs are reached as jump targets where the validity check has already happened — the dispatch iteration is pure overhead (per-opcode STEP_NATIVE charge + dispatch match arm + cycle_marker pair).

Benchmark

bench_scripts/bench.sh compare, baseline = vv-push-small-specializations HEAD:

| Block | Base Eff | Head Eff | Δ |
|---|---|---|---|
| block_19299001 | 208,011,772 | 207,739,099 | −0.13% (−273K cycles) |
| block_22244135 | 134,167,657 | 132,764,589 | −1.05% (−1.40M cycles) |
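For reference, the Δ column can be reproduced from the base and head figures with a plain relative change. `delta_percent` is a hypothetical helper written for this note, not something `bench.sh` is known to expose.

```rust
/// Relative change of `head` against `base`, in percent.
/// Negative values mean the head build used fewer effective cycles.
pub fn delta_percent(base: u64, head: u64) -> f64 {
    (head as f64 - base as f64) / base as f64 * 100.0
}
```

For block_19299001 this gives roughly −0.13% (272,673 fewer cycles), and for block_22244135 roughly −1.05% (1,403,068 fewer cycles).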

Per-opcode (aggregated across both blocks, proving-side cycle_marker data):

| Opcode | Count | Median cycles | Δ Median | Notes |
|---|---|---|---|---|
| JUMP | 56,689 | 76 → 84 | +10.5% | absorbs fused JUMPDEST charge |
| JUMPI | 46,978 | 88 → 132 | +29.4% | absorbs separate JUMPDEST charge |
| JUMPDEST | 87,480 | ~42 → 2 | −95% | synthetic-marker only on RISC-V; per-opcode attribution is the intentional artifact |

Delegations (Blake / BigInt / Keccak): unchanged.

Cross-opcode noise

DUP1–DUP10 show ~+1.8% each (~150-200K cycles aggregate), consistent with the I-cache pressure already observed on this dispatch loop. The net per-block effective-cycle delta is still negative on both benchmark blocks.

Is this a breaking change?

  • Yes
  • No

No protocol-visible behavior change. Total gas charged per JUMP+JUMPDEST sequence is 9 gas on both host and RISC-V builds. EVM state transitions are identical between modes. Failure semantics preserved (InvalidJump → consume all remaining gas, identical caller-visible outcome regardless of where the charge fails).

Checklist

  • PR title corresponds to the body of PR.
  • Tests for the changes have been added / updated.
    • cargo test -p evm_interpreter --features testing (13/13). tests/instances/evm rig (14/14). Existing tests exhaustively exercise JUMP/JUMPI/JUMPDEST through real EVM bytecode.
  • Documentation comments have been added / updated.
  • Code has been formatted.

Base branch note

Stacked on vv-push-small-specializations (PR #648). When that lands, this PR's base can be retargeted to custom-u256.

🤖 Generated with Claude Code

JUMP and JUMPI already validate that the target byte is a JUMPDEST
opcode via `bytecode_preprocessing.is_valid_jumpdest(dest)`. The
JUMPDEST handler that the dispatcher runs on the next iteration is
then redundant — its only effect is charging 1 gas + JUMPDEST_NATIVE.

Skip that iteration: advance IP past the JUMPDEST byte (`dest + 1`)
and emit a synthetic JUMPDEST event inside the JUMP/JUMPI handler:

- `before_evm_interpreter_execution_step(JUMPDEST, ...)` fires with
  IP at `dest`, matching the state a real dispatch would have shown.
- The JUMPDEST gas + native are charged via a separate
  `spend_gas_and_native` call (kept separate from JUMP/JUMPI's charge
  so the synthetic before/after bracket cleanly contains exactly the
  JUMPDEST cost, and so the JUMPI not-taken path doesn't pay JUMPDEST
  gas).
- `after_evm_interpreter_execution_step(JUMPDEST, ...)` fires with IP
  at `dest + 1`, again matching real-dispatch state.
- The dispatch loop's `cycles` counter is incremented inside the
  inlined JUMPDEST event so the "Instructions executed = N" debug
  log stays consistent with the number of opcodes accounted for.

JUMPDEST handler is preserved and still runs for fall-through cases
(e.g. JUMPI condition false landing on a JUMPDEST byte).

Benchmark (bench_scripts/bench.sh compare against
vv-push-small-specializations baseline):
- block_19299001 process_block: +0.05% effective (within noise)
- block_22244135 process_block: -0.45% effective (-600K cycles)
- JUMP median cycles: 76 -> 100 (absorbs the extra
  spend_gas_and_native + two synthetic hook calls)
- JUMPI median cycles: 88 -> 144 (same + the JUMPI two-pop)

JUMPDEST iterations are skipped for the JUMP/JUMPI-target case
(~96.7% of all JUMPDESTs were jump-targets before this PR, now
those skip dispatch entirely). The synthetic before/after fires for
each, so any tracer keying on JUMPDEST events still sees them — the
count is preserved at 87,480.

Tracer observability:
- The `EvmOpcodeStatsTracer` uses a single `gas_before` field that
  is overwritten by the synthetic `before_JUMPDEST`. After the
  handler returns, the dispatch-level `after_JUMP` / `after_JUMPI`
  pops the overwritten snapshot, which records JUMPDEST's small
  delta against JUMP/JUMPI. This is a known limitation of the
  current single-field tracer with nested events; the JUMPDEST
  bracket itself records the correct delta. Tracers that only key
  on events (logger, call_tracer) are unaffected. Tracers that need
  precise per-opcode JUMP/JUMPI gas/native deltas should snapshot
  on a stack in a follow-up.
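
The limitation can be illustrated with a toy replay of the events for one
fused JUMP. This is a sketch under assumptions: the event list, the starting
gas of 100, and both helper functions are hypothetical and do not reflect the
actual `EvmOpcodeStatsTracer` code.

```rust
/// (is_before, opcode, gas remaining when the hook fires) for one fused JUMP:
/// JUMP charges 8 gas, then the nested JUMPDEST bracket charges 1.
const EVENTS: [(bool, &'static str, u64); 4] = [
    (true, "JUMP", 100),
    (true, "JUMPDEST", 92),
    (false, "JUMPDEST", 91),
    (false, "JUMP", 91),
];

/// One `gas_before` field: the nested `before` overwrites the outer snapshot.
pub fn single_field_deltas() -> Vec<(&'static str, u64)> {
    let mut gas_before = 0u64;
    let mut deltas = Vec::new();
    for &(is_before, op, gas) in EVENTS.iter() {
        if is_before {
            gas_before = gas;
        } else {
            deltas.push((op, gas_before - gas));
        }
    }
    deltas
}

/// Snapshot stack: each `after` pops the snapshot of its own `before`.
pub fn stacked_deltas() -> Vec<(&'static str, u64)> {
    let mut snapshots: Vec<u64> = Vec::new();
    let mut deltas = Vec::new();
    for &(is_before, op, gas) in EVENTS.iter() {
        if is_before {
            snapshots.push(gas);
        } else {
            let before = snapshots.pop().expect("unbalanced before/after events");
            deltas.push((op, before - gas));
        }
    }
    deltas
}
```

In this replay the single-field variant attributes 1 gas to JUMP, while the
stacked variant attributes 9, which is JUMP's 8 plus the nested JUMPDEST's 1.
A tracer that wants JUMP's own cost would additionally subtract the nested
delta.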

Some cross-opcode regressions on DUP1-DUP10 (~+1.8% each) consistent
with I-cache pressure from the larger JUMP/JUMPI handlers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>