feat(evm_interpreter): skip JUMPDEST dispatch when reached via JUMP/JUMPI #649
Closed
0xVolosnikov wants to merge 1 commit into
JUMP and JUMPI already validate that the target byte is a JUMPDEST opcode via `bytecode_preprocessing.is_valid_jumpdest(dest)`. The JUMPDEST handler that the dispatcher runs on the next iteration is then redundant: its only effect is charging 1 gas + JUMPDEST_NATIVE.

Skip that iteration: advance IP past the JUMPDEST byte (`dest + 1`) and emit a synthetic JUMPDEST event inside the JUMP/JUMPI handler:

- `before_evm_interpreter_execution_step(JUMPDEST, ...)` fires with IP at `dest`, matching the state a real dispatch would have shown.
- The JUMPDEST gas + native are charged via a separate `spend_gas_and_native` call (kept separate from JUMP/JUMPI's charge so the synthetic before/after bracket cleanly contains exactly the JUMPDEST cost, and so the JUMPI not-taken path doesn't pay JUMPDEST gas).
- `after_evm_interpreter_execution_step(JUMPDEST, ...)` fires with IP at `dest + 1`, again matching real-dispatch state.
- The dispatch loop's `cycles` counter is incremented inside the inlined JUMPDEST event so the "Instructions executed = N" debug log stays consistent with the number of opcodes accounted for.

JUMPDEST handler is preserved and still runs for fall-through cases (e.g. JUMPI condition false landing on a JUMPDEST byte).

Benchmark (bench_scripts/bench.sh compare against vv-push-small-specializations baseline):

- block_19299001 process_block: +0.05% effective (within noise)
- block_22244135 process_block: -0.45% effective (-600K cycles)
- JUMP median cycles: 76 -> 100 (absorbs the extra spend_gas_and_native + two synthetic hook calls)
- JUMPI median cycles: 88 -> 144 (same + the JUMPI two-pop)

JUMPDEST iterations are skipped for the JUMP/JUMPI-target case (~96.7% of all JUMPDESTs were jump-targets before this PR, now those skip dispatch entirely). The synthetic before/after fires for each, so any tracer keying on JUMPDEST events still sees them; the count is preserved at 87,480.
Tracer observability:

- The `EvmOpcodeStatsTracer` uses a single `gas_before` field that is overwritten by the synthetic `before_JUMPDEST`. After the handler returns, the dispatch-level `after_JUMP` / `after_JUMPI` pops the overwritten snapshot, which records JUMPDEST's small delta against JUMP/JUMPI. This is a known limitation of the current single-field tracer with nested events; the JUMPDEST bracket itself records the correct delta. Tracers that only key on events (logger, call_tracer) are unaffected. Tracers that need precise per-opcode JUMP/JUMPI gas/native deltas should snapshot on a stack in a follow-up.

Some cross-opcode regressions on DUP1-DUP10 (~+1.8% each) are consistent with I-cache pressure from the larger JUMP/JUMPI handlers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
5f66efd to
6883dcc
Compare
6 tasks
What ❔
JUMP and JUMPI already validate that the target byte is a `JUMPDEST` opcode. The standalone JUMPDEST handler that the dispatcher runs on the next iteration is then redundant: its only effect is charging 1 gas + `JUMPDEST_NATIVE_COST` and returning `Ok(())`.

This PR inlines that work into JUMP / JUMPI only when building for the RISC-V proving target (`cfg!(target_arch = "riscv32")`). On host builds the dispatcher still runs JUMPDEST, so `EvmOpcodeStatsTracer`, `EvmOpcodesLogger`, and any other forward-mode tracer that keys on per-opcode events keep observing real JUMPDEST events with accurate gas/native deltas.

Mechanism
A compile-time const gates the optimization. The non-inlined branches are dead code on RISC-V (and vice versa), so the hot path stays lean on both targets.
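A minimal sketch of what such a compile-time gate could look like, for illustration only. The const name `INLINE_JUMPDEST` comes from this description; the helper `landing_ip` and its signature are hypothetical and are not taken from the PR's code. The flag is passed as a parameter here only so both branches can be exercised on one machine.

```rust
/// True only when compiling for the RISC-V proving target. `cfg!` expands to a
/// boolean literal, so a branch on this const can be folded away by the compiler.
const INLINE_JUMPDEST: bool = cfg!(target_arch = "riscv32");

/// Hypothetical helper (not from the PR): where IP lands after a valid, taken jump.
/// With inlining, the JUMPDEST byte at `dest` is skipped; without it, the
/// dispatcher still has to run the JUMPDEST at `dest` on the next iteration.
fn landing_ip(dest: usize, inline_jumpdest: bool) -> usize {
    if inline_jumpdest {
        dest + 1
    } else {
        dest
    }
}

fn main() {
    let ip = landing_ip(0x10, INLINE_JUMPDEST);
    println!("IP after jump to 0x10: {ip:#x}");
}
```

In a real handler the condition would presumably be the const itself rather than a runtime argument, which is what lets the unused branch become dead code on each target.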
- JUMP: under `INLINE_JUMPDEST`, fuses `MID + JUMPDEST` gas and `JUMP_NATIVE + JUMPDEST_NATIVE` native into a single `spend_gas_and_native` call (`InvalidJump` consumes all remaining gas anyway, so pre-charging is observationally identical). On host, charges only `MID` / `JUMP_NATIVE` as before and lands on `dest`.
- JUMPI: charges `HIGH` / `JUMPI_NATIVE` as before. On `INLINE_JUMPDEST`-and-taken-and-valid, a second `spend_gas_and_native(JUMPDEST, JUMPDEST_NATIVE)` charge is added (kept separate so the not-taken path does not pay JUMPDEST gas; the OOG boundary for marginal-gas frames must not shift). IP lands on `dest + 1`.

Cycle-marker balance
`cycle_marker` keeps a host-side `LABELS` Vec and pairs it with marker CSR writes from the RISC-V binary; the two counts must match. Because host still dispatches JUMPDEST while RISC-V skips it, JUMP / JUMPI emit a synthetic `cycle_marker::opcode_start!()` / `opcode_end!("JUMPDEST")` pair on RISC-V to keep counts balanced.

Per-opcode cycle attribution between JUMP/JUMPI and JUMPDEST in the bench output gets scrambled because the synthetic pair nests inside the dispatcher's JUMP/JUMPI bracket, but the block-level `process_block` effective-cycle total is unaffected. That's the metric we measure for proving cost.

Why ❔
After the PUSH2..PUSH8 work, JUMPDEST showed as a top-tier total-cycle contributor in benchmark blocks despite costing only ~42 cycles per call, because of its sheer call count. Most of those JUMPDESTs are reached as jump targets where the validity check has already happened, so the dispatch iteration is pure overhead (per-opcode `STEP_NATIVE` charge + dispatch match arm + cycle_marker pair).

Benchmark
`bench_scripts/bench.sh compare`, baseline = `vv-push-small-specializations` HEAD:

- `block_19299001`
- `block_22244135`

Per-opcode (aggregated across both blocks, proving-side cycle_marker data):
Delegations (Blake / BigInt / Keccak): unchanged.
Cross-opcode noise
DUP1–DUP10 show ~+1.8% each (~150-200K cycles aggregate) consistent with the I-cache pressure already observed on this dispatch loop. Net per-block effective cycles are still solidly negative.
Is this a breaking change?
No protocol-visible behavior change. Total gas charged per JUMP+JUMPDEST sequence is 9 gas on both host and RISC-V builds. EVM state transitions are identical between modes. Failure semantics preserved (InvalidJump → consume all remaining gas, identical caller-visible outcome regardless of where the charge fails).
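The gas-equivalence claim can be illustrated with a small standalone model. This is a sketch under stated assumptions, not the interpreter's code: it tracks only EVM gas (native cost is ignored), uses the standard EVM gas values for `MID`, `HIGH`, and JUMPDEST, assumes the jump target is valid, and the function names `jump_sequence` and `jumpi_sequence` are hypothetical.

```rust
// Standard EVM gas costs: JUMP = 8 (MID), JUMPI = 10 (HIGH), JUMPDEST = 1.
const MID: u64 = 8;
const HIGH: u64 = 10;
const JUMPDEST_GAS: u64 = 1;

#[derive(Debug, PartialEq, Eq)]
struct OutOfGas;

/// Deduct `cost` from `gas`, failing if the frame cannot afford it.
fn spend(gas: &mut u64, cost: u64) -> Result<(), OutOfGas> {
    *gas = gas.checked_sub(cost).ok_or(OutOfGas)?;
    Ok(())
}

/// JUMP to a valid `dest`, then the JUMPDEST at `dest`.
/// Returns (gas left, IP after the JUMPDEST byte).
fn jump_sequence(mut gas: u64, dest: usize, inline: bool) -> Result<(u64, usize), OutOfGas> {
    if inline {
        // Inlined mode: one fused charge, IP skips the JUMPDEST byte.
        spend(&mut gas, MID + JUMPDEST_GAS)?;
    } else {
        // Host mode: JUMP lands on `dest`, then the dispatcher runs JUMPDEST.
        spend(&mut gas, MID)?;
        spend(&mut gas, JUMPDEST_GAS)?;
    }
    Ok((gas, dest + 1))
}

/// JUMPI at `ip`. Not taken: fall through to `ip + 1` without JUMPDEST gas.
/// Taken: JUMPDEST gas is a separate second charge. In this gas-only model the
/// charge sequence is the same whether the JUMPI handler or the dispatcher makes it.
fn jumpi_sequence(mut gas: u64, ip: usize, dest: usize, taken: bool) -> Result<(u64, usize), OutOfGas> {
    spend(&mut gas, HIGH)?;
    if !taken {
        return Ok((gas, ip + 1));
    }
    spend(&mut gas, JUMPDEST_GAS)?;
    Ok((gas, dest + 1))
}

fn main() {
    println!("inlined: {:?}", jump_sequence(100, 5, true));
    println!("host:    {:?}", jump_sequence(100, 5, false));
    println!("JUMPI not taken, exactly 10 gas: {:?}", jumpi_sequence(10, 3, 5, false));
}
```

In this model both modes spend 9 gas on a JUMP plus JUMPDEST sequence and both fail on a frame holding only 8 gas, while a not-taken JUMPI with exactly 10 gas still succeeds because the JUMPDEST charge is kept separate.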
Checklist
- `cargo test -p evm_interpreter --features testing` (13/13).
- `tests/instances/evm` rig (14/14). Existing tests exhaustively exercise JUMP/JUMPI/JUMPDEST through real EVM bytecode.

Base branch note
Stacked on `vv-push-small-specializations` (PR #648). When that lands, this PR's base can be retargeted to `custom-u256`.

🤖 Generated with Claude Code