perf(history_map): box elements, pointer-only pending list#668
Draft
0xVolosnikov wants to merge 2 commits into
Draft
perf(history_map): box elements, pointer-only pending list#6680xVolosnikov wants to merge 2 commits into
0xVolosnikov wants to merge 2 commits into
Conversation
e83c601 to
a478b5b
Compare
This was referenced May 19, 2026
0xVolosnikov
added a commit
that referenced
this pull request
May 20, 2026
## What ❔ Replaces `Bytes32`'s `Ord` impl on **the RISC-V proving target only** with a word-by-word equality scan over the underlying `inner: [usize; BYTES32_USIZE_SIZE]` (N=4 on 64-bit, 8 on RISC-V32): - Equal-prefix iterations stay in the fast path: load word, compare word, branch. - On the first differing word, resolve byte-lex order by walking the bytes of just that word — cheaper than `swap_bytes()` on RV32 without the Zbb extension (which this target does not enable). - `cmp` / `partial_cmp` are `#[inline]` since they sit on the BTreeMap descent hot path. On non-RV32 targets (forward / sequencer host) the impl falls back to the original `as_u8_array_ref().cmp(...)`. libc's `memcmp` is already SIMD-vectorized (SSE2/AVX/NEON), so the byte path beats a pure-Rust word loop on the host. The chunked helper (`cmp_word_chunked`) is still compiled on every target so the equivalence tests can validate it on the host. Affected: - `zk_ee/src/utils/bytes32.rs` ## Why ❔ From the flamegraph analysis on `draft-0.4.0`: - `impls::compare_bytes` is the **second-highest self-cost function in the binary at 6.47 %**. On no-std RISC-V the `<[u8]>::cmp` path falls back to `compiler-builtins`' generic `memcmp` (`compiler-builtins/src/mem/impls.rs:388`) — a plain byte loop with no chunking. - Almost all of that self-cost is reached through `Bytes32::cmp` (directly, or via `WarmStorageKey::cmp` → `Bytes32::cmp`) sitting inside `NodeRef::find_key_index` in HistoryMap / preimage / FlatStorageCommitment lookups. Comparing word-aligned 32-byte values byte-by-byte is wasteful when the struct is already a `[usize; N]` with `align(8)`. Forward mode doesn't need the rewrite because libc memcmp already handles it efficiently. ## Benchmark results Branch vs `origin/draft-0.4.0` (commit `59bdedfa`): | Block | Base eff | Head eff | Δ eff | Δ raw | |---|---|---|---|---| | `19299001` | 208,601,280 | 204,648,699 | **−1.89 %** | **−2.49 %** | | `22244135` | 134,741,585 | 132,504,954 | **−1.66 %** | **−2.10 %** | Blake / Bigint / Keccak delegation counts unchanged in both runs — pure raw-cycle reduction. ### Variant comparison (RV32 only — host path is unchanged) | Variant | block 19299001 Δ eff | block 22244135 Δ eff | |---|---|---| | word-cmp + `to_be()` bswap on mismatch | −1.56 % | −1.39 % | | **word-cmp + `#[inline]` + byte-fallback (this PR)** | **−1.89 %** | **−1.66 %** | | word-cmp + 2-word XOR-OR fusion | −1.68 % | −1.45 % | The 2-word XOR-OR variant regresses vs the simple loop — the extra arithmetic per executed iteration doesn't pay for the halved branch count on RV32. Picked the byte-fallback variant. ## Is this a breaking change? - [ ] Yes - [x] No Public API of `Bytes32` is unchanged. `Ord` / `PartialOrd` impls produce identical orderings to the previous implementation on every target (verified by `cmp_tests::cmp_matches_byte_lex_on_pseudorandom_pairs`, which invokes `cmp_word_chunked` directly so the helper is exercised on the host too). ## Stackability with #668 / #669 This PR targets the leaf `compare_bytes` cost; #668 / #669 target the BTreeMap lookups themselves on the pending-list paths. They hit different code regions and are expected to compose (overlap only where BTreeMap node descent invokes `Bytes32::cmp` — and even there the per-compare cost goes down). ## Checklist - [x] PR title corresponds to the body of PR. - [x] Tests for the changes have been added / updated. (Equivalence tests added: `cmp_matches_byte_lex_on_handcrafted_pairs`, `cmp_matches_byte_lex_on_pseudorandom_pairs`. Both invoke `cmp_word_chunked` directly so the chunked path is exercised on the host.) - [x] Documentation comments have been added / updated. - [x] Code has been formatted. --------- Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Wrap BTreeMap values in Box for stable addresses, embed K inside ElementWithHistory, and store NonNull<ElementWithHistory> in the pending-updates list instead of cloned keys. Bypasses BTreeMap lookups on rollback/commit/iter-pending paths and removes K::clone() per update(). Experimental — for benchmarking only.
`cargo clippy --workspace -- -D warnings` flagged the `btree.iter().map(|(_k, v)| ...)` pattern in HistoryMap::iter as `clippy::iter_kv_map`. Replace with `btree.values().map(|v| ...)`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
a478b5b to
fd60dba
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes HistoryMap bookkeeping by ensuring ElementWithHistory has a stable address (boxing values stored in the BTreeMap) and by changing the pending-updates list to store raw pointers (NonNull) instead of cloned keys, avoiding repeated key clones and BTreeMap::get() lookups on rollback/commit/pending-iteration paths.
Changes:
- Store
Box<ElementWithHistory<...>>as theBTreeMapvalue to keep element addresses stable across node splits. - Change
pending_updated_elementsentries from(K, CacheSnapshotId)to(NonNull<ElementWithHistory>, CacheSnapshotId)and update rollback/commit/iter paths to dereference pointers. - Embed a copy of
KinsideElementWithHistoryso iterators/callbacks can still surface keys without map lookups.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
zk_ee/src/common_structs/history_map/mod.rs |
Boxes map values and switches the pending list to pointer-based entries; updates rollback/commit/iteration logic accordingly. |
zk_ee/src/common_structs/history_map/element_with_history.rs |
Adds embedded key: K to ElementWithHistory and updates construction/tests for the new type parameter. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| use alloc::collections::btree_map::Entry; | ||
| use alloc::collections::BTreeMap; | ||
| use core::{alloc::Allocator, fmt::Debug, ops::Bound}; | ||
| use core::{alloc::Allocator, fmt::Debug, ops::Bound, ptr::NonNull}; |
Comment on lines
14
to
22
| /// The history linked list. Always has at least one item with the snapshot id of 0. | ||
| pub struct ElementWithHistory<V, A: Allocator + Clone, EP = ()> { | ||
| /// | ||
| /// `key` is embedded so that the pending-updates list can store a stable pointer | ||
| /// to an `ElementWithHistory` and still surface the key on iteration, avoiding | ||
| /// a `BTreeMap::get(&K)` lookup per pending entry. | ||
| pub struct ElementWithHistory<K, V, A: Allocator + Clone, EP = ()> { | ||
| /// Key owned by this element (separate from the BTreeMap key copy). | ||
| pub key: K, | ||
| /// Additional properties associated with the element globally. |
Contributor
Block-level effective cycles
Block-level sub-phases
Precompiles test-crate bench (synthetic workload, all labels)
Per-opcodePer-opcode cycle diff
Per-precompilePer-precompile per-execution ratios (head) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What ❔
Wraps each
ElementWithHistoryinBoxso its address is stable across BTreeMap node splits, and storesNonNull<ElementWithHistory>in the pending-updates list instead of cloned keys.Kis embedded insideElementWithHistoryso iterators/callbacks that need a key still get one without a BTreeMap lookup.Affected:
zk_ee/src/common_structs/history_map/mod.rszk_ee/src/common_structs/history_map/element_with_history.rsWhy ❔
HistoryMapis on the hot bookkeeping path of storage cache, account cache, transient storage, and preimage publication. Every non-coalescedupdate()clonedK(up to 52 B forWarmStorageKey) into aStackLinkedList<(K, CacheSnapshotId), A>; everyrollback/commit/iter_altered_since_commit/apply_to_last_record_of_pending_changesthen re-resolved each entry viaBTreeMap::get(&K).With this change:
update()no longer clonesK; pending entry shrinks from(K, snap)to(NonNull, snap)= 16 B.get,get_mut,get_or_insert) are unchanged — this targets the bookkeeping paths, not steady-state EVM dispatch.Cost: one
Box::new_inper unique key inserted in the map's lifetime, plus an extraKcopy embedded insideElementWithHistory(in addition to the BTreeMap key copy).Benchmark results
Branch vs
draft-0.4.0(commitd62b8ad2):1929900122244135Raw cycles drop −2.57 % / −2.17 %; Blake/Bigint/Keccak delegation counts unchanged.
compare_opcode_stats.pyreports no per-opcode regressions, consistent with the change living outside the EVM hot loop.Is this a breaking change?
Public API of
HistoryMap,HistoryMapItemRef,HistoryMapItemRefMutis preserved. InternalElementWithHistorygains aKtype parameter, but it isn't constructed outside this module.Notes / follow-ups
BoxuntilHistoryMap::clear(), which also resets the pending list — so pointers stored in pending are valid for their lifetime there. The unsafe blocks are annotated.element_pool.rs(NonNull::from_reffollowed by lateras_mut) is present ondraft-0.4.0too; not addressed here.ElementWithHistoryfrom aListVecpool like the existingElementPool) would amortize allocations and is the natural next step ifBoxper unique key shows up — left unimplemented for now.Checklist
miri_*and rig-style HistoryMap tests cover the new code paths; internalElementWithHistorytests updated for the newKtype parameter.)