perf(l1): add binaryandhash data block index to all rocksdb CFs for faster point lookups by dicethedev · Pull Request #6529 · lambdaclass/ethrex

dicethedev · 2026-04-25T09:58:49Z

Motivation
Point lookups into RocksDB data blocks currently use binary search. For trie nodes and flat key-value stores, which are read heavily during block execution, this adds avoidable CPU overhead to every lookup.

Description
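Enables DataBlockIndexType::BinaryAndHash with a hash ratio of 0.75 on all column families. This adds a hash index inside each SST data block, so point lookups use an O(1) hash probe instead of an O(log n) binary search, falling back to binary search when keys collide in the hash table. The cost is a small amount of extra space per data block: one byte per hash bucket, with roughly 1.33 buckets per key at a ratio of 0.75.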
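The lookup path described above can be illustrated with a toy model. This is a simplified sketch, not RocksDB's implementation: it uses std's `DefaultHasher` instead of RocksDB's hash function and stores whole keys instead of prefix-compressed entries. The names `DataBlock`, `build`, `get`, `EMPTY`, and `COLLISION` are hypothetical. Like the BinaryAndHash index, it keeps one-byte buckets that point at a restart interval and falls back to binary search when keys from different intervals share a bucket.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sentinel bucket values (modeled loosely on one-byte hash buckets).
const EMPTY: u8 = 255; // no key hashed here, so the key is not in this block
const COLLISION: u8 = 254; // keys from different restart intervals share this bucket

/// Toy SST data block: sorted keys split into restart intervals, plus a
/// hash table mapping each key's bucket to its restart interval.
struct DataBlock {
    keys: Vec<Vec<u8>>,
    restart_interval: usize,
    buckets: Vec<u8>,
}

fn bucket_of(key: &[u8], num_buckets: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() % num_buckets as u64) as usize
}

impl DataBlock {
    /// `util_ratio` plays the role of the hash ratio: keys per bucket.
    fn build(mut keys: Vec<Vec<u8>>, restart_interval: usize, util_ratio: f64) -> Self {
        assert!(restart_interval > 0 && util_ratio > 0.0);
        keys.sort();
        keys.dedup();
        let num_intervals = (keys.len() + restart_interval - 1) / restart_interval;
        // Restart indices must fit in a u8 alongside the two sentinel values.
        assert!(num_intervals < COLLISION as usize, "too many restart intervals");
        let num_buckets = ((keys.len() as f64 / util_ratio).ceil() as usize).max(1);
        let mut buckets = vec![EMPTY; num_buckets];
        for (i, key) in keys.iter().enumerate() {
            let restart = (i / restart_interval) as u8;
            let slot = &mut buckets[bucket_of(key, num_buckets)];
            *slot = match *slot {
                EMPTY => restart,
                r if r == restart => restart, // same interval: still unambiguous
                _ => COLLISION,
            };
        }
        DataBlock { keys, restart_interval, buckets }
    }

    /// Point lookup: returns the key's position in the block, if present.
    fn get(&self, key: &[u8]) -> Option<usize> {
        match self.buckets[bucket_of(key, self.buckets.len())] {
            EMPTY => None,
            // Ambiguous bucket: fall back to binary search over the block.
            COLLISION => self.keys.binary_search_by(|k| k.as_slice().cmp(key)).ok(),
            // The O(1) probe found the restart interval; scan only that interval.
            restart => {
                let start = restart as usize * self.restart_interval;
                let end = (start + self.restart_interval).min(self.keys.len());
                self.keys[start..end]
                    .iter()
                    .position(|k| k.as_slice() == key)
                    .map(|offset| start + offset)
            }
        }
    }
}
```

Because each bucket is a single byte, the restart-interval index has to fit in a `u8` with two values reserved. That is why the sketch asserts the block has fewer than 254 restart intervals; RocksDB's hash index has a comparable per-block limit.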

Applied to all CFs: HEADERS, BODIES, CANONICAL_BLOCK_HASHES, BLOCK_NUMBERS, ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES, ACCOUNT_FLATKEYVALUE, STORAGE_FLATKEYVALUE, ACCOUNT_CODES, RECEIPTS, and the default arm.

Checklist

Updated STORE_SCHEMA_VERSION (crates/storage/lib.rs) if the PR includes breaking changes to the Store requiring a re-sync.

Closes #5941


greptile-apps · 2026-04-25T10:02:00Z

Greptile Summary

This PR enables DataBlockIndexType::BinaryAndHash with hash_ratio=0.75 (the RocksDB default) across all 11 column family configurations to accelerate exact-key (point) lookups: each SST data block gets an O(1) hash probe, with binary search kept as the fallback. The change is backward-compatible. Existing SST files written without the hash index are still readable, and new SST files with it are readable by any RocksDB ≥ 5.13, so no STORE_SCHEMA_VERSION bump is needed.

Confidence Score: 4/5

Safe to merge — no correctness or data-integrity issues; only P2 style and trade-off concerns.

All findings are P2: an inaccurate hash-ratio comment replicated across every arm, and a debatable choice to apply BinaryAndHash to sequentially read CFs (HEADERS/BODIES), where it provides no point-lookup benefit and adds minor block-cache overhead. The change itself uses the RocksDB default value, is backward-compatible, and is functionally correct.

crates/storage/backend/rocksdb.rs — specifically the HEADERS | BODIES arm where BinaryAndHash trades cache space for a benefit that matters less for sequential reads.

Important Files Changed

| Filename | Overview |
|---|---|
| crates/storage/backend/rocksdb.rs | Adds `DataBlockIndexType::BinaryAndHash` with `hash_ratio=0.75` (RocksDB default) to all CF configurations; change is backward-compatible but the inaccurate hash-ratio comments and application to range-scan-dominant CFs are minor concerns. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[CF open loop] --> B{match cf_name}
    B -->|HEADERS, BODIES| C[32KB block + BinaryAndHash 0.75 + block_cache]
    B -->|CANONICAL_BLOCK_HASHES, BLOCK_NUMBERS| D[16KB block + bloom_filter + BinaryAndHash 0.75 + block_cache]
    B -->|ACCOUNT_TRIE_NODES, STORAGE_TRIE_NODES| E[16KB block + bloom_filter + BinaryAndHash 0.75 + memtable_prefix_bloom + block_cache]
    B -->|ACCOUNT_FLATKEYVALUE, STORAGE_FLATKEYVALUE| F[16KB block + bloom_filter + BinaryAndHash 0.75 + memtable_prefix_bloom + block_cache]
    B -->|ACCOUNT_CODES| G[32KB block + blob_files + BinaryAndHash 0.75 + block_cache]
    B -->|RECEIPTS| H[32KB block + BinaryAndHash 0.75 + block_cache]
    B -->|default| I[16KB block + BinaryAndHash 0.75 + block_cache]
```

Prompt To Fix All With AI

This is a comment left during a code review.
Path: crates/storage/backend/rocksdb.rs
Line: 114

Comment:
**Misleading hash-ratio comment**

The comment "Hash index covers 75% of entries for good performance" is inaccurate. `data_block_hash_ratio` is not a coverage percentage; it is the hash table's utilization ratio, i.e. the number of entries per hash bucket. A value of `0.75` therefore allocates roughly 1.33 buckets per entry. It is the RocksDB default and is perfectly fine, but the comment misdescribes it: the ratio controls the hash table's space budget relative to key count, not which fraction of entries are indexed.

The same misleading comment is repeated in every match arm (including the one at line 127 that has an extra leading space: `//  Hash index for faster lookups`).
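A quick estimate shows why the space cost of this ratio is small. This is a sketch under stated assumptions: one byte per bucket, `num_keys / util_ratio` buckets rounded up, and a 2-byte field holding the bucket count. RocksDB's exact rounding may differ. `hash_index_bytes` and `overhead_fraction` are hypothetical helpers, and the 100-keys-per-16 KB-block figure is illustrative, not a measurement from ethrex.

```rust
/// Approximate bytes the hash index adds to one data block (hypothetical helper).
/// Assumes one byte per bucket, `num_keys / util_ratio` buckets rounded up,
/// and a 2-byte field storing the bucket count.
fn hash_index_bytes(num_keys: usize, util_ratio: f64) -> usize {
    assert!(util_ratio > 0.0, "util_ratio must be positive");
    let buckets = (num_keys as f64 / util_ratio).ceil() as usize;
    buckets + 2
}

/// Hash-index bytes as a fraction of the data block size.
fn overhead_fraction(block_bytes: usize, num_keys: usize, util_ratio: f64) -> f64 {
    hash_index_bytes(num_keys, util_ratio) as f64 / block_bytes as f64
}
```

Under these assumptions a 16 KB block holding 100 keys gains about 136 bytes (under 1%). Lowering the ratio reduces hash collisions, and therefore binary-search fallbacks, at the price of more buckets.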

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: crates/storage/backend/rocksdb.rs
Line: 113-116

Comment:
**`BinaryAndHash` interacts poorly with range-scan dominant CFs**

`HEADERS` and `BODIES` are primarily read sequentially (e.g. during initial block download), not via exact-key point lookups. `DataBlockIndexType::BinaryAndHash` only accelerates point lookups; for sequential iteration RocksDB still uses the binary-search restart array, and the extra hash-table bytes within each 32 KB block are loaded into the block cache for no benefit. This inflates block-cache pressure for the heaviest sequential-access CFs. Consider keeping `BinarySearch` (the default) for `HEADERS | BODIES`, and apply `BinaryAndHash` only to the CFs where random point lookups dominate (`ACCOUNT_TRIE_NODES`, `STORAGE_TRIE_NODES`, `ACCOUNT_FLATKEYVALUE`, `STORAGE_FLATKEYVALUE`).
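The suggested restriction amounts to a per-CF policy. This is a minimal sketch: the CF name strings are hypothetical, and a local `DataBlockIndex` enum stands in for rust-rocksdb's `DataBlockIndexType` so the snippet compiles without the crate. In the real configuration code, the result would be passed to rust-rocksdb's `BlockBasedOptions::set_data_block_index_type`, together with `set_data_block_hash_ratio(0.75)` for the hash variant.

```rust
// Hypothetical CF name constants; ethrex's real constants may differ.
const HEADERS: &str = "headers";
const BODIES: &str = "bodies";
const ACCOUNT_TRIE_NODES: &str = "account_trie_nodes";
const STORAGE_TRIE_NODES: &str = "storage_trie_nodes";
const ACCOUNT_FLATKEYVALUE: &str = "account_flatkeyvalue";
const STORAGE_FLATKEYVALUE: &str = "storage_flatkeyvalue";

/// Local stand-in for rust-rocksdb's `DataBlockIndexType`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataBlockIndex {
    BinarySearch,
    BinaryAndHash,
}

/// Hash index only where random point lookups dominate; range-scan-heavy
/// CFs such as HEADERS and BODIES keep the default binary search.
fn data_block_index_for(cf_name: &str) -> DataBlockIndex {
    match cf_name {
        ACCOUNT_TRIE_NODES | STORAGE_TRIE_NODES | ACCOUNT_FLATKEYVALUE | STORAGE_FLATKEYVALUE => {
            DataBlockIndex::BinaryAndHash
        }
        _ => DataBlockIndex::BinarySearch,
    }
}
```

Keeping the decision in one function means the match arms in the CF open loop do not each need to repeat the index settings, and the comment explaining the trade-off lives in a single place.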

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "add BinaryAndHash data block index to al..."


add BinaryAndHash data block index to all RocksDB CFs for faster point lookups (34839c0)

dicethedev requested a review from a team as a code owner April 25, 2026 09:58

greptile-apps Bot reviewed Apr 25, 2026


Comment thread crates/storage/backend/rocksdb.rs Outdated

Comment thread crates/storage/backend/rocksdb.rs Outdated

fix(l1): remove misleading comments and restrict binaryandhash to point-lookup CFs (5109299)

dicethedev wants to merge 2 commits into lambdaclass:main from dicethedev:perf/rocksdb-data-block-hash-index
