perf(l1): reduce BAL parallel-path overhead#6543

Draft
edg-l wants to merge 1 commit into bal-devnet-4 from perf/bal-parallel-overhead

Conversation


@edg-l edg-l commented Apr 28, 2026

Summary

Bundle of independent improvements to the BAL parallel-execution path, validated against a 149-block stress fixture (100M gas, 200–500 tx/block, ~25M-gas median blocks).

| Metric (median) | Sequential | Parallel (no bundle) | Parallel + bundle | vs sequential | vs parallel (no bundle) |
|---|---|---|---|---|---|
| Ggas/s | 1.78 | 2.88 | 3.64 | +104.3% | +26.4% |
| total (ms) | 23.86 | 14.43 | 11.44 | −52.1% | −20.7% |
| exec (ms) | 21.97 | 12.94 | 6.67 | −69.6% | −48.5% |
| warmer (ms) | 7.41 | 5.39 | 3.93 | −47.0% | −27.1% |
| store (ms) | 1.60 | 1.19 | 1.25 | −21.9% | +5.0% |

The bundle doubles the speedup margin the parallel path was already providing over sequential.

What's in the bundle

Each change is independently shippable; combined here for atomic review since they touch overlapping code in `execute_block_parallel`.

  • A. handle_merkleization_bal overlap fix (`blockchain.rs`) — replace channel drain-loop with single `recv()`. Stage B (parallel storage roots) now overlaps with exec instead of serializing after it.
  • B. Adaptive threshold `BAL_PARALLEL_TX_THRESHOLD = 5` — below threshold falls through to sequential exec (which produces a BAL during exec; `blockchain.rs` hash-compares against header). Mirrors reth's `SMALL_BLOCK_TX_THRESHOLD`.
  • C. import-bench inter-block sleep 500ms → 100ms (bench tooling change, no production effect) — cuts bench wall-clock by 80%.
  • Q1. Skip prestate read in `bal_to_account_updates` when BAL covers all info fields. Two fast paths: storage-only updates and full-info-coverage with non-empty post.
  • Q2. Per-tx `GeneralizedDatabase` capacity cap at 32 (previously sized to the full BAL account count, often hundreds; the p50 tx touches <10 accounts).
  • Q3. Memoize `code_from_bal` results — pre-compute Code objects (hash + jump_targets) once per BAL code change before the par_iter; pass cache via optional param to `seed_db_from_bal`.
  • Q8. Move per-tx BAL validation into the rayon par_iter closure — eliminates a serial post-exec validation pass; drops `current_state`/`codes` inside the closure (no longer cross rayon boundary).
  • DashMap swap in `CachingDatabase` — perf record showed 11% of CPU in `RwLock::read_contended` with 16 rayon workers hammering the single account RwLock. Replaced with sharded `DashMap<_, _, FxBuildHasher>`. Sequential paths unaffected (only 2 threads, weren't contended).
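
A minimal std-only sketch of item A's shape (illustrative, not the ethrex code): with exactly one batch in flight, a single `recv()` unblocks as soon as the batch lands, whereas a drain loop waits for the sender to be dropped at the end of exec.

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // execute_block_parallel sends exactly one batch of account updates
    // up front; names and payloads here are illustrative.
    let (tx, rx) = mpsc::channel::<Vec<&'static str>>();

    let exec = thread::spawn(move || {
        tx.send(vec!["0xaaa...", "0xbbb..."]).unwrap();
        // In the real code `tx` stays alive while transactions execute;
        // a `for updates in rx` drain loop on the receiving side would
        // block until this thread returns and drops the sender.
    });

    // A single recv() unblocks as soon as the one batch arrives, letting
    // Stage B (parallel storage roots) overlap with exec.
    let updates = rx.recv().unwrap();
    assert_eq!(updates.len(), 2);
    exec.join().unwrap();
}
```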

Effect on non-BAL paths

  • Block production / pre-Amsterdam / sequential fallback: DashMap is neutral (low contention); threshold-fallback adds a protective branch; other changes only fire on the BAL parallel-validation path.
  • No regressions in non-parallel paths.

Tried-and-rejected (documented for context)

  • Drop `accessed_accounts` tracker: not actually redundant — superset/subset of shadow recorder, distinct correctness roles.
  • `rayon::join` warmer Phase 2 + Phase 3: nested rayon on shared pool starved exec workers (−12%), warmer didn't speed up (already I/O-bound saturating internal par_iter).
  • Validation-only BAL recorder: exec saved 5%, but those savings shifted to "after exec" merkle drain — net per-block flat. Once exec < merkle wall-clock, exec-side savings have diminishing returns on per-block time.

Test plan

  • `cargo check -p ethrex-blockchain -p ethrex-levm -p ethrex-vm` (clean)
  • Stress fixture (149 blocks, 100M gas, mainnet-shape): per-block medians match the table above
  • Hive Amsterdam consume-engine
  • EF blockchain tests (BAL fixtures `bal@v6.0.0`)

@github-actions github-actions Bot added L1 Ethereum client performance Block execution throughput and performance in general labels Apr 28, 2026
@github-actions

Lines of code report

Total lines added: 69
Total lines removed: 34
Total lines changed: 103

Detailed view
| File | Lines | Diff |
|---|---|---|
| ethrex/crates/blockchain/blockchain.rs | 2482 | −5 |
| ethrex/crates/vm/backends/levm/mod.rs | 2426 | +69 |
| ethrex/crates/vm/levm/src/db/mod.rs | 119 | −29 |

Bundle of independent improvements to the BAL parallel-execution path
(execute_block_parallel + handle_merkleization_bal + warm_block_from_bal +
CachingDatabase), validated against a 149-block stress fixture (100M gas,
200-500 tx/block, ~25M-gas median blocks).

Headline (per-block medians):

  Metric        Sequential  Parallel(no bundle)  + bundle  vs seq    vs par-base
  Ggas/s        1.78        2.88                 3.64      +104.3%   +26.4%
  total (ms)    23.86       14.43                11.44     -52.1%    -20.7%
  exec (ms)     21.97       12.94                6.67      -69.6%    -48.5%
  warmer (ms)   7.41        5.39                 3.93      -47.0%    -27.1%
  store (ms)    1.60        1.19                 1.25      -21.9%    +5.0%

The bundle doubles the speedup margin the parallel path was already
providing over sequential.

The changes (each is independently shippable; combined here for atomic
review since they touch overlapping code):

A. handle_merkleization_bal overlap fix (crates/blockchain/blockchain.rs)
   `for updates in rx { ... }` blocked until the channel closed (i.e. until
   exec ended). execute_block_parallel sends exactly one batch up front
   from bal_to_account_updates, so the drain loop gained nothing and
   serialized Stage B (parallel storage roots) after exec instead of
   overlapping it with exec. Replaced with a single rx.recv() and dropped
   the FxHashMap merge step (the BAL guarantees one entry per address).

B. Adaptive threshold for BAL parallel exec (crates/vm/backends/levm/mod.rs)
   Added BAL_PARALLEL_TX_THRESHOLD = 5. Below threshold falls through to
   the sequential path which produces a BAL during exec; blockchain.rs
   hash-compares produced vs header BAL — same correctness, no parallel
   constants. Mirrors reth's SMALL_BLOCK_TX_THRESHOLD; trips on <1% of
   mainnet blocks (100-block sample).
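
The threshold decision in B can be sketched as follows (the constant name matches the PR; the helper function is hypothetical):

```rust
// Illustrative mirror of change B, not the ethrex implementation.
const BAL_PARALLEL_TX_THRESHOLD: usize = 5;

/// Below the threshold, the block falls through to sequential exec, which
/// produces a BAL during execution that the caller hash-compares against
/// the header BAL — same correctness, no parallel setup cost.
fn use_parallel_path(tx_count: usize) -> bool {
    tx_count >= BAL_PARALLEL_TX_THRESHOLD
}

fn main() {
    assert!(!use_parallel_path(4)); // small block: sequential fallback
    assert!(use_parallel_path(5)); // at or above threshold: parallel exec
}
```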

C. import-bench inter-block sleep 500ms -> 100ms (cmd/ethrex/cli.rs)
   Bench tooling change. The sleep gates background trie-layer writeback
   from bleeding into the next block's per-block timer; 100ms is well
   above measured Phase 2 cost on SSD. Cuts bench wall clock 80% without
   affecting the per-block metric. NO effect on production paths.

Q1. Skip prestate read in bal_to_account_updates when BAL covers all info
    fields (crates/vm/backends/levm/mod.rs). Two fast paths added:
    storage-only updates (info: None, removed: false by construction);
    full info coverage with non-empty post (removal impossible, info from
    BAL alone). Slow path keeps existing behavior for partial coverage.
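
Q1's fast-path predicate reduces to a small decision; a hedged sketch with hypothetical field names (the real entry type lives in the BAL structures):

```rust
// Illustrative stand-in for a BAL account entry; field names are invented.
#[derive(Default)]
struct BalEntry {
    storage_only: bool,       // info: None, removed: false by construction
    has_full_info: bool,      // all info fields covered by the BAL
    post_state_nonempty: bool, // non-empty post state => removal impossible
}

/// True when bal_to_account_updates can build the update from the BAL
/// alone, skipping the prestate read.
fn skips_prestate_read(e: &BalEntry) -> bool {
    e.storage_only || (e.has_full_info && e.post_state_nonempty)
}

fn main() {
    // Fast path 1: storage-only update.
    assert!(skips_prestate_read(&BalEntry { storage_only: true, ..Default::default() }));
    // Fast path 2: full info coverage with non-empty post state.
    assert!(skips_prestate_read(&BalEntry {
        has_full_info: true,
        post_state_nonempty: true,
        ..Default::default()
    }));
    // Partial coverage keeps the slow path (prestate read).
    assert!(!skips_prestate_read(&BalEntry { has_full_info: true, ..Default::default() }));
}
```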

Q2. Per-tx GeneralizedDatabase capacity cap at 32
    (crates/vm/backends/levm/mod.rs::execute_block_parallel). Previously
    sized to bal.accounts().len() (often 100s on stress blocks); p50 tx
    touches <10 accounts. Reduced allocator pressure across rayon workers.
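
The Q2 cap is a one-line sizing rule; a std-only sketch (constant and helper names are illustrative):

```rust
use std::collections::HashMap;

// Size per-tx caches for the typical working set, not the whole BAL.
const PER_TX_CACHE_CAP: usize = 32;

fn per_tx_capacity(bal_account_count: usize) -> usize {
    bal_account_count.min(PER_TX_CACHE_CAP)
}

fn main() {
    // A stress block may list hundreds of accounts in the BAL...
    assert_eq!(per_tx_capacity(400), 32);
    // ...while a small block still gets an exact-fit allocation.
    assert_eq!(per_tx_capacity(8), 8);
    // HashMap::with_capacity guarantees room for at least that many
    // entries before reallocating.
    let cache: HashMap<u64, u64> = HashMap::with_capacity(per_tx_capacity(400));
    assert!(cache.capacity() >= 32);
}
```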

Q3. Memoize code_from_bal results across seed_db_from_bal calls
    (crates/vm/backends/levm/mod.rs). Pre-compute Code objects (hash +
    jump_targets) once per BAL code change before the par_iter; pass cache
    via optional param to seed_db_from_bal. Saves N-1 keccak+jump-target
    scans per code change per block (N = tx count).
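
The Q3 memoization amounts to hoisting the per-code analysis out of the per-tx loop; a hedged std-only sketch (the `Code` type and `analyze` body are stand-ins, and real jump-target analysis also skips PUSH data):

```rust
use std::collections::HashMap;

#[derive(Clone)]
struct Code {
    hash: u64,
    jump_targets: Vec<usize>,
}

// Stand-in for the keccak + jump-target scan done once per code change.
fn analyze(bytecode: &[u8]) -> Code {
    Code {
        hash: bytecode.iter().map(|&b| b as u64).sum(), // not real keccak
        jump_targets: bytecode
            .iter()
            .enumerate()
            .filter(|(_, &b)| b == 0x5b) // JUMPDEST (naive: ignores PUSH data)
            .map(|(i, _)| i)
            .collect(),
    }
}

fn main() {
    let a: &[u8] = &[0x60, 0x00, 0x5b];
    let b: &[u8] = &[0x5b, 0x5b];
    // Pre-compute once before the par_iter...
    let cache: HashMap<usize, Code> = [a, b]
        .iter()
        .enumerate()
        .map(|(i, bc)| (i, analyze(bc)))
        .collect();
    // ...then all N transactions reuse the cached entries instead of
    // re-running the scan (saving N-1 analyses per code change).
    assert_eq!(cache[&0].jump_targets, vec![2]);
    assert_eq!(cache[&1].jump_targets, vec![0, 1]);
}
```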

Q8. Move per-tx BAL validation into the rayon par_iter closure
    (crates/vm/backends/levm/mod.rs::execute_block_parallel). Eliminates a
    serial post-exec validation pass (~3 ms median across 200 txs). Drops
    current_state and codes inside the closure after validation runs —
    they no longer cross the rayon boundary, reducing per-tx allocator
    pressure. Closure returns deferred Option<EvmError> so gas-limit check
    still takes priority over BAL mismatch errors.
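
The deferred-error shape in Q8 can be sketched like this (type and function names are hypothetical; only the priority ordering mirrors the PR):

```rust
#[derive(Clone, Debug, PartialEq)]
enum BlockError {
    GasLimitExceeded,
    BalMismatch,
}

struct TxOutcome {
    gas_used: u64,
    // BAL validation runs inside the par_iter closure; its error is
    // deferred rather than returned immediately.
    deferred_bal_error: Option<BlockError>,
}

fn finalize(outcomes: &[TxOutcome], gas_limit: u64) -> Result<(), BlockError> {
    let total: u64 = outcomes.iter().map(|o| o.gas_used).sum();
    if total > gas_limit {
        // Gas-limit check takes priority over any deferred BAL mismatch.
        return Err(BlockError::GasLimitExceeded);
    }
    outcomes
        .iter()
        .find_map(|o| o.deferred_bal_error.clone())
        .map_or(Ok(()), Err)
}

fn main() {
    let outcomes = vec![
        TxOutcome { gas_used: 60, deferred_bal_error: None },
        TxOutcome { gas_used: 60, deferred_bal_error: Some(BlockError::BalMismatch) },
    ];
    // Both errors present: gas-limit wins.
    assert_eq!(finalize(&outcomes, 100), Err(BlockError::GasLimitExceeded));
    // Under the limit, the deferred BAL mismatch surfaces.
    assert_eq!(finalize(&outcomes, 200), Err(BlockError::BalMismatch));
}
```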

DashMap. CachingDatabase RwLock<HashMap> -> DashMap<_, _, FxBuildHasher>
    (crates/vm/levm/src/db/mod.rs). Found via perf record: 11% of CPU was
    RwLock::read_contended on the single account RwLock with 16 rayon
    workers hammering it. Sharded concurrent map (64 default shards)
    eliminates contention. Sequential paths unaffected (only 2 threads
    access the cache, weren't contended).
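
Why sharding helps can be shown with a std-only toy (this is what DashMap does internally with lock-free refinements; the type here is illustrative, not the `CachingDatabase` code):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Toy sharded map: readers and writers hash to different shards, so
// 16 workers no longer serialize on one RwLock.
struct ShardedMap {
    shards: Vec<RwLock<HashMap<u64, u64>>>,
}

impl ShardedMap {
    fn new(shard_count: usize) -> Self {
        Self {
            shards: (0..shard_count).map(|_| RwLock::new(HashMap::new())).collect(),
        }
    }

    fn shard(&self, key: u64) -> &RwLock<HashMap<u64, u64>> {
        // Real implementations shard on the hash; modulo on the key
        // keeps the sketch simple.
        &self.shards[(key as usize) % self.shards.len()]
    }

    fn insert(&self, key: u64, val: u64) {
        self.shard(key).write().unwrap().insert(key, val);
    }

    fn get(&self, key: u64) -> Option<u64> {
        self.shard(key).read().unwrap().get(&key).copied()
    }
}

fn main() {
    let m = ShardedMap::new(64); // DashMap's default is also a power-of-two shard count
    m.insert(1, 10);
    m.insert(2, 20);
    assert_eq!(m.get(1), Some(10));
    assert_eq!(m.get(3), None);
}
```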

Effect on non-BAL paths (block production, pre-Amsterdam, sequential
fallback): DashMap is neutral (low contention), threshold-fallback adds a
protective branch, other changes only fire on the BAL parallel-validation
path. No regressions in non-parallel paths.
@edg-l edg-l force-pushed the perf/bal-parallel-overhead branch from 203f859 to 1e3ac87 on April 28, 2026, 13:35
@github-actions

github-actions Bot commented Apr 28, 2026

Benchmark Results Comparison


Benchmark Results: BubbleSort

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_BubbleSort | 3.017 ± 0.020 | 2.985 | 3.049 | 1.12 ± 0.02 |
| main_levm_BubbleSort | 2.696 ± 0.033 | 2.672 | 2.784 | 1.00 ± 0.02 |
| pr_revm_BubbleSort | 3.000 ± 0.019 | 2.971 | 3.023 | 1.11 ± 0.02 |
| pr_levm_BubbleSort | 2.694 ± 0.042 | 2.664 | 2.802 | 1.00 |

Benchmark Results: ERC20Approval

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Approval | 988.2 ± 5.9 | 981.7 | 1000.8 | 1.01 ± 0.01 |
| main_levm_ERC20Approval | 1022.2 ± 8.3 | 1010.8 | 1037.0 | 1.04 ± 0.01 |
| pr_revm_ERC20Approval | 982.7 ± 8.4 | 975.0 | 1000.5 | 1.00 |
| pr_levm_ERC20Approval | 1027.1 ± 16.2 | 1008.2 | 1063.7 | 1.05 ± 0.02 |

Benchmark Results: ERC20Mint

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Mint | 135.0 ± 0.8 | 133.8 | 136.2 | 1.01 ± 0.01 |
| main_levm_ERC20Mint | 148.3 ± 0.5 | 147.8 | 149.0 | 1.11 ± 0.01 |
| pr_revm_ERC20Mint | 133.8 ± 0.5 | 132.6 | 134.4 | 1.00 |
| pr_levm_ERC20Mint | 147.9 ± 0.6 | 147.1 | 148.8 | 1.11 ± 0.01 |

Benchmark Results: ERC20Transfer

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Transfer | 234.2 ± 1.9 | 232.4 | 237.9 | 1.01 ± 0.01 |
| main_levm_ERC20Transfer | 252.4 ± 1.0 | 250.8 | 253.9 | 1.09 ± 0.01 |
| pr_revm_ERC20Transfer | 231.5 ± 1.1 | 229.9 | 233.0 | 1.00 |
| pr_levm_ERC20Transfer | 251.2 ± 2.1 | 248.7 | 255.9 | 1.09 ± 0.01 |

Benchmark Results: Factorial

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Factorial | 223.5 ± 0.9 | 221.6 | 224.4 | 1.00 |
| main_levm_Factorial | 250.3 ± 2.3 | 247.6 | 254.6 | 1.12 ± 0.01 |
| pr_revm_Factorial | 224.7 ± 1.5 | 223.7 | 228.9 | 1.01 ± 0.01 |
| pr_levm_Factorial | 247.5 ± 2.4 | 244.8 | 253.4 | 1.11 ± 0.01 |

Benchmark Results: FactorialRecursive

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_FactorialRecursive | 1.621 ± 0.029 | 1.571 | 1.662 | 1.01 ± 0.02 |
| main_levm_FactorialRecursive | 9.135 ± 0.057 | 9.040 | 9.235 | 5.72 ± 0.09 |
| pr_revm_FactorialRecursive | 1.597 ± 0.024 | 1.561 | 1.642 | 1.00 |
| pr_levm_FactorialRecursive | 9.211 ± 0.026 | 9.167 | 9.256 | 5.77 ± 0.09 |

Benchmark Results: Fibonacci

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Fibonacci | 205.0 ± 1.3 | 203.7 | 207.4 | 1.00 ± 0.01 |
| main_levm_Fibonacci | 233.5 ± 6.2 | 225.0 | 245.9 | 1.14 ± 0.03 |
| pr_revm_Fibonacci | 204.8 ± 1.1 | 203.5 | 207.0 | 1.00 |
| pr_levm_Fibonacci | 225.3 ± 4.3 | 220.1 | 232.8 | 1.10 ± 0.02 |

Benchmark Results: FibonacciRecursive

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_FibonacciRecursive | 837.6 ± 7.4 | 824.5 | 848.2 | 1.34 ± 0.02 |
| main_levm_FibonacciRecursive | 624.6 ± 9.5 | 609.7 | 642.9 | 1.00 ± 0.02 |
| pr_revm_FibonacciRecursive | 845.3 ± 17.3 | 825.1 | 878.7 | 1.35 ± 0.03 |
| pr_levm_FibonacciRecursive | 624.3 ± 7.8 | 615.5 | 642.0 | 1.00 |

Benchmark Results: ManyHashes

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ManyHashes | 8.4 ± 0.1 | 8.3 | 8.6 | 1.02 ± 0.01 |
| main_levm_ManyHashes | 9.8 ± 0.1 | 9.7 | 9.9 | 1.19 ± 0.01 |
| pr_revm_ManyHashes | 8.3 ± 0.0 | 8.2 | 8.3 | 1.00 |
| pr_levm_ManyHashes | 9.7 ± 0.1 | 9.7 | 9.8 | 1.17 ± 0.01 |

Benchmark Results: MstoreBench

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_MstoreBench | 262.0 ± 4.6 | 258.5 | 274.8 | 1.14 ± 0.02 |
| main_levm_MstoreBench | 266.8 ± 101.4 | 231.0 | 555.1 | 1.16 ± 0.44 |
| pr_revm_MstoreBench | 263.6 ± 4.4 | 260.1 | 272.8 | 1.14 ± 0.02 |
| pr_levm_MstoreBench | 230.4 ± 1.2 | 228.5 | 232.7 | 1.00 |

Benchmark Results: Push

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Push | 288.4 ± 1.4 | 286.9 | 290.9 | 1.02 ± 0.01 |
| main_levm_Push | 284.1 ± 1.5 | 282.0 | 286.2 | 1.00 |
| pr_revm_Push | 289.0 ± 0.8 | 288.1 | 290.4 | 1.02 ± 0.01 |
| pr_levm_Push | 285.9 ± 4.5 | 282.6 | 296.5 | 1.01 ± 0.02 |

Benchmark Results: SstoreBench_no_opt

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_SstoreBench_no_opt | 171.4 ± 2.5 | 167.3 | 175.4 | 1.72 ± 0.03 |
| main_levm_SstoreBench_no_opt | 99.7 ± 0.4 | 99.2 | 100.5 | 1.00 |
| pr_revm_SstoreBench_no_opt | 172.9 ± 3.1 | 170.6 | 181.4 | 1.73 ± 0.03 |
| pr_levm_SstoreBench_no_opt | 99.8 ± 0.3 | 99.1 | 100.2 | 1.00 ± 0.01 |

@github-actions

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 65.977 ± 0.197 | 65.739 | 66.401 | 1.00 ± 0.00 |
| head | 65.784 ± 0.105 | 65.599 | 65.913 | 1.00 |

