feat(l1): bound memory usage of healing by iovoid · Pull Request #6545 · lambdaclass/ethrex

iovoid · 2026-04-28T15:42:47Z

Motivation

For large networks, storage healing can end up loading hundreds of millions of nodes into the pending queue.

Description

We use a priority queue so that deep nodes are resolved first instead of opening new tries. We also set a soft cap above which downloads slow down, so that parallelism doesn't cause the limit to be substantially exceeded. The limit is not hard because we prefer the risk of an OOM to a stall.
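As a rough illustration of the deepest-first ordering, here is a minimal sketch. It assumes that depth is the length of a node's path, as the review comments describe. `Request`, `DepthOrdered`, and `pop_depths` are hypothetical names for this example, not the PR's actual types, and the depth-only equality mirrors the behavior the reviews discuss.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for a healing request; only the path matters here.
#[derive(Debug, Clone)]
struct Request {
    /// Nibbles from the trie root; a longer path means a deeper node.
    path: Vec<u8>,
}

/// Wrapper ordered by depth, so the max-heap pops the deepest request first.
#[derive(Debug, Clone)]
struct DepthOrdered(Request);

impl PartialEq for DepthOrdered {
    fn eq(&self, other: &Self) -> bool {
        self.0.path.len() == other.0.path.len()
    }
}

impl Eq for DepthOrdered {}

impl PartialOrd for DepthOrdered {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for DepthOrdered {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0.path.len().cmp(&other.0.path.len())
    }
}

/// Pushes every path into the heap, then returns the depths in pop order.
fn pop_depths(paths: Vec<Vec<u8>>) -> Vec<usize> {
    let mut heap: BinaryHeap<DepthOrdered> = paths
        .into_iter()
        .map(|path| DepthOrdered(Request { path }))
        .collect();
    let mut depths = Vec::new();
    while let Some(DepthOrdered(request)) = heap.pop() {
        depths.push(request.path.len());
    }
    depths
}
```

Because `BinaryHeap` is a max-heap, ordering by path length is enough to make the deepest pending node come out first; no reversal wrapper is needed.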

greptile-apps · 2026-04-28T15:47:44Z

Greptile Summary

This PR bounds unbounded memory growth during snap healing for large networks by replacing Vec/VecDeque-backed download queues with BinaryHeaps ordered deepest-first, and adding a soft backpressure gate (HEALING_QUEUE_SOFT_LIMIT = 800_000) shared between storage and state healing. When the pending-parents map exceeds the limit, new download dispatch is paused until in-flight responses drain it; an escape hatch (inflight_tasks == 0 / requests.is_empty()) prevents deadlock if everything drains simultaneously. Both loops also gain an unconditional yield_now().await at the top to keep the tokio runtime cooperative during backpressure.
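The gate described above can be sketched as a small predicate. This is only an assumed shape: the constant's value comes from the summary, the `>=` comparison follows the flowchart, and `may_dispatch` and its parameters are hypothetical names rather than the PR's code.

```rust
/// Value taken from the PR summary; the rest of this sketch is hypothetical.
const HEALING_QUEUE_SOFT_LIMIT: usize = 800_000;

/// Returns true when the healing loop may dispatch new download requests.
///
/// `pending_parents` is the current size of the pending-parents map and
/// `inflight_tasks` is the number of requests still awaiting a response.
fn may_dispatch(pending_parents: usize, inflight_tasks: usize) -> bool {
    // Below the soft cap: dispatch freely.
    if pending_parents < HEALING_QUEUE_SOFT_LIMIT {
        return true;
    }
    // At or above the cap: normally wait for in-flight responses to drain
    // the queue. If nothing is in flight, nothing can drain it, so dispatch
    // anyway (the escape hatch) rather than stall forever.
    inflight_tasks == 0
}
```

The escape hatch is what makes the limit soft: the queue can exceed the cap, but the loop can always make progress.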

Confidence Score: 4/5

Safe to merge; the approach is sound and well-documented, with the only finding being a minor semantic smell in the Eq implementations.

All findings are P2 style concerns. The backpressure logic, escape hatches, and depth-first ordering are all correctly reasoned; tests cover the heap ordering and edge cases.

No files require special attention beyond the Eq semantic note in state.rs and storage.rs.

Important Files Changed

Filename	Overview
crates/networking/p2p/snap/constants.rs	Adds `HEALING_QUEUE_SOFT_LIMIT` (800,000 entries ≈ 1 GB) with detailed memory-cost derivation; clean addition with no issues.
crates/networking/p2p/sync/healing/state.rs	Replaces `Vec`-based pending paths with a `BinaryHeap<DepthOrderedMetadata>` (deepest-first) and adds backpressure gating against `HEALING_QUEUE_SOFT_LIMIT`; `PartialEq`/`Eq` defined by depth only is a semantic smell but harmless for `BinaryHeap`.
crates/networking/p2p/sync/healing/storage.rs	Replaces `VecDeque<NodeRequest>` with `BinaryHeap<DepthOrderedRequest>` and gates dispatch on `HEALING_QUEUE_SOFT_LIMIT`; same depth-equality semantic smell as state.rs, but no correctness issues for the heap use case.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Loop start — yield_now] --> B{healing_queue\n>= SOFT_LIMIT?}
    B -- No --> C[Pop batch from\nBinaryHeap max-depth]
    B -- Yes --> D{inflight_tasks == 0?}
    D -- Yes escape hatch --> C
    D -- No backpressure --> E[Increment\nbackpressure_stalls]
    C --> F[get_best_peer]
    F -- peer found --> G[tokio::spawn\nrequest_trienodes]
    F -- no peer --> H[Re-push batch\ninto heap, sleep 10ms]
    G --> I[task_receiver.try_recv]
    E --> I
    I -- Ok response --> J[heal_state_batch\ncommit_node cascades]
    I -- Err --> K[Re-push batch\ninto heap]
    J --> L[return_paths → push\nDepthOrderedMetadata into heap]
    L --> A
    K --> A
    H --> A

Comments Outside Diff (1)

crates/networking/p2p/sync/healing/state.rs, line 79-84 (link)

PartialEq defines equality as same depth, not same content

DepthOrderedMetadata::eq returns true for two wrappers with completely different RequestMetadata as long as their path.len() matches. While BinaryHeap never calls eq for its heap operations (it relies solely on Ord), the trait impl still advertises that a == b for structurally distinct values — violating the usual semantic contract of Eq. If this type is ever placed in a HashSet, used in an assert_eq!, or compared for deduplication, the result will be silently wrong.

The same pattern is mirrored in DepthOrderedRequest::eq in storage.rs. Consider either deriving PartialEq from the inner type (and accepting the asymmetry with Ord, which is permitted by Rust but should be documented) or adding an explicit comment making the intended semantics clear.
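To make the hazard concrete, the following sketch shows how a depth-only `eq` can silently drop a distinct request during deduplication. `Meta` is a hypothetical stand-in for `RequestMetadata`, and `dedup_adjacent` is an illustrative helper, not something in the PR.

```rust
/// Hypothetical stand-in for `RequestMetadata`; only the path is modelled.
#[derive(Debug, Clone)]
struct Meta {
    path: Vec<u8>,
}

/// Wrapper whose equality looks only at depth, as described in the comment.
#[derive(Debug, Clone)]
struct DepthOrderedMetadata(Meta);

impl PartialEq for DepthOrderedMetadata {
    fn eq(&self, other: &Self) -> bool {
        self.0.path.len() == other.0.path.len()
    }
}

impl Eq for DepthOrderedMetadata {}

/// Removes consecutive "equal" items. `Vec::dedup` relies on `PartialEq`,
/// so items at the same depth collapse even when their paths differ.
fn dedup_adjacent(mut items: Vec<DepthOrderedMetadata>) -> Vec<DepthOrderedMetadata> {
    items.dedup();
    items
}
```

With paths `[1, 2]`, `[3, 4]`, and `[5]`, the second item is discarded because it has the same depth as the first, even though it refers to a different node.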

Reviews (1): Last reviewed commit: "fix(p2p): state healing review fixes"

github-actions · 2026-04-28T15:51:47Z

Lines of code report

Total lines added: 151
Total lines removed: 0
Total lines changed: 151

Detailed view

+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/constants.rs       | 24    | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/state.rs   | 458   | +70  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/storage.rs | 696   | +80  |
+------------------------------------------------------+-------+------+

ElFantasma

Two minor comments

ElFantasma · 2026-04-30T14:38:46Z

+#[derive(Debug, Clone)]
+struct DepthOrderedMetadata(RequestMetadata);
+
+impl PartialEq for DepthOrderedMetadata {


PartialEq surprise: equality is purely on path length, so two structurally different requests at the same depth compare as equal. Fine for BinaryHeap (which only uses Ord), but if anyone later adds dedup or a HashSet<DepthOrderedMetadata> they'll get silent collisions. Either (a) doc-comment the equality as ordering-only, or (b) derive structural PartialEq/Eq and put the depth-only comparison in Ord alone — BinaryHeap doesn't require PartialEq to match Ord semantics. Same applies to DepthOrderedRequest in storage.rs.
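One variation on option (b), offered here only as a sketch and not proposed in the review itself, is to derive structural equality and have `Ord` compare depth first and content second. The standard library documentation asks that `a == b` hold exactly when `partial_cmp` returns `Some(Equal)`, and the tie-break keeps that property while still popping the deepest node first. `Meta`, `DepthThenContent`, and `deepest` are hypothetical names.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// Hypothetical stand-in for `RequestMetadata`; only the path is modelled.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Meta {
    path: Vec<u8>,
}

/// Structural equality is derived; ordering is depth first, then content.
#[derive(Debug, Clone, PartialEq, Eq)]
struct DepthThenContent(Meta);

impl Ord for DepthThenContent {
    fn cmp(&self, other: &Self) -> Ordering {
        self.0
            .path
            .len()
            .cmp(&other.0.path.len())
            // Tie-break on the path bytes so `Equal` implies `==`.
            .then_with(|| self.0.path.cmp(&other.0.path))
    }
}

impl PartialOrd for DepthThenContent {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

/// Returns the path the heap would pop first, if any.
fn deepest(paths: Vec<Vec<u8>>) -> Option<Vec<u8>> {
    let mut heap: BinaryHeap<DepthThenContent> = paths
        .into_iter()
        .map(|path| DepthThenContent(Meta { path }))
        .collect();
    heap.pop().map(|item| item.0.path)
}
```

The trade-off is a slightly more expensive comparison between items at the same depth, in exchange for equality that is safe to use for deduplication or in assertions.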

ElFantasma · 2026-04-30T14:38:46Z

+                        .unwrap_or(None)
+                    else {
+                        // If there are no peers available, re-add the batch to the paths vector, and continue
+                        paths.extend(batch.into_iter().map(DepthOrderedMetadata));


Re-pushes a freshly-popped batch back into the heap on the no-peers path, costing O(BATCH_SIZE · log n). Acceptable for a slow path, but a peer-starved cluster eats more CPU than the old VecDeque would. Not blocking.
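The pop-then-re-push pattern under discussion looks roughly like the sketch below. `pop_batch` and `requeue` are hypothetical helpers written for this example; the PR itself re-adds the batch with `paths.extend(...)` as shown in the diff excerpt.

```rust
use std::collections::BinaryHeap;

/// Pops up to `batch_size` items from the heap, highest priority first.
fn pop_batch<T: Ord>(heap: &mut BinaryHeap<T>, batch_size: usize) -> Vec<T> {
    let mut batch = Vec::with_capacity(batch_size.min(heap.len()));
    while batch.len() < batch_size {
        match heap.pop() {
            Some(item) => batch.push(item),
            None => break,
        }
    }
    batch
}

/// Returns an undispatched batch to the heap, e.g. when no peer is available.
/// Each item has to be placed back into heap order, which is the extra cost
/// the comment refers to; a `VecDeque` could simply push to one end.
fn requeue<T: Ord>(heap: &mut BinaryHeap<T>, batch: Vec<T>) {
    heap.extend(batch);
}
```

Since the no-peer path also sleeps before retrying (10 ms per the flowchart), the re-push work is paid at most once per retry rather than in a tight loop.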

iovoid added 4 commits April 20, 2026 16:17

perf(p2p): bound storage healing memory

37adef7

fix(p2p): address healing review feedback

2e34fd8

perf(p2p): bound state healing memory

2029cf3

fix(p2p): state healing review fixes

d82f116

iovoid requested a review from a team as a code owner April 28, 2026 15:42

Merge remote-tracking branch 'origin/main' into perf/storage-healing-…

a63a6b6

…memory-bound # Conflicts: # crates/networking/p2p/sync/healing/state.rs

ElFantasma approved these changes Apr 30, 2026

iovoid wants to merge 5 commits into main from perf/storage-healing-memory-bound
