Skip to content

feat(l1): bound memory usage of healing#6545

Open
iovoid wants to merge 5 commits intomainfrom
perf/storage-healing-memory-bound
Open

feat(l1): bound memory usage of healing#6545
iovoid wants to merge 5 commits intomainfrom
perf/storage-healing-memory-bound

Conversation

@iovoid
Copy link
Copy Markdown
Contributor

@iovoid iovoid commented Apr 28, 2026

Motivation

For large networks storage healing might end up loading hundreds of millions of nodes into the pending queue.

Description

We use a priority queue to ensure deep nodes are resolved first instead of opening new tries. We also set a soft cap after which download goes slower to ensure parallelism doesn't cause the limit to be substantially exceeded. The limit is not hard as we prefer a risk of OOM rather than a stall.

@iovoid iovoid requested a review from a team as a code owner April 28, 2026 15:42
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR bounds unbounded memory growth during snap healing for large networks by replacing Vec/VecDeque-backed download queues with BinaryHeaps ordered deepest-first, and adding a soft backpressure gate (HEALING_QUEUE_SOFT_LIMIT = 800_000) shared between storage and state healing. When the pending-parents map exceeds the limit, new download dispatch is paused until in-flight responses drain it; an escape hatch (inflight_tasks == 0 / requests.is_empty()) prevents deadlock if everything drains simultaneously. Both loops also gain an unconditional yield_now().await at the top to keep the tokio runtime cooperative during backpressure.

Confidence Score: 4/5

Safe to merge; the approach is sound and well-documented, with the only finding being a minor semantic smell in the Eq implementations.

All findings are P2 style concerns. The backpressure logic, escape hatches, and depth-first ordering are all correctly reasoned; tests cover the heap ordering and edge cases.

No files require special attention beyond the Eq semantic note in state.rs and storage.rs.

Important Files Changed

Filename Overview
crates/networking/p2p/snap/constants.rs Adds HEALING_QUEUE_SOFT_LIMIT (800,000 entries ≈ 1 GB) with detailed memory-cost derivation; clean addition with no issues.
crates/networking/p2p/sync/healing/state.rs Replaces Vec-based pending paths with a BinaryHeap<DepthOrderedMetadata> (deepest-first) and adds backpressure gating against HEALING_QUEUE_SOFT_LIMIT; PartialEq/Eq defined by depth only is a semantic smell but harmless for BinaryHeap.
crates/networking/p2p/sync/healing/storage.rs Replaces VecDeque<NodeRequest> with BinaryHeap<DepthOrderedRequest> and gates dispatch on HEALING_QUEUE_SOFT_LIMIT; same depth-equality semantic smell as state.rs, but no correctness issues for the heap use case.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Loop start — yield_now] --> B{healing_queue\n>= SOFT_LIMIT?}
    B -- No --> C[Pop batch from\nBinaryHeap max-depth]
    B -- Yes --> D{inflight_tasks == 0?}
    D -- Yes escape hatch --> C
    D -- No backpressure --> E[Increment\nbackpressure_stalls]
    C --> F[get_best_peer]
    F -- peer found --> G[tokio::spawn\nrequest_trienodes]
    F -- no peer --> H[Re-push batch\ninto heap, sleep 10ms]
    G --> I[task_receiver.try_recv]
    E --> I
    I -- Ok response --> J[heal_state_batch\ncommit_node cascades]
    I -- Err --> K[Re-push batch\ninto heap]
    J --> L[return_paths → push\nDepthOrderedMetadata into heap]
    L --> A
    K --> A
    H --> A
Loading

Comments Outside Diff (1)

  1. crates/networking/p2p/sync/healing/state.rs, line 79-84 (link)

    P2 PartialEq defines equality as same depth, not same content

    DepthOrderedMetadata::eq returns true for two wrappers with completely different RequestMetadata as long as their path.len() matches. While BinaryHeap never calls eq for its heap operations (it relies solely on Ord), the trait impl still advertises that a == b for structurally distinct values — violating the usual semantic contract of Eq. If this type is ever placed in a HashSet, used in an assert_eq!, or compared for deduplication, the result will be silently wrong.

    The same pattern is mirrored in DepthOrderedRequest::eq in storage.rs. Consider either deriving PartialEq from the inner type (and accepting the asymmetry with Ord, which is permitted by Rust but should be documented) or adding an explicit comment making the intended semantics clear.

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: crates/networking/p2p/sync/healing/state.rs
    Line: 79-84
    
    Comment:
    **`PartialEq` defines equality as same depth, not same content**
    
    `DepthOrderedMetadata::eq` returns `true` for two wrappers with completely different `RequestMetadata` as long as their `path.len()` matches. While `BinaryHeap` never calls `eq` for its heap operations (it relies solely on `Ord`), the trait impl still advertises that `a == b` for structurally distinct values — violating the usual semantic contract of `Eq`. If this type is ever placed in a `HashSet`, used in an `assert_eq!`, or compared for deduplication, the result will be silently wrong.
    
    The same pattern is mirrored in `DepthOrderedRequest::eq` in `storage.rs`. Consider either deriving `PartialEq` from the inner type (and accepting the asymmetry with `Ord`, which is permitted by Rust but should be documented) or adding an explicit comment making the intended semantics clear.
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/networking/p2p/sync/healing/state.rs
Line: 79-84

Comment:
**`PartialEq` defines equality as same depth, not same content**

`DepthOrderedMetadata::eq` returns `true` for two wrappers with completely different `RequestMetadata` as long as their `path.len()` matches. While `BinaryHeap` never calls `eq` for its heap operations (it relies solely on `Ord`), the trait impl still advertises that `a == b` for structurally distinct values — violating the usual semantic contract of `Eq`. If this type is ever placed in a `HashSet`, used in an `assert_eq!`, or compared for deduplication, the result will be silently wrong.

The same pattern is mirrored in `DepthOrderedRequest::eq` in `storage.rs`. Consider either deriving `PartialEq` from the inner type (and accepting the asymmetry with `Ord`, which is permitted by Rust but should be documented) or adding an explicit comment making the intended semantics clear.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(p2p): state healing review fixes" | Re-trigger Greptile

…memory-bound

# Conflicts:
#	crates/networking/p2p/sync/healing/state.rs
@github-actions
Copy link
Copy Markdown

Lines of code report

Total lines added: 151
Total lines removed: 0
Total lines changed: 151

Detailed view
+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/snap/constants.rs       | 24    | +1   |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/state.rs   | 458   | +70  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/storage.rs | 696   | +80  |
+------------------------------------------------------+-------+------+

Copy link
Copy Markdown
Contributor

@ElFantasma ElFantasma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments

#[derive(Debug, Clone)]
struct DepthOrderedMetadata(RequestMetadata);

impl PartialEq for DepthOrderedMetadata {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PartialEq surprise: equality is purely on path length, so two structurally different requests at the same depth compare as equal. Fine for BinaryHeap (which only uses Ord), but if anyone later adds dedup or a HashSet<DepthOrderedMetadata> they'll get silent collisions. Either (a) doc-comment the equality as ordering-only, or (b) derive structural PartialEq/Eq and put the depth-only comparison in Ord alone — BinaryHeap doesn't require PartialEq to match Ord semantics. Same applies to DepthOrderedRequest in storage.rs.

.unwrap_or(None)
else {
// If there are no peers available, re-add the batch to the paths vector, and continue
paths.extend(batch.into_iter().map(DepthOrderedMetadata));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-pushes a freshly-popped batch back into the heap on the no-peers path, costing O(BATCH_SIZE · log n). Acceptable for a slow path, but a peer-starved cluster eats more CPU than the old VecDeque would. Not blocking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants