Conversation
Greptile Summary

This PR bounds memory growth during snap healing for large networks by replacing the unbounded pending queues with depth-ordered BinaryHeaps gated by a soft limit.

Confidence Score: 4/5

Safe to merge; the approach is sound and well-documented, with the only finding being a minor semantic smell in the Eq implementations. All findings are P2 style concerns. The backpressure logic, escape hatches, and depth-first ordering are all correctly reasoned; tests cover the heap ordering and edge cases. No files require special attention beyond the Eq semantic note in state.rs and storage.rs.
| Filename | Overview |
|---|---|
| crates/networking/p2p/snap/constants.rs | Adds HEALING_QUEUE_SOFT_LIMIT (800,000 entries ≈ 1 GB) with a detailed memory-cost derivation; clean addition with no issues (see the sketch after this table). |
| crates/networking/p2p/sync/healing/state.rs | Replaces Vec-based pending paths with a BinaryHeap<DepthOrderedMetadata> (deepest-first) and adds backpressure gating against HEALING_QUEUE_SOFT_LIMIT; PartialEq/Eq defined by depth only is a semantic smell but harmless for BinaryHeap. |
| crates/networking/p2p/sync/healing/storage.rs | Replaces VecDeque<NodeRequest> with BinaryHeap<DepthOrderedRequest> and gates dispatch on HEALING_QUEUE_SOFT_LIMIT; same depth-equality semantic smell as state.rs, but no correctness issues for the heap use case. |
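To make the first row's budget concrete, here is a hedged sketch of the kind of constant it describes; the exact comment wording in constants.rs differs, and the roughly 1.25 KB/entry figure is simply the arithmetic implied by "800,000 entries ≈ 1 GB".

```rust
// Soft cap on the snap-healing pending queue (illustrative reconstruction).
//
// Rough budget implied by the summary: 800_000 queued paths at roughly
// 1.25 KB each comes to about 1 GB held by the queue. The cap is "soft":
// it throttles new dispatches rather than dropping work.
pub const HEALING_QUEUE_SOFT_LIMIT: usize = 800_000;
```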
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Loop start — yield_now] --> B{healing_queue\n>= SOFT_LIMIT?}
B -- No --> C[Pop batch from\nBinaryHeap max-depth]
B -- Yes --> D{inflight_tasks == 0?}
D -- Yes escape hatch --> C
D -- No backpressure --> E[Increment\nbackpressure_stalls]
C --> F[get_best_peer]
F -- peer found --> G[tokio::spawn\nrequest_trienodes]
F -- no peer --> H[Re-push batch\ninto heap, sleep 10ms]
G --> I[task_receiver.try_recv]
E --> I
I -- Ok response --> J[heal_state_batch\ncommit_node cascades]
I -- Err --> K[Re-push batch\ninto heap]
J --> L[return_paths → push\nDepthOrderedMetadata into heap]
L --> A
K --> A
H --> A
```
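The control flow above can be sketched as follows. This is a self-contained approximation, not the PR's code: `Item`, `tick`, and `BATCH_SIZE` are placeholders standing in for `DepthOrderedMetadata`, the real dispatch loop, and the real batch constant; networking and the task channel are stubbed out.

```rust
use std::collections::BinaryHeap;

// Placeholder standing in for DepthOrderedMetadata; ordered by depth.
#[derive(PartialEq, Eq, PartialOrd, Ord)]
struct Item {
    depth: usize,
}

const HEALING_QUEUE_SOFT_LIMIT: usize = 800_000;
const BATCH_SIZE: usize = 256; // illustrative, not the PR's constant

/// One iteration of the flowchart's dispatch loop, networking stubbed out.
fn tick(
    heap: &mut BinaryHeap<Item>,
    inflight_tasks: usize,
    backpressure_stalls: &mut u64,
    peer_available: bool,
) {
    // Backpressure gate with escape hatch: only skip popping when the queue
    // is over the soft limit AND something is still in flight to drain it.
    if heap.len() >= HEALING_QUEUE_SOFT_LIMIT && inflight_tasks != 0 {
        *backpressure_stalls += 1;
        return; // the real loop falls through to draining task responses
    }

    // Pop a batch, deepest nodes first (BinaryHeap is a max-heap).
    let mut batch = Vec::new();
    while batch.len() < BATCH_SIZE {
        match heap.pop() {
            Some(item) => batch.push(item),
            None => break,
        }
    }

    if !peer_available {
        // No peer: push the batch back and retry after a short sleep.
        heap.extend(batch);
        return;
    }

    // The real code would hand `batch` to a spawned request_trienodes task here.
    drop(batch);
}

fn main() {
    let mut heap = BinaryHeap::from([Item { depth: 3 }, Item { depth: 1 }]);
    let mut stalls = 0;
    tick(&mut heap, 0, &mut stalls, false);
    assert_eq!(heap.len(), 2); // batch was re-pushed: no peer was available
}
```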
Comments Outside Diff (1)
- crates/networking/p2p/sync/healing/state.rs, lines 79-84

  **`PartialEq` defines equality as same depth, not same content**

  `DepthOrderedMetadata::eq` returns `true` for two wrappers with completely different `RequestMetadata` as long as their `path.len()` matches. While `BinaryHeap` never calls `eq` for its heap operations (it relies solely on `Ord`), the trait impl still advertises that `a == b` for structurally distinct values, violating the usual semantic contract of `Eq`. If this type is ever placed in a `HashSet`, used in an `assert_eq!`, or compared for deduplication, the result will be silently wrong.

  The same pattern is mirrored in `DepthOrderedRequest::eq` in `storage.rs`. Consider either deriving `PartialEq` from the inner type (and accepting the asymmetry with `Ord`, which is permitted by Rust but should be documented) or adding an explicit comment making the intended semantics clear.
Reviews (1): Last reviewed commit: "fix(p2p): state healing review fixes"
…memory-bound (merge commit; conflicts in crates/networking/p2p/sync/healing/state.rs)
```rust
#[derive(Debug, Clone)]
struct DepthOrderedMetadata(RequestMetadata);

impl PartialEq for DepthOrderedMetadata {
```
PartialEq surprise: equality is purely on path length, so two structurally different requests at the same depth compare as equal. Fine for BinaryHeap (which only uses Ord), but if anyone later adds dedup or a HashSet<DepthOrderedMetadata> they'll get silent collisions. Either (a) doc-comment the equality as ordering-only, or (b) derive structural PartialEq/Eq and put the depth-only comparison in Ord alone — BinaryHeap doesn't require PartialEq to match Ord semantics. Same applies to DepthOrderedRequest in storage.rs.
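A minimal sketch of option (a): keep the depth-only comparison but make the semantics explicit with a doc comment. The `RequestMetadata` struct here is a stand-in (only the `path.len()` access is implied by the review); the same treatment would apply to `DepthOrderedRequest` in storage.rs.

```rust
// Stand-in for the real RequestMetadata; only `path` is assumed here.
#[derive(Debug, Clone)]
struct RequestMetadata {
    path: Vec<u8>,
}

#[derive(Debug, Clone)]
struct DepthOrderedMetadata(RequestMetadata);

impl DepthOrderedMetadata {
    fn depth(&self) -> usize {
        self.0.path.len()
    }
}

/// NOTE: equality here means "same depth", not "same request". It exists
/// only so the type can satisfy `Ord` for `BinaryHeap`; do not rely on it
/// for deduplication, `HashSet` membership, or content assertions.
impl PartialEq for DepthOrderedMetadata {
    fn eq(&self, other: &Self) -> bool {
        self.depth() == other.depth()
    }
}

impl Eq for DepthOrderedMetadata {}

impl Ord for DepthOrderedMetadata {
    fn cmp(&self, other: &Self) -> std::cmp::Ordering {
        // Deeper paths sort greater, so BinaryHeap (a max-heap) pops them first.
        self.depth().cmp(&other.depth())
    }
}

impl PartialOrd for DepthOrderedMetadata {
    fn partial_cmp(&self, other: &Self) -> Option<std::cmp::Ordering> {
        Some(self.cmp(other))
    }
}
```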
```rust
    .unwrap_or(None)
else {
    // If there are no peers available, re-add the batch to the paths vector, and continue
    paths.extend(batch.into_iter().map(DepthOrderedMetadata));
```
Re-pushes a freshly-popped batch back into the heap on the no-peers path, costing O(BATCH_SIZE · log n). Acceptable for a slow path, but a peer-starved cluster eats more CPU than the old VecDeque would. Not blocking.
Motivation
For large networks, storage healing might end up loading hundreds of millions of nodes into the pending queue.
Description
We use a priority queue to ensure deep nodes are resolved first instead of opening new tries. We also set a soft cap; once it is exceeded, downloading slows down so that parallelism cannot push the queue substantially past the limit. The limit is not hard because we would rather risk an OOM than a stall.
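To make the deepest-first ordering concrete, here is a tiny standalone demo (not the PR's actual type, which wraps RequestMetadata instead) that keys a std BinaryHeap by path depth; since the heap is a max-heap, the longest path pops first.

```rust
use std::collections::BinaryHeap;

fn main() {
    // Keyed by (depth, path); BinaryHeap is a max-heap, so deeper paths pop first.
    let mut heap: BinaryHeap<(usize, Vec<u8>)> = BinaryHeap::new();
    for path in [vec![0x1], vec![0x1, 0x2, 0x3], vec![0x1, 0x2]] {
        heap.push((path.len(), path));
    }

    // Pops in depth order 3, 2, 1: deep subtries are finished before shallower
    // nodes open new ones, which keeps the pending queue small.
    while let Some((depth, path)) = heap.pop() {
        println!("depth {depth}: {path:?}");
    }
}
```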