Skip to content

fix: prevent discovery eviction from disconnecting active sync peer#12097

Open
DarkLord017 wants to merge 17 commits into
masterfrom
sync_fixes
Open

fix: prevent discovery eviction from disconnecting active sync peer#12097
DarkLord017 wants to merge 17 commits into
masterfrom
sync_fixes

Conversation

@DarkLord017

@DarkLord017 DarkLord017 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Fixes
Engine Withdrawals
✖ Sync after 128 blocks - Withdrawals on Block 2 - Multiple Withdrawal Accounts (Paris) (nethermind)
✖ Sync after 2 blocks - Withdrawals on Block 1 - Single Withdrawal Account (Paris) (nethermind)
✖ Sync after 2 blocks - Withdrawals on Block 1 - Single Withdrawal Account - No Transactions (Paris) (nethermind)
✖ Sync after 2 blocks - Withdrawals on Block 2 - Multiple Withdrawal Accounts (Paris) (nethermind)
✖ Sync after 2 blocks - Withdrawals on Genesis - Single Withdrawal Account (Paris) (nethermind)
Sync
✖ snapsync/sync nethermind from nethermind
✖ sync/sync nethermind from nethermind

Changes

  • PeerPool: Do not remove a peer from the pool when the Kademlia discovery layer evicts it if an active P2P session (inbound or outbound) is still open.

Explanation

discv4 periodically pings known nodes to keep the Kademlia routing table healthy. The ping times out (ping timing out is confirmed but unable to debug the reason) and NodeHealthTracker.OnRequestFailed removes the node from the K-bucket, which fires INodeSource.NodeRemoved. Also as there is a 5 minute timeout set and hive has a timeout of 60 seconds the test fails

Types of changes

  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Verified against Hive sync test suite. The peer disconnect during sync no longer occurs

Documentation

Requires documentation update

  • No

Requires explanation in Release Notes

  • No

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 4m 23s —— View job


Code Review

  • Gather context and read changed files
  • Read relevant rules and coding standards
  • Analyze PeerPool.cs changes
  • Analyze RlpxHost.cs changes
  • Post inline comments for specific issues
  • Post final review summary

The fix is logically sound and the root cause analysis in the PR description is accurate. Three issues need addressing before merge.

Summary

Severity Count Issues
🔴 Critical 0
🟠 High 1 Broken existing tests
🟡 Medium 2 Stale interface docs; TOCTOU race
🔵 Low 1 Discarded return value

🟠 High — Existing tests broken by ShouldContact change

RlpxHostIntegrationTests.ShouldContact_FiltersCorrectly has two cases that depend on the old TryAccept (stamping) behaviour of ShouldContact. With CanAccept (read-only), neither call stamps the filter slot, so both calls return true — contradicting the secondExpected = false assertions:

  • "Exact match: blocks same IP"(true, false, null, "203.0.113.1", "203.0.113.1", false)
  • "Subnet bucketing: blocks same subnet"(true, true, null, "203.0.113.1", "203.0.113.50", false)

These two test cases should be updated to secondExpected = true to match the new read-only contract, and new tests should cover that the slot is stamped at the InitializeChannel level for outbound connections. (Inline comment on RlpxHost.cs:110.)


🟡 Medium — IRlpxHost.ShouldContact interface documentation is stale

IRlpxHost.cs (not in the diff) still says:

"records the evaluation in the underlying contact filter"
"Calling this method may update internal state and affect the outcome of future calls"

Both statements are now false. The docs should be updated to describe the read-only semantics and point callers at InitializeChannel for where stamping actually occurs.


🟡 Medium — TOCTOU race in NodeSourceOnNodeRemoved

The session-guard check and the subsequent TryRemove are not atomic, and the InSession/OutSession fields are non-volatile. Two races exist:

  1. Stale read: the discovery thread may see null for InSession even though another thread has already assigned a session (no memory-ordering guarantee).
  2. TOCTOU: a session is assigned between the null-check passing and TryRemove executing, causing TryRemove to disconnect an active peer.

The worst-case outcome of either race is unchanged from the original behaviour (unconditional disconnect), so the fix is still strictly better. A short comment documenting the known race boundary would prevent a future reviewer from "fixing" it incorrectly. (Inline comment on PeerPool.cs:79.)


🔵 Low — TryAccept return value silently discarded in InitializeChannel

_nodeFilter.TryAccept(outboundEndpoint.Address) at RlpxHost.cs:297 discards its return value without _ =. The rest of this file uses the _ = convention for intentional discards (_ = channel.CloseAsync(), _ = connectTask.ContinueWith(...)). Using _ = _nodeFilter.TryAccept(...) would be consistent, avoids a potential CA1806 analyser hit, and signals intent. (Inline comment on RlpxHost.cs:297.)


Correctness notes (no action needed)

  • PeerPool.NodeSourceOnNodeRemoved guard logic is correct. Static/bootnode peers with InSession or OutSession are protected from discovery eviction, which is the intent.
  • ShouldRejectInbound docstring was correctly updated in this PR to describe the new asymmetric flow (inbound filtered at accept time, outbound at InitializeChannel).
  • No regression test for the primary fix: the author notes Hive testing was used. A unit test for NodeSourceOnNodeRemoved guarding an active session would prevent future regressions, but this is not blocking given Hive coverage.

Comment thread src/Nethermind/Nethermind.Network/Rlpx/RlpxHost.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/Rlpx/RlpxHost.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude re-review

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 5m 25s —— View job


Re-Review

  • Gather context and read changed files
  • Check if previous review issues were addressed
  • Analyze current state of all changed files
  • Post updated review findings

Previous findings — all resolved ✅

Severity Finding Status
🟠 High ShouldContact test breakage (broken secondExpected assertions) ✅ Resolved — RlpxHost changes reverted in commit 7d11330
🟡 Medium IRlpxHost.ShouldContact interface docs stale ✅ Resolved — revert
🟡 Medium TOCTOU race in NodeSourceOnNodeRemoved ✅ Resolved — volatile backing fields added to Peer.cs; lock(peer) in both NodeSourceOnNodeRemoved and AttachSession
🔵 Low TryAccept return value silently discarded ✅ Resolved — revert

New findings

Severity Finding
🟡 Medium Remaining narrow TOCTOU — peer orphaned in ActivePeers but not Peers
🔵 Low Dead code TryRemove in the !TryGetValue branch

🟡 Medium — Remaining narrow TOCTOU (PeerPool.cs:86)

The lock(peer) in NodeSourceOnNodeRemoved and AttachSession correctly serialize the common case but leave one residual window:

  1. NodeSourceOnNodeRemoved reads both sessions as null, then acquires lock(peer)
  2. AttachSession (another thread) tries lock(peer) and blocks — it hasn't stamped a session yet
  3. NodeSourceOnNodeRemoved sees both null → calls TryRemove → removes peer from Peers; fires PeerRemoved (which removes from ActivePeers)
  4. AttachSession unblocks, sets peer.InSession = session
  5. AddActivePeer puts the peer back in ActivePeers

The peer ends up in ActivePeers with a live session (sync works) but absent from Peers (the candidate pool). When the session ends it won't be in the reconnection pool until discovery re-adds it. This is strictly better than the original unconditional disconnect; a brief comment at the lock site would prevent future regressions. Inline suggestion posted.


🔵 Low — Dead code TryRemove in !TryGetValue branch (PeerPool.cs:81)

If Peers.TryGetValue returns false, the subsequent TryRemove will also fail — the peer is not in Peers for either call. In the extremely unlikely race where another thread calls GetOrAdd between the two calls, this would spuriously evict a freshly-added peer. Should be an early return. Inline suggestion posted.


Correctness notes (no action needed)

  • volatile fields in Peer.cs are the right fix for the stale-read half of the original TOCTOU. With volatile + the lock fence in NodeSourceOnNodeRemoved, the session check sees the latest value written by any thread.
  • PeerRemoved fired inside lock(peer) (TryRemovePeerRemovedPeerPoolOnPeerRemoved): the handler only sets IsAwaitingConnection = false and calls ActivePeers.TryRemove — neither acquires lock(peer), so no deadlock today. Would be worth noting as a latent risk if new handlers are added in future.
  • DeactivatePeerIfDisconnected sets InSession/OutSession to null without holding lock(peer). Combined with volatile, writes are immediately visible across threads; the lock fence inside NodeSourceOnNodeRemoved ensures it reads the latest value, so this is safe under the current logic.

Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs
@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude re-review

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 3m 11s —— View job


Re-Review (Round 3)

  • Gather context and read changed files
  • Check if previous review issues were addressed
  • Analyze current state of all changed files
  • Post updated review findings

Previous findings

Severity Finding Status
🟡 Medium Residual TOCTOU — TryRemove outside lock(peer), no comment ⚠️ Still open — comment not added
🔵 Low Dead code TryRemove in !TryGetValue branch ✅ Resolved — early return now in place

Open finding

🟡 Medium — TryRemove still called outside lock(peer) (PeerPool.cs:92)

Carrying forward from Round 2. The lock(peer) at line 85 serializes the session null-check against AttachSession, but TryRemove at line 92 executes after the lock is released. The race window:

  1. NodeSourceOnNodeRemoved acquires lock(peer), confirms sessions are null and IsAwaitingConnection is false → releases lock
  2. AttachSession (another thread) acquires lock(peer), sets peer.InSession = session → releases lock
  3. NodeSourceOnNodeRemoved calls TryRemoveMarkDisconnected fires on the now-live session — same result as the original bug

This is still strictly better than the pre-fix unconditional eviction (the window is extremely narrow), but a comment at the lock site would prevent a future reader from "simplifying" the guard away. The suggested comment from Round 2 was not applied. Inline suggestion re-posted.


Correctness notes (no action needed)

  • volatile backing fields (Peer.cs) remain correct — InSession/OutSession writes are immediately visible across threads.
  • IsAwaitingConnection guard is a good addition from this iteration. It prevents eviction during the window between IsAwaitingConnection = true and session assignment. Note that IsAwaitingConnection is written without lock(peer), but this is pre-existing behavior consistent with the rest of the codebase and not a new risk.
  • sender is not IStaticNodesManager and not ITrustedNodesManager correctly allows explicit admin removal of static/trusted nodes to still disconnect sessions — intentional behavior.
  • lock(peer) in AttachSession (PeerManager.cs:1078) is in place and correctly serializes session assignment against discovery eviction.
  • No regression test: still unaddressed, still not blocking given Hive coverage.

Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs
@DarkLord017

Copy link
Copy Markdown
Contributor Author

review @claude

@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 4m 36s —— View job


Re-Review (Round 4)

  • Load tools
  • Read current state of changed files
  • Check previous findings status
  • Analyze and post findings

Status of Prior Findings

Round Severity Finding Status
1 🟠 High ShouldContact test breakage ✅ Resolved — RlpxHost changes reverted
1 🟡 Medium IRlpxHost.ShouldContact interface docs stale ✅ Resolved — revert
1 🟡 Medium TOCTOU + non-volatile reads ✅ Resolved — volatile fields + lock(peer) in AttachSession
1 🔵 Low TryAccept return value silently discarded ✅ Resolved — revert
2 🟡 Medium Residual TOCTOU — peer orphaned in ActivePeers but not Peers ✅ Functionally resolved — with TryRemove now inside lock(peer), the only residual case is the peer missing from the reconnection pool, not a live-session disconnect
2 🔵 Low Dead code TryRemove in !TryGetValue branch ✅ Resolved — early return in place
3 🟡 Medium TryRemove called outside lock(peer) Resolved in this batchTryRemove is now inside the lock(peer) block (PeerPool.cs:89)
3 🟡 Medium Comment documenting residual race not added ⚠️ Still open

Remaining Finding

🟡 Medium (carried from Round 2) — Comment documenting residual race still absent

This has been requested in every round. The critical code fix is now in place (Round 3's core concern resolved). However, the lock(peer) site still has no comment explaining why the lock, the session checks, and TryRemove must be colocated and what the residual race boundary is.

Without this comment a future reader could plausibly:

  1. Extract TryRemove outside the lock ("simplification")
  2. Remove the lock as "unnecessary" since volatile already ensures visibility

The residual TOCTOU (TryGetValue outside the lock) does not cause a session disconnect — with volatile fields, if AttachSession stamps a session between TryGetValue and lock acquisition, the check inside the lock will see it and bail out. The worst case is the peer orphaned from the reconnection pool for one connection cycle. Still, the explanation should live in the code.

Suggested addition at PeerPool.cs:84:

            // Lock peer to serialize session check + removal against AttachSession on another thread.
            // Both volatile reads inside and the TryRemove call must be under the same lock to close
            // the check-then-act window. Residual race: TryGetValue is outside the lock, so a brand-new
            // session assigned between TryGetValue and lock acquisition can still slip through — but TryRemove
            // will then call MarkDisconnected on null sessions (no-op), leaving the peer alive in ActivePeers
            // while absent from Peers until rediscovery. This is strictly better than the original
            // unconditional disconnect.
            lock (peer)

Correctness Notes (no action needed)

  • Round 3's core fix confirmed: TryRemove is now at PeerPool.cs:89, inside lock(peer). AttachSession (PeerManager.cs:1078) also holds lock(peer) when writing sessions. The check-then-remove is now atomic w.r.t. session assignment. ✅
  • volatile backing fields (Peer.cs:33-34) ensure the in-lock read sees the latest session value written by any thread. ✅
  • IsAwaitingConnection guard prevents eviction of peers between IsAwaitingConnection = true and session assignment. ✅
  • Static/trusted bypass (PeerPool.cs:94) correctly allows explicit admin removal to disconnect even active peers. ✅
  • PeerRemoved fires inside lock(peer) (via TryRemove at line 89): current PeerPoolOnPeerRemoved handler only sets IsAwaitingConnection = false and calls ActivePeers.TryRemove — neither reacquires lock(peer), so no deadlock today. Latent risk if new handlers are added.
  • No regression test for the primary fix: still unaddressed, still not blocking given Hive coverage.

Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
DarkLord017 and others added 2 commits June 23, 2026 02:47
Co-authored-by: claude[bot] <209825114+claude[bot]@users.noreply.github.com>
@DarkLord017 DarkLord017 marked this pull request as ready for review June 22, 2026 21:26
@claude

claude Bot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 2m 17s —— View job


Re-Review (Round 5)

  • Read current state of all changed files
  • Check Round 4 open finding (comment on residual TOCTOU)
  • Full review pass
  • Post findings

All Prior Findings Resolved ✅

Round Severity Finding Status
1 🟠 High ShouldContact test breakage ✅ Resolved — RlpxHost changes reverted
1 🟡 Medium IRlpxHost.ShouldContact interface docs stale ✅ Resolved — revert
1 🟡 Medium TOCTOU + non-volatile reads ✅ Resolved — volatile fields + lock(peer) in AttachSession
1 🔵 Low TryAccept return value silently discarded ✅ Resolved — revert
2 🟡 Medium Dead code TryRemove in !TryGetValue branch ✅ Resolved — early return
2–4 🟡 Medium Comment documenting residual TOCTOU not present Resolved in this batch — comment added at PeerPool.cs:84–89

No New Findings

The implementation is now correct and well-documented:

  • volatile backing fields (Peer.cs:33–34) — InSession/OutSession writes are immediately visible across threads. Reads inside lock(peer) in NodeSourceOnNodeRemoved see the latest value written by any thread.
  • lock(peer) in NodeSourceOnNodeRemoved and AttachSession — makes the session check and TryRemove atomic against concurrent session assignment. The two threads compete for the same lock; only one wins and sees null sessions.
  • IsAwaitingConnection guard (PeerPool.cs:92) — prevents eviction in the window between connection initiation and session stamping.
  • Sender type bypass (PeerPool.cs:82) — admin-originated removals (static/trusted managers) still disconnect active peers, which is the correct intentional behaviour.
  • Comment on residual race (PeerPool.cs:84–89) — clearly documents the TryGetValue-outside-lock boundary so future maintainers can reason about it without reintroducing the bug.

Known acknowledged risks (not new, no action needed):

  • IsAwaitingConnection not volatile: consistent with pre-existing codebase patterns; risk mitigated by the lock memory barrier.
  • PeerRemoved fires inside lock(peer) via TryRemove (line 95): current handler in PeerManager only sets IsAwaitingConnection = false and calls ActivePeers.TryRemove — neither reacquires lock(peer), so no deadlock today. Latent risk if new handlers are added.

The fix is logically sound, all review findings have been addressed, and the code is ready to merge.

@DarkLord017

Copy link
Copy Markdown
Contributor Author

Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Comment thread src/Nethermind/Nethermind.Network/PeerPool.cs Outdated
Co-authored-by: Lukasz Rozmej <lukasz.rozmej@gmail.com>
@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude review

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude encountered an error after 3m 12s —— View job


I'll analyze this and get back to you.

@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude try again also pls check if i have added all the correct code as per the comments or requested changes

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 7m 9s —— View job


Re-Review (Round 6)


All reviewer feedback addressed ✅

Reviewer Concern Status
@LukaszRozmej Sender-type check is fragile — use a subclass of NodeEventArgs instead ✅ Fixed — ExplicitNodeRemovalEventArgs subclass added; PeerPool checks e is not ExplicitNodeRemovalEventArgs
@LukaszRozmej Remove skippable comment ✅ Resolved in refactor
@AnkushinDaniil Add PeerPoolTests cases: session active → peer kept; no session → peer removed PeerPool_DiscoveryEviction_KeepsPeer_WhenSessionActive and PeerPool_DiscoveryEviction_RemovesPeer_WhenNoSession added
@AnkushinDaniil Is ITrustedNodesManager actually bound as INodeSource? ✅ Confirmed — DiscoveryModule.cs:51 has .Bind<INodeSource, ITrustedNodesManager>()
@AnkushinDaniil Locking on a publicly visible Peer instance is exposed — use a private lock object ✅ Fixed — Peer.cs now exposes internal readonly object SessionLock = new();; both NodeSourceOnNodeRemoved and AttachSession lock on peer.SessionLock
@AnkushinDaniil + @Marchhill IsAwaitingConnection not volatile but read from eviction thread ✅ Fixed — private volatile bool _isAwaitingConnection in Peer.cs:37
@AnkushinDaniil PeerRemoved fires while holding the lock ✅ Fixed — PeerRemoved is fired outside lock(peer.SessionLock) using the bool removed flag pattern

Correctness notes (no action needed)

  • TestNodeSource.RemoveNode fires plain NodeEventArgs (not ExplicitNodeRemovalEventArgs), which is correct — it simulates discovery eviction, not admin removal. The two new tests exercise the right code path. ✅
  • Explicit removal path (TryRemove at line 99) runs without SessionLock. This is intentional — admin removal is unconditional by design. A narrow race (session assigned between TryRemove reading and nulling InSession) can leave a session undisconnected, but this is pre-existing behavior and out of scope for this fix.
  • CompositeNodeSource.NodeSourceOnNodeRemoved re-fires with the original sender, preserving the ExplicitNodeRemovalEventArgs subtype so PeerPool correctly distinguishes explicit vs. discovery evictions even through the composite. ✅
  • NodesManager.TryRemoveNode (used by both StaticNodesManager.RemoveAsync and TrustedNodesManager.RemoveAsync) fires ExplicitNodeRemovalEventArgs, so admin-initiated removes correctly bypass the session guard. ✅

The implementation is correct and all feedback from all reviewers has been addressed. Ready to merge.

@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude final review made some changes

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude encountered an error after 3m 17s —— View job


I'll analyze this and get back to you.

@DarkLord017

Copy link
Copy Markdown
Contributor Author

@claude try again

@claude

claude Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

Claude finished @DarkLord017's task in 2m 36s —— View job


Re-Review (Round 7)

  • Load tools
  • Read current state of all changed files
  • Verify all previous findings are addressed
  • Post findings

All Previous Findings Resolved ✅

Round Severity Finding Status
1 🟠 High ShouldContact test breakage ✅ Resolved — RlpxHost changes reverted
1 🟡 Medium IRlpxHost.ShouldContact interface docs stale ✅ Resolved — revert
1 🟡 Medium TOCTOU + non-volatile reads ✅ Resolved — volatile fields + lock(peer.SessionLock) in AttachSession
1 🔵 Low TryAccept return value silently discarded ✅ Resolved — revert
2 🔵 Low Dead code TryRemove in !TryGetValue branch ✅ Resolved — early return
2–4 🟡 Medium Comment documenting residual TOCTOU ✅ Resolved
6 🟡 Medium Sender-type check fragile ✅ Resolved — ExplicitNodeRemovalEventArgs subclass added
6 🟡 Medium No PeerPoolTests for eviction guard ✅ Resolved — PeerPool_DiscoveryEviction parametrized test added
6 🟡 Medium ITrustedNodesManager not bound as INodeSource ✅ Resolved — DiscoveryModule.cs now has .Bind<INodeSource, ITrustedNodesManager>()
6 🟡 Medium lock on publicly visible Peer instance ✅ Resolved — peer.SessionLock is a dedicated internal readonly object
6 🟡 Medium IsAwaitingConnection not volatile ✅ Resolved — private volatile bool _isAwaitingConnection in Peer.cs
6 🟡 Medium PeerRemoved fired inside lock ✅ Resolved — bool removed flag pattern; event fired outside lock

No New Findings

The implementation is correct, well-structured, and ready to merge.

Verification of key correctness properties:

  • volatile backing fields (Peer.cs:37–39) — _isAwaitingConnection, _inSession, _outSession are all volatile. Reads in NodeSourceOnNodeRemoved see the latest value written by any thread. ✅
  • lock(peer.SessionLock) in NodeSourceOnNodeRemoved and AttachSession — session check + TryRemove are atomic against concurrent session assignment. Two threads competing for the same lock: only the winner sees null sessions and proceeds with removal. ✅
  • PeerRemoved fired outside lock — both the discovery eviction path (via bool removed flag, PeerPool.cs:95) and the explicit removal path (TryRemove, line 151) fire the event after releasing SessionLock. No deadlock risk from current handlers. ✅
  • ExplicitNodeRemovalEventArgs preserved through CompositeNodeSourceCompositeNodeSource.NodeSourceOnNodeRemoved re-fires with the original e reference (NodeRemoved?.Invoke(sender, e)), so the subtype is preserved. ✅
  • TestNodeSource.RemoveNode fires plain NodeEventArgs (not ExplicitNodeRemovalEventArgs), correctly simulating discovery eviction in the new test. ✅
  • TryRemove (explicit removal path) acquires lock(peer.SessionLock) before nulling sessions — safe against concurrent reads. ✅
  • IsAwaitingConnection guard prevents eviction in the window between connection initiation and session stamping. ✅

Acknowledged risks (no action needed):

  • TryGetValue in NodeSourceOnNodeRemoved is outside the lock: a peer newly assigned between TryGetValue returning false and lock acquisition is simply not found and not removed — correct behavior.
  • PeerRemoved fires while lock(peer.SessionLock) is not held. Current handlers (PeerPoolOnPeerRemoved sets IsAwaitingConnection = false + ActivePeers.TryRemove) don't reacquire the lock, so no deadlock today. Latent risk if new handlers are added in future.

The fix is logically sound, all reviewer feedback has been addressed, and the implementation is clean and well-tested. Ready to merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants