Fix connection snapshot truth #918
Make GatewayConnectionManager snapshots the lifecycle source of truth for connection surfaces. Capture node intent/blockers before startup can silently skip, make MCP startup failures visible, and keep legacy status consumers derived from manager state while preserving operator-live behavior. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
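The derivation described above can be sketched roughly as follows. This is an illustrative Python model, not the repository's C# implementation: the `NodeSnapshot` type, the `derive_overall_state` function, and the `Disconnected`, `Degraded`, and `Blocked` state names are assumptions made for the example. Only `Ready`, `Connected`, and the idea of node intent and blockers come from this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NodeSnapshot:
    """Hypothetical node portion of a manager-owned snapshot."""
    intended: bool                 # node mode is enabled, so a node connection is expected
    connected: bool = False
    blocker: Optional[str] = None  # reason node startup could not proceed, if any


def derive_overall_state(operator_connected: bool, node: NodeSnapshot) -> str:
    """Derive a single overall state from operator and node truth (assumed state names)."""
    if not operator_connected:
        return "Disconnected"
    if not node.intended:
        # Node mode is off: an idle node is expected and is not a problem.
        return "Ready"
    if node.blocker is not None:
        return "Blocked"
    if not node.connected:
        # Operator is live but the intended node is not: do not report healthy.
        return "Degraded"
    return "Ready"
```

The point of the sketch is the third and fourth branches: once node intent is captured, operator-connected plus node-idle no longer collapses into a healthy-looking state.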
Codex review: needs real behavior proof before merge.
Reviewed July 2, 2026, 3:11 PM ET / 19:11 UTC.
Summary
Reproducibility: yes. Source inspection of current main shows operator-connected plus node-idle can still derive a healthy-looking connected state without node intent truth; I did not run the Windows app in this read-only review.
Review metrics: 3 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Proof guidance:
Mantis proof suggestion
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Refresh the real behavior proof against current head 3eaea54, including the latest app.settings.set MCP startup-failure behavior, then land after maintainer review of the availability paths and App.xaml.cs overlap.
Do we have a high-confidence way to reproduce the issue? Yes. Source inspection of current main shows operator-connected plus node-idle can still derive a healthy-looking connected state without node intent truth; I did not run the Windows app in this read-only review.
Is this the best way to solve the issue? Yes as an implementation direction. Making GatewayConnectionManager snapshots the lifecycle source of truth matches the repository architecture; the remaining blocker is current-head proof and maintainer review, not a different code approach.
AGENTS.md: found and applied where relevant.
Codex review notes: model internal, reasoning high; reviewed against f89a88a6baf4.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
Publish a blocked node snapshot if node connector startup throws before status events can transition the manager. Add a focused regression test for the throwing connector path. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
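The throwing-connector path in this commit can be illustrated with a minimal sketch. It is written in Python for brevity and is not the manager's actual C# code: `start_node_connector`, its `start` and `publish` callables, and the `Blocked` value are hypothetical names chosen for the example, while `nodeState` and `nodeError` mirror the field names shown in the PR's `app.status` output.

```python
from typing import Callable, Dict


def start_node_connector(
    start: Callable[[], None],
    publish: Callable[[Dict[str, object]], None],
) -> bool:
    """Start the node connector; publish a blocked snapshot if startup throws.

    Without the except branch, an exception raised before any status event
    fires would leave the last published snapshot looking healthy.
    """
    try:
        start()
    except Exception as exc:
        # Assumed snapshot shape: only the node fields relevant to the failure.
        publish({"nodeState": "Blocked", "nodeError": str(exc)})
        return False
    return True
```

A regression test for this shape would pass a `start` callable that raises and assert that exactly one blocked snapshot was published.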
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
Publish blocked node snapshots when previous node connector retirement fails. Include manager-owned overall/node fields in MCP app.status and app.menu so MCP clients can see degraded or blocked node truth instead of only legacy connected status. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
@clawsweeper re-review
🦞🧹 I asked ClawSweeper to review this item again.
@clawsweeper re-review
@clawsweeper re-review
Keep PR 918 current with main and add a maintainer patch so app.settings.set surfaces MCP startup failures as tool errors while MCP startup notifications are shown or dismissed from a single runtime policy. Co-authored-by: Copilot App <223556219+Copilot@users.noreply.github.com>
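For readers unfamiliar with how an MCP client would observe this, the sketch below builds a JSON-RPC 2.0 `tools/call` request for `app.settings.set` and extracts the error text from a result flagged with `isError`, which is how the MCP specification reports tool-level failures. The helper names `build_tools_call` and `tool_error_text` are hypothetical, and the example does not reproduce the actual error message the app returns.

```python
import json
from typing import Any, Dict, Optional


def build_tools_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize a JSON-RPC 2.0 `tools/call` request for an MCP tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def tool_error_text(response: Dict[str, Any]) -> Optional[str]:
    """Return the joined text content if the tool result is flagged as an error."""
    result = response.get("result", {})
    if not result.get("isError", False):
        return None
    texts = [
        item.get("text", "")
        for item in result.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(texts)


# Arguments taken from the PR's real behavior proof.
request_body = build_tools_call(
    1, "app.settings.set", {"name": "EnableNodeMode", "value": "false"}
)
```

A client that only checks for a JSON-RPC `error` member would miss these failures, since a tool error arrives inside a successful `result`.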
Summary
- AppState.Status, raw gateway status callbacks, MCP startup state, and UI-specific projections; this allowed false healthy/connected states.
- GatewayConnectionSnapshot/OverallConnectionState the manager-owned lifecycle truth; added node intent/blockers; made MCP startup honest; updated Connection page, top pill, tray/menu/dashboard, Command Center, notifications, and MCP app status/settings behavior to consume derived snapshot semantics.
- ConnectionTruth model; deeper removal of legacy ConnectionStatus read-side consumers is intentionally deferred to a follow-up cleanup PR.
Change Type (select all)
Scope (select all touched areas)
- winnode
Linked Issue/PR
Validation
- ./build.ps1: passed
- dotnet test .\tests\OpenClaw.Shared.Tests\OpenClaw.Shared.Tests.csproj --no-restore: passed (2686 passed, 31 skipped)
- dotnet test .\tests\OpenClaw.Tray.Tests\OpenClaw.Tray.Tests.csproj --no-restore: passed (1488 passed)
- dotnet test .\tests\OpenClaw.Connection.Tests\OpenClaw.Connection.Tests.csproj --no-restore: passed (395 passed)
- dotnet test .\tests\OpenClaw.WinNode.Cli.Tests\OpenClaw.WinNode.Cli.Tests.csproj --no-restore: passed (120 passed)
Notes:
- OpenClaw.WinNode.Cli.Tests needed one initial restore/no-restore warmup in this fresh worktree; the final --no-restore run passed.
- AssistantBridgeServiceTests.StartListenServiceAsync_KillsTimedOutBackendCommand, an ExecApprovals runtime proof, and a token-recovery wait failed once and passed on rerun. The final validation set above passed.
Real behavior proof
- %APPDATA%\OpenClawTray profile, local OpenClawGateway via ws://localhost:18789, PR branch bkudiess-connection-snapshot-truth.
- 1efcfa19
- C:\Projects\copilot-worktrees\openclaw-windows-node\bkudiess-solid-fishstick\src\OpenClaw.Tray.WinUI\bin\Debug\net10.0-windows10.0.22621.0\win-arm64\OpenClaw.Tray.WinUI.exe.
- http://127.0.0.1:8765/ with bearer token from %APPDATA%\OpenClawTray\mcp-token.txt.
- tools/list, tools/call app.status, tools/call app.menu, and tools/call app.nodes on the current PR head.
- tools/list returned 48 tools and included app.status, app.menu, app.connection.reconnect, app.connection.reconnectNode, and app.settings.set.
- app.status output includes manager-owned state fields: { "connectionStatus": "Connected", "overallState": "Ready", "operatorState": "Connected", "nodeState": "Connected", "nodeConnected": true, "nodePaired": true, "nodePendingApproval": false, "nodeError": null, "gatewayVersion": "2026.6.10", "sessionCount": 0, "nodeCount": 1, "operatorScopes": ["operator.admin", "operator.pairing", "operator.read", "operator.write"], "operatorDeviceId": null }
- app.menu status item includes manager-owned state fields: { "type": "status", "status": "Connected", "overallState": "Ready", "nodeState": "Connected", "nodeError": null }
- app.nodes returned online Windows node "Windows Node (PERSEID)" with CapabilityCount: 8 and CommandCount: 28.
- app.settings.set { "name": "EnableNodeMode", "value": "false" } applies the settings lifecycle and changes app.status to nodeConnected:false / nodePaired:false; re-enabling Node mode plus app.connection.reconnectNode restores nodeConnected:true / nodePaired:true.
- overallState, operatorState, nodeState, and nodeError fields instead of only legacy connectionStatus.
- (Yes/No/N/A): N/A.
- OpenClaw.Tray.WinUI and the host cannot show an approval prompt. Runtime proof uses raw MCP JSON-RPC plus real-profile gateway/node status.
- I did not intentionally induce a real degraded/blocked node state in the user profile to avoid corrupting paired credentials; blocked/degraded node paths are covered by focused manager tests (HandshakeSucceeded_NodeConnectorThrows_ReportsBlockedNode, HandshakeSucceeded_PreviousNodeDisconnectThrows_ReportsBlockedNode, missing credential/connector/record tests, and projection/source-contract tests).
Rubber-duck / review notes:
- app.status/app.menu: stale node intent after disabling Node mode, stale node errors masking pairing, false Ready snapshot before node blocker, generation-guarded node blockers, Command Center status precedence, top pill refresh on every manager snapshot, MCP settings lifecycle, and MCP app.status snapshot source.
- src/OpenClaw.Tray.WinUI/App.xaml.cs and tests/OpenClaw.Tray.Tests/AppRefactorContractTests.cs; synthetic merge with "Redesign WSL gateway setup and OpenClaw onboard UX" #792 had 0 conflict markers.
Security Impact (required)
- (Yes/No): No new capability permissions. Existing MCP app settings behavior now applies the same reconnect/reload side effects as UI settings save.
- (Yes/No): No.
- (Yes/No): No new external network calls. Local MCP HTTP behavior/documentation updated.
- (Yes/No): Yes, existing app.settings.set now applies settings lifecycle after persisting; this makes MCP behavior match UI settings save instead of silently writing config only. Existing app.status/app.menu outputs now include explicit manager-owned state fields.
- (Yes/No): No.
- Yes, explain risk + mitigation: app.settings.set already existed and was allowlisted to safe non-secret settings only. Applying the normal settings lifecycle is expected behavior and is covered by source contract tests plus real MCP proof. Added app.status/app.menu fields expose connection state already visible in the app; no secrets are included.
Compatibility / Migration
- (Yes/No): Yes.
- (Yes/No): No.
- (Yes/No): No.
Review Conversations
No inline bot review conversations exist yet; ClawSweeper feedback has been addressed in branch commits and PR-body proof updates.
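As a companion to the app.status output quoted under Real behavior proof, the following sketch shows one way an MCP client could confirm that the manager-owned fields are present rather than relying on legacy connectionStatus alone. The `missing_manager_fields` helper is hypothetical and is not part of the PR; the sample payload is an abbreviated copy of the output recorded above.

```python
import json
from typing import Any, Dict, List

# Manager-owned fields named in the PR body.
MANAGER_FIELDS = ("overallState", "operatorState", "nodeState", "nodeError")


def missing_manager_fields(status: Dict[str, Any]) -> List[str]:
    """List manager-owned fields absent from an app.status payload.

    A field whose value is null still counts as present: nodeError is
    expected to be null when the node is healthy.
    """
    return [name for name in MANAGER_FIELDS if name not in status]


# Abbreviated copy of the app.status output recorded in the PR body.
sample_status = json.loads("""
{
  "connectionStatus": "Connected",
  "overallState": "Ready",
  "operatorState": "Connected",
  "nodeState": "Connected",
  "nodeConnected": true,
  "nodePaired": true,
  "nodeError": null
}
""")

legacy_only = {"connectionStatus": "Connected"}
```

Here `missing_manager_fields(sample_status)` is empty, while the legacy-only payload is reported as missing all four fields.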