Fix fatal crash when Bonjour connection drops mid-session #166
Open
madisonrickert wants to merge 1 commit into mattt:main
`MCPService.run()` returns normally when the Bonjour connection to the
menubar app closes (`StdioProxyError.connectionClosed`, or `NWError` 54 /
57). In swift-service-lifecycle 2.x, the default
`ServiceConfiguration.successTerminationBehavior` is `.cancelGroup`,
which treats any `Service.run()` that returns as "finished unexpectedly"
and raises `ServiceGroupError`. That error propagates through
`lifecycle.run()` to top-level, producing:
Swift/ErrorType.swift:254: Fatal error: Error raised at top level:
ServiceGroupError: errorCode: A service has finished unexpectedly.
Observed 15 times in one week of Claude Desktop MCP logs. The process
dies with SIGTRAP-style termination instead of exit code 0, which defeats
supervisor-friendly restarts and contributes to zombie `imcp-server`
processes accumulating.
Fix:
- Configure both `successTerminationBehavior` and
`failureTerminationBehavior` as `.gracefullyShutdownGroup` so
`MCPService` returning from `run()` cleanly shuts the group down.
- Add `gracefulShutdownSignals: [.sigint, .sigterm]` — previously `[]`,
so SIGINT / SIGTERM were silently ignored.
- Observe graceful shutdown in the reconnect loop via
`while !Task.isShuttingDownGracefully` so external signals exit the
loop promptly instead of continuing to retry.
- Remove the dead `MCPService.shutdown()` method; the `Service` protocol
in swift-service-lifecycle 2.x declares only `run()`, so that method
was never called.
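The configuration changes listed above can be sketched roughly as follows. This is a minimal illustration and not the PR's actual diff: `ReconnectingService`, `connectOnce`, and `runLifecycle` are hypothetical names, the 5-second delay is taken from the retry sleep mentioned in the PR description, and the sketch assumes the target depends on the `ServiceLifecycle` and `UnixSignals` products of swift-service-lifecycle 2.x plus swift-log.

```swift
import Logging
import ServiceLifecycle
import UnixSignals  // assumption: imported explicitly so .sigint / .sigterm resolve

// Hypothetical stand-in for MCPService; the real type proxies stdio over Bonjour.
struct ReconnectingService: Service {
    // Hypothetical hook: performs one connection attempt and returns true
    // when the peer closed the connection and the service should stop.
    let connectOnce: @Sendable () async throws -> Bool

    func run() async throws {
        // Check for graceful shutdown at the top of every iteration so that
        // SIGINT / SIGTERM end the loop instead of triggering another retry.
        while !Task.isShuttingDownGracefully {
            if try await connectOnce() {
                // Returning normally is fine once the group is configured
                // with .gracefullyShutdownGroup (see runLifecycle below).
                return
            }
            // Retry delay between connection attempts (5 s).
            try await Task.sleep(nanoseconds: 5_000_000_000)
        }
    }
}

// Hypothetical helper that builds the group the way the fix describes.
func runLifecycle(service: some Service, logger: Logger) async throws {
    let group = ServiceGroup(
        configuration: ServiceGroupConfiguration(
            services: [
                .init(
                    service: service,
                    // Both default to .cancelGroup, which makes the group throw
                    // ServiceGroupError when run() returns.
                    successTerminationBehavior: .gracefullyShutdownGroup,
                    failureTerminationBehavior: .gracefullyShutdownGroup
                )
            ],
            // Defaults to [], in which case signals do not trigger graceful shutdown.
            gracefulShutdownSignals: [.sigint, .sigterm],
            logger: logger
        )
    )
    try await group.run()
}
```

With this shape, a `connectOnce` that reports a closed connection makes `run()` return, the group shuts down gracefully, and `runLifecycle` completes without throwing.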
Adds a new `imcp-serverTests` XCTest target with three regression tests:
one that asserts the fixed configuration exits cleanly when a service
returns, one that documents the pre-fix `.cancelGroup` failure mode, and
one that exercises `Task.isShuttingDownGracefully` observation.
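For orientation, the three regression tests described above might look something like the sketch below. It is not the PR's test file: `ReturningService` and the test names are hypothetical, and it assumes the test target links `ServiceLifecycle`, `ServiceLifecycleTestKit`, and swift-log.

```swift
import Logging
import ServiceLifecycle
import ServiceLifecycleTestKit
import XCTest

// Hypothetical service that returns immediately, standing in for MCPService
// after the Bonjour connection has closed.
private struct ReturningService: Service {
    func run() async throws {}
}

final class TerminationBehaviorSketchTests: XCTestCase {
    private let logger = Logger(label: "termination-sketch")

    // Fixed configuration: a returning service shuts the group down cleanly.
    func testGracefulBehaviorReturnsWithoutThrowing() async throws {
        let group = ServiceGroup(
            configuration: ServiceGroupConfiguration(
                services: [
                    .init(
                        service: ReturningService(),
                        successTerminationBehavior: .gracefullyShutdownGroup,
                        failureTerminationBehavior: .gracefullyShutdownGroup
                    )
                ],
                logger: logger
            )
        )
        try await group.run()
    }

    // Pre-fix configuration: the default .cancelGroup turns a normal return
    // into a thrown ServiceGroupError.
    func testCancelGroupThrowsWhenServiceReturns() async {
        let group = ServiceGroup(
            configuration: ServiceGroupConfiguration(
                services: [.init(service: ReturningService())],
                logger: logger
            )
        )
        do {
            try await group.run()
            XCTFail("expected ServiceGroupError")
        } catch {
            XCTAssertTrue(error is ServiceGroupError)
        }
    }

    // The reconnect loop's condition flips once graceful shutdown is triggered.
    func testShutdownIsObservableAfterTrigger() async {
        await testGracefulShutdown { trigger in
            XCTAssertFalse(Task.isShuttingDownGracefully)
            trigger.triggerGracefulShutdown()
            XCTAssertTrue(Task.isShuttingDownGracefully)
        }
    }
}
```

Keeping a test for the `.cancelGroup` default means that a future change to the library's default behavior shows up as a test failure rather than a silent behavior change.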
Verified manually:
- SIGTERM: clean exit code 0, log shows `Signal caught. Shutting down the
group.`
- Bonjour disconnect: clean exit code 0, log line changed from `Service
finished unexpectedly. Cancelling group.` to `Service finished.
Gracefully shutting down group.`
- No orphaned processes after repeated spawns.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
414d2fa to
17aeb46
Hello, I've applied this one together with #162 and it's running much better and more stably now. Previously, fast or parallel access to MCP could crash iMCP; this is no longer the case.
Summary
`imcp-server` crashes with a Swift top-level fatal error whenever the Bonjour link to the menubar app drops mid-session. Observed 15 times in one week of Claude Desktop MCP logs on a single machine. Every recurrence produces: `Swift/ErrorType.swift:254: Fatal error: Error raised at top level: ServiceGroupError: errorCode: A service has finished unexpectedly.`
The process dies with SIGTRAP-style termination instead of exiting cleanly, which:
- defeats supervisor-friendly restarts;
- contributes to zombie `imcp-server` processes accumulating (related to "imcp-server process persists after Claude termination when iMCP is not operational" #28).

Root cause
`MCPService.run()` handles `StdioProxyError.connectionClosed` and `NWError` codes 54/57 by `return`-ing. In `swift-service-lifecycle` 2.x, the default `ServiceConfiguration.successTerminationBehavior` is `.cancelGroup`, meaning a `Service.run()` that returns normally is treated as "finished unexpectedly" and the group raises `ServiceGroupError`. `gracefulShutdownSignals: []` (the default) also means SIGINT/SIGTERM are silently ignored today.

Fix
- `successTerminationBehavior: .gracefullyShutdownGroup`: returning from `run()` shuts the group down cleanly; `lifecycle.run()` completes without throwing; the process exits with status 0.
- `failureTerminationBehavior: .gracefullyShutdownGroup`: same for thrown errors, since there's nothing another layer can recover.
- `gracefulShutdownSignals: [.sigint, .sigterm]`: signals now trigger graceful shutdown.
- `while !Task.isShuttingDownGracefully` at the top of the reconnect loop: shutdown is observable between iterations, so signals don't wait on the 5s retry sleep.
- Removed the dead `MCPService.shutdown()` method. The `Service` protocol in swift-service-lifecycle 2.x declares only `run()`, so that method was never called.

Test plan
New `imcp-serverTests` XCTest target with three regression tests:
- `testServiceReturningNormallyExitsGroupCleanly`: the primary regression guard.
- `testDefaultCancelGroupThrowsWhenServiceReturns`: pins the pre-fix failure mode so that a change in the library's default behavior is detectable.
- `testTaskIsShuttingDownGracefullyObservedAfterTrigger`: validates the loop's shutdown observation via `ServiceLifecycleTestKit`.
Run: `xcodebuild -project iMCP.xcodeproj -scheme imcp-serverTests test` (3/3 pass).
Manual verification:
- SIGTERM: clean exit code 0; log shows `[ServiceLifecycle] Signal caught. Shutting down the group.`
- Bonjour disconnect: clean exit code 0; log line changed from `Service finished unexpectedly. Cancelling group.` to `Service finished. Gracefully shutting down group.`
- `initialize` → `tools/list` → stdin close round-trip exits cleanly.
- No orphaned `imcp-server` processes after repeated spawns.

Related
- The `NWConnection` leak still happens, but the CLI no longer fatal-errors when the peer goes unhealthy.
- `imcp-server` processes persisting after Claude Desktop termination: those accumulate because the process exits via fatal error rather than cleanly. A process-table check after applying this change shows zero orphans.

Scope / non-goals
Deliberately out of scope; each is a candidate follow-up:
- The `withCheckedThrowingContinuation` in `MCPService.run()`. Worst-case signal-to-exit time when stuck in that state remains ~30s. Relevant to "Could not attach to MCP server iMCP" timeout #24.
- `notifications/tools/list_changed` in some logs. Menubar-side; unrelated to this CLI fix.
- `MCPService.run()`.
Independent of #162 (SDK 0.12.0 bump); either can merge first.