Fix fatal crash when Bonjour connection drops mid-session #166

Open
madisonrickert wants to merge 1 commit into mattt:main from madisonrickert:fix/servicegroup-graceful-shutdown

Conversation

@madisonrickert

Summary

imcp-server crashes with a Swift top-level fatal error whenever the Bonjour link to the menubar app drops mid-session. Observed 15 times in one week of Claude Desktop MCP logs on a single machine. Every recurrence produces:

critical: Connection closed, terminating...
debug: [ServiceLifecycle] Service finished unexpectedly. Cancelling group.
Swift/ErrorType.swift:254: Fatal error: Error raised at top level:
  ServiceGroupError: errorCode: A service has finished unexpectedly.

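The process dies with SIGTRAP-style termination instead of exiting cleanly with code 0, which defeats supervisor-friendly restarts and contributes to zombie imcp-server processes accumulating.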

Root cause

MCPService.run() handles StdioProxyError.connectionClosed and NWError codes 54/57 by returning. In swift-service-lifecycle 2.x, the default ServiceConfiguration.successTerminationBehavior is .cancelGroup, meaning a Service.run() that returns normally is treated as "finished unexpectedly" and the group raises ServiceGroupError. That error propagates through lifecycle.run() to top level, which is what produces the fatal error above. gracefulShutdownSignals: [] (the default) also means SIGINT/SIGTERM are silently ignored today.

Fix

  • successTerminationBehavior: .gracefullyShutdownGroup, so returning from run() shuts the group down cleanly; lifecycle.run() completes without throwing; process exits with status 0.
  • failureTerminationBehavior: .gracefullyShutdownGroup — same for thrown errors, since there's nothing another layer can recover.
  • gracefulShutdownSignals: [.sigint, .sigterm] — signals now trigger graceful shutdown.
  • while !Task.isShuttingDownGracefully at the top of the reconnect loop — shutdown is observable between iterations so signals don't wait on the 5s retry sleep.
  • Remove the dead MCPService.shutdown() method. The Service protocol in swift-service-lifecycle 2.x declares only run(), so that method was never called.
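
The configuration changes listed above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `ReconnectingService`, its `connect` closure, the `Server` entry point, and the logger label are hypothetical stand-ins for `MCPService` and the real Bonjour proxy logic, and the 5-second retry delay is taken from the description above. It assumes the swift-service-lifecycle 2.x and swift-log packages are available as dependencies.

```swift
import Logging
import ServiceLifecycle
import UnixSignals

// Hypothetical stand-in for MCPService. The `connect` closure represents
// one connection attempt that returns when the peer closes the link.
struct ReconnectingService: Service {
    let logger: Logger
    let connect: @Sendable () async throws -> Void

    func run() async throws {
        // Check the graceful-shutdown flag before each attempt so a
        // SIGINT/SIGTERM stops the loop instead of starting another retry.
        while !Task.isShuttingDownGracefully {
            do {
                try await connect()
                // Connection closed normally: return and let the group
                // shut down via successTerminationBehavior.
                return
            } catch is CancellationError {
                return
            } catch {
                logger.warning("Connection attempt failed, retrying: \(error)")
                // Assumed retry delay; ignores cancellation errors from sleep.
                try? await Task.sleep(nanoseconds: 5_000_000_000)
            }
        }
    }
}

@main
struct Server {
    static func main() async throws {
        let logger = Logger(label: "example.server")
        let service = ReconnectingService(logger: logger, connect: {})

        let group = ServiceGroup(
            configuration: ServiceGroupConfiguration(
                services: [
                    ServiceGroupConfiguration.ServiceConfiguration(
                        service: service,
                        // A normal return from run() shuts the group down
                        // instead of raising ServiceGroupError.
                        successTerminationBehavior: .gracefullyShutdownGroup,
                        // Thrown errors also trigger graceful shutdown.
                        failureTerminationBehavior: .gracefullyShutdownGroup
                    )
                ],
                // Without these, the group installs no signal handlers.
                gracefulShutdownSignals: [.sigint, .sigterm],
                logger: logger
            )
        )
        try await group.run()
    }
}
```

With both termination behaviors left at their `.cancelGroup` default, the same `run()` returning would make `group.run()` throw, which is the crash path described under Root cause.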

Test plan

New imcp-serverTests XCTest target with three regression tests:

  • testServiceReturningNormallyExitsGroupCleanly — the primary regression guard.
  • testDefaultCancelGroupThrowsWhenServiceReturns — pins the pre-fix failure mode so the library's default behavior changing is detectable.
  • testTaskIsShuttingDownGracefullyObservedAfterTrigger — validates the loop's shutdown observation via ServiceLifecycleTestKit.
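
As a rough sketch of what the first two guards check, assuming swift-service-lifecycle 2.x and swift-log are linked into the test target: `ReturningService`, the test class, the method names, and the logger label below are illustrative and are not the test code added in this PR.

```swift
import Logging
import ServiceLifecycle
import XCTest

// Hypothetical service that returns immediately, mimicking MCPService
// returning after the Bonjour connection closes.
private struct ReturningService: Service {
    func run() async throws {}
}

final class TerminationBehaviorSketchTests: XCTestCase {
    private typealias Behavior =
        ServiceGroupConfiguration.ServiceConfiguration.TerminationBehavior

    // Builds a single-service group with the given success behavior.
    private func makeGroup(success: Behavior) -> ServiceGroup {
        ServiceGroup(
            configuration: ServiceGroupConfiguration(
                services: [
                    ServiceGroupConfiguration.ServiceConfiguration(
                        service: ReturningService(),
                        successTerminationBehavior: success
                    )
                ],
                logger: Logger(label: "example.tests")
            )
        )
    }

    // Fixed configuration: the group completes without throwing.
    func testReturningServiceExitsGroupCleanly() async throws {
        try await makeGroup(success: .gracefullyShutdownGroup).run()
    }

    // Default configuration: the group reports the returning service
    // as having finished unexpectedly.
    func testCancelGroupThrowsWhenServiceReturns() async {
        do {
            try await makeGroup(success: .cancelGroup).run()
            XCTFail("Expected ServiceGroupError")
        } catch {
            XCTAssertTrue(error is ServiceGroupError)
        }
    }
}
```

Pinning the default behavior in a separate test means a future change in the library's defaults shows up as a test failure rather than a silent behavior shift.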

Run: xcodebuild -project iMCP.xcodeproj -scheme imcp-serverTests test — 3/3 pass.

Manual verification:

  • SIGTERM → exit code 0, [ServiceLifecycle] Signal caught. Shutting down the group.
  • Bonjour peer disappears → exit code 0, log line changed from Service finished unexpectedly. Cancelling group. to Service finished. Gracefully shutting down group.
  • Full initialize → tools/list → stdin close round-trip exits cleanly.
  • No orphan imcp-server processes after repeated spawns.

Related

Scope / non-goals

Deliberately out of scope — each is a candidate follow-up:

  • The non-cancellable 30s Bonjour discovery (withCheckedThrowingContinuation in MCPService.run()). Worst-case signal-to-exit time when stuck in that state remains ~30s. Relevant to "Could not attach to MCP server iMCP" timeout #24.
  • Why the menubar app drops the Bonjour connection after notifications/tools/list_changed in some logs. Menubar-side; unrelated to this CLI fix.
  • Pre-existing per-iteration browser/proxy cleanup leaks in MCPService.run().

Independent of #162 (SDK 0.12.0 bump); either can merge first.

`MCPService.run()` returns normally when the Bonjour connection to the
menubar app closes (`StdioProxyError.connectionClosed`, or `NWError` 54 /
57). In swift-service-lifecycle 2.x, the default
`ServiceConfiguration.successTerminationBehavior` is `.cancelGroup`,
which treats any `Service.run()` that returns as "finished unexpectedly"
and raises `ServiceGroupError`. That error propagates through
`lifecycle.run()` to top-level, producing:

    Swift/ErrorType.swift:254: Fatal error: Error raised at top level:
      ServiceGroupError: errorCode: A service has finished unexpectedly.

Observed 15 times in one week of Claude Desktop MCP logs. The process
dies with SIGTRAP-style termination instead of exit code 0, which defeats
supervisor-friendly restarts and contributes to zombie `imcp-server`
processes accumulating.

Fix:

- Configure both `successTerminationBehavior` and
  `failureTerminationBehavior` as `.gracefullyShutdownGroup` so
  `MCPService` returning from `run()` cleanly shuts the group down.
- Add `gracefulShutdownSignals: [.sigint, .sigterm]` — previously `[]`,
  so SIGINT / SIGTERM were silently ignored.
- Observe graceful shutdown in the reconnect loop via
  `while !Task.isShuttingDownGracefully` so external signals exit the
  loop promptly instead of continuing to retry.
- Remove the dead `MCPService.shutdown()` method; the `Service` protocol
  in swift-service-lifecycle 2.x declares only `run()`, so that method
  was never called.

Adds a new `imcp-serverTests` XCTest target with three regression tests:
one that asserts the fixed configuration exits cleanly when a service
returns, one that documents the pre-fix `.cancelGroup` failure mode, and
one that exercises `Task.isShuttingDownGracefully` observation.

Verified manually:
- SIGTERM: clean exit code 0, log shows `Signal caught. Shutting down the
  group.`
- Bonjour disconnect: clean exit code 0, log line changed from `Service
  finished unexpectedly. Cancelling group.` to `Service finished.
  Gracefully shutting down group.`
- No orphaned processes after repeated spawns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@madisonrickert force-pushed the fix/servicegroup-graceful-shutdown branch from 414d2fa to 17aeb46 on April 23, 2026 19:45
@LeDominik

Hello, I've applied this one together with #162 and it's running much better and is stable now. Previously, fast or parallel access to MCP could crash iMCP; this is no longer the case.

Development

Successfully merging this pull request may close these issues.

Very frequent crashing. imcp-server process persists after Claude termination when iMCP is not operational