Skip to content

fix(benchmark): stop MCP startup deadlock — drop script-only runners from the calibration library barrel#2

Merged
cdeust merged 1 commit into
mainfrom
fix/mcp-startup-deadlock-calibration-script-island
Jun 1, 2026
Merged

fix(benchmark): stop MCP startup deadlock — drop script-only runners from the calibration library barrel#2
cdeust merged 1 commit into
mainfrom
fix/mcp-startup-deadlock-calibration-script-island

Conversation

@cdeust
Copy link
Copy Markdown
Owner

@cdeust cdeust commented Jun 1, 2026

Problem

The plugin's MCP server (prd-gen) deadlocks on every launch — it never answers the MCP initialize handshake, so the server fails to come up under Claude Code. Node prints Warning: Detected unsettled top-level await and no evidence.db is created.

Root cause

The calibration library barrel packages/benchmark/calibration/index.ts re-exports the script-only runners next to its library-safe oracle/stats seams:

  • runCalibration, selectModeFromArgv (from ./calibrate-gates.js)
  • runProductionCalibration (from ./calibrate-gates-production.js)
  • runProductionFromCli (from ./calibrate-gates-production-cli.js)

Those modules are script-only — top-level await + FS writes — exactly as their own §2.2 layer-contract banner in calibrate-gates-production.ts documents.

@prd-gen/mcp-server imports this barrel (via build-conclude-opts.ts) only for the library-safe oracle dispatch (invokeOracle / OracleUnavailableError / OracleInput). But esbuild, bundling the server, must initialise every statically-reachable module — so the script island's top-level await runs at server startup and never settles, so server.connect() is never reached.

Authoritative import chain (esbuild metafile):

mcp-server/index → pipeline-tools → build-conclude-opts
  → @prd-gen/benchmark/calibration (barrel) → calibrate-gates-production (top-level-await island)

Fix

Stop the library barrel from re-exporting the script-only runners. They have no barrel consumers — the CLIs and their tests already import them directly from their own modules (e.g. ../calibrate-gates.js), so this is non-breaking. It simply honours the §2.2 script/library boundary those modules already declare. The library-safe oracle/stats seams the server actually uses are untouched.

Net diff: the barrel source + the regenerated committed bundle (mcp-server/index.js). packages/benchmark/dist is gitignored.

Verification

  • esbuild metafile: the script island (calibrate-gates*) is no longer in the server graph.
  • Regenerated bundle: zero top-level await remain in mcp-server/index.js.
  • Runtime: the server now answers initialize immediately —
    {"result":{"protocolVersion":"2024-11-05","capabilities":{"tools":{"listChanged":true}},"serverInfo":{"name":"prd-gen","version":"0.1.0"}},...}
  • Tests: pnpm vitest run packages/benchmark packages/mcp-server339 passed.

Follow-up (not in this PR)

Separately, the published plugin ships no node_modules, and the bundle externalises ajv (+ its dynamic ajv/dist/runtime/* requires) and better-sqlite3, so a fresh claude plugin install can't launch the server until deps are installed/hoisted. Worth addressing in a packaging PR (ship a postinstall, or a flat node_modules, or inline the validator).

🤖 Generated with Claude Code

…from the calibration library barrel

The calibration barrel (packages/benchmark/calibration/index.ts) re-exported
the script-only runners (runCalibration, selectModeFromArgv,
runProductionCalibration, runProductionFromCli) alongside its library-safe
oracle/stats seams. Those runner modules are script-only — top-level await +
FS writes, per the §2.2 layer-contract banner in calibrate-gates-production.ts.

@prd-gen/mcp-server imports this barrel (via build-conclude-opts) only for the
library-safe oracle dispatch (invokeOracle / OracleUnavailableError). But
esbuild, bundling the server, must initialise every statically-reachable
module — so the script island's top-level await was awaited at server startup
and never settled. The MCP `initialize` handshake never completed because
server.connect() was never reached: the plugin's MCP server hung on every launch.

Import chain (esbuild metafile):
  mcp-server/index -> pipeline-tools -> build-conclude-opts
    -> @prd-gen/benchmark/calibration (barrel) -> calibrate-gates-production (TLA island)

Fix: stop the library barrel from re-exporting the script-only runners. They
have no barrel consumers — the CLIs and their tests already import them
directly from their own modules — so this is non-breaking and simply honours
the §2.2 script/library boundary those modules already document. The
library-safe oracle/stats seams the server actually needs are unchanged.

Verification:
- esbuild metafile: script island no longer in the server graph
- regenerated bundle (mcp-server/index.js) has zero top-level await
- MCP server now answers `initialize` immediately (was: deadlock)
- pnpm vitest packages/benchmark packages/mcp-server: 339 passed

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cdeust cdeust merged commit 73be37e into main Jun 1, 2026
2 checks passed
@cdeust cdeust deleted the fix/mcp-startup-deadlock-calibration-script-island branch June 1, 2026 08:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant