WIP

cameroncooke · cameroncooke · commit 7116ee1380e3 · 2026-02-24T13:10:33.000Z
diff --git a/AGENTS.md b/AGENTS.md
@@ -51,3 +51,4 @@ Use these sections under `## [Unreleased]`:
 ## **CRITICAL** Tool Usage Rules **CRITICAL**
 - NEVER use sed/cat to read a file or a range of a file. Always use the native read tool.
 - You MUST read every file you modify in full before editing.
+- If using XcodeBuildMCP, first find and read the installed XcodeBuildMCP skill before calling XcodeBuildMCP tools.
diff --git a/MCP_EVAL_TOKEN_ANALYSIS_SUMMARY_2026-02-24.md b/MCP_EVAL_TOKEN_ANALYSIS_SUMMARY_2026-02-24.md
@@ -0,0 +1,166 @@
+# XcodeBuildMCP Eval Deep Dive — Token Usage, Performance, and Next Steps
+
+Date: February 24, 2026
+Author: Codex assistant (session with Cam)
+Run analyzed: `/Users/cameroncooke/Developer/mcp_evals/runs/20260120_225600`
+
+## Executive Summary
+
+The large token delta reported for XcodeBuildMCP vs `shell_primed` is real, but its root cause is often misinterpreted.
+
+- The largest deltas are dominated by **cached input tokens** (replayed context across turns), not uncached “new reasoning” tokens.
+- This behavior is **not unique to XcodeBuildMCP**. It is a general multi-turn/tool-use property.
+- In this eval setup, XcodeBuildMCP incurs extra overhead due to:
+  1. More setup/discovery turns,
+  2. Larger tool surface/context,
+  3. More replay cycles of prior context.
+- Therefore, the reported “+100k tokens” is mostly a **session-level usage accounting effect**, not direct evidence that the model exhausted context window capacity.
+
+That said, XcodeBuildMCP still underperformed `shell_primed` on wall time and error rates in this dataset, so there are real workflow optimization opportunities.
+
+---
+
+## What We Verified in the Data
+
+### 1) Token delta exists across agents, not just one
+
+Comparing `mcp_unprimed_v2` vs `shell_primed` (mean per run, non-baseline):
+
+- **Codex**: total input `+120,793` tokens
+  - uncached `-8,453`
+  - cached `+129,246`
+- **Claude Opus**: total input `+84,040` tokens
+  - uncached `+111`
+  - cached `+83,929`
+- **Claude Sonnet**: total input `+132,568` tokens
+  - uncached `+228`
+  - cached `+132,340`
+
+Interpretation: the dominant effect is replayed cached context.
+
+### 2) MCP path has more setup/tool turns
+
+For MCP scenarios, a large percentage of calls are setup/discovery:
+- `session_set_defaults`, `list_schemes`, `list_sims`, `discover_projs` account for most MCP calls.
+- This increases turn count before build/test execution starts.
+
+### 3) More turns + bigger replayed context => large cumulative usage
+
+This is the key mechanics point:
+- Context window occupancy at a single turn is not “compounded”.
+- But **total token usage across the session is compounded**, because each turn reprocesses the large prior prefix (mostly from cache).
+
+So a reusable prefix that is 4k–10k tokens larger can translate into a very large cumulative token delta over many turns.
+
+### 4) Performance and reliability gaps vs `shell_primed`
+
+Across agents, `mcp_unprimed_v2` generally trails `shell_primed` on:
+- wall time,
+- tool errors,
+- time-to-first-build.
+
+Success rates are often comparable, but MCP gets there with more overhead in this run configuration.
+
+---
+
+## Clarification: Token Usage vs Context Window Usage
+
+A key confusion to correct in reporting:
+
+- **High cached token usage does not automatically mean the context window was “wasted” or exhausted.**
+- Cached tokens are primarily an inference accounting/cost metric across turns.
+- Context pressure is a separate per-turn issue (depends on what is currently in-window, truncation/summarization, etc.).
+
+This distinction should be explicit in future writeups.
+
+---
+
+## Why the Current Comparison Is Not Fully Fair
+
+`shell_primed` is given deterministic build parameters in the prompt.
+
+`mcp_unprimed` / `mcp_unprimed_v2` are not equivalently pre-seeded; they discover and set defaults during the run.
+
+That asymmetry structurally biases MCP toward extra turns and replay overhead.
+
+---
+
+## Planned Next Eval (Recommended)
+
+Add a new scenario:
+
+### `mcp_persisted_defaults`
+
+Use production XcodeBuildMCP session-default persistence and pre-seed defaults prior to trial start.
+
+Suggested scenario matrix:
+1. `shell_primed`
+2. `mcp_unprimed_v2`
+3. `mcp_persisted_defaults` (new)
+
+Report these metrics prominently:
+- success rate,
+- wall time,
+- tool errors,
+- time-to-first-build,
+- MCP call mix (setup vs execution),
+- uncached input tokens,
+- cached input tokens,
+- billed cost,
+- cold-equivalent cost.
+
+The primary efficiency metric for cross-agent/model comparisons should be **uncached input tokens** (plus wall time), with cached totals clearly labeled as replay/accounting-heavy.
+
+---
+
+## Opportunities to Improve XcodeBuildMCP
+
+1. **Collapse setup turns**
+   - Provide/encourage a single bootstrap call (discover + defaults + selected target/sim) where possible.
+
+2. **Lean response mode for agent workflows**
+   - Reduce verbose “next steps” / repeated boilerplate in tool outputs for eval/agent mode.
+
+3. **Task-scoped tool exposure**
+   - Reduce visible tool surface when task scope is known (fewer irrelevant tools).
+
+4. **Persisted defaults first-class UX**
+   - Make persisted defaults the default recommended flow for repeat sessions and eval harnesses.
+
+5. **Error payload quality**
+   - Continue improving MCP error clarity to reduce retry churn and malformed follow-up calls.
+
+6. **Deterministic fast path docs/prompts**
+   - Publish a concise “low-turn recipe” for common build/test/install/launch flows to reduce exploratory calls.
+
+---
+
+## Shortcomings to Address in the Blog Post
+
+1. **Conflating cumulative token usage with context-window pressure**
+   - Clarify that high cached totals are mostly replay accounting across turns.
+
+2. **Understating scenario asymmetry**
+   - Explicitly call out that `shell_primed` had stronger deterministic priming than MCP scenarios.
+
+3. **Insufficient emphasis on uncached vs cached split**
+   - Show uncached deltas separately; this changes interpretation of “waste”.
+
+4. **Need stronger caveat about provider/accounting differences**
+   - Keep billed vs cold-equivalent vs uncached separated and explained.
+
+5. **Frame result as workflow/tooling optimization target, not MCP category verdict**
+   - Current result is about this tool surface + run shape + harness design, not all MCP usage.
+
+---
+
+## Final Position
+
+The current data does not support “MCP inherently wastes context window.”
+
+It does support:
+- the MCP workflow shape in this eval caused more turns and larger replayed context,
+- this inflated cumulative token usage (mostly cached),
+- and increased wall-time/error overhead versus a strongly primed shell baseline.
+
+The next decisive test is `mcp_persisted_defaults` under matched priming conditions.
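
Note on section 3 of the summary above ("More turns + bigger replayed context"): the compounding mechanics can be illustrated with a toy model. This is a minimal sketch, not the eval harness's accounting. The function name `session_input_tokens`, the prefix sizes, the per-turn size, and the turn count are all hypothetical, and the model assumes a perfect prompt cache in which every token processed on an earlier turn is counted as cached on later turns.

```python
def session_input_tokens(prefix: int, per_turn: int, turns: int) -> tuple[int, int]:
    """Return (uncached, cached) input tokens summed over a whole session.

    Toy model: the reusable prefix is sent once as new tokens on the first
    turn, each turn adds `per_turn` new tokens, and every later turn replays
    all previously processed tokens from the cache.
    """
    uncached = 0
    cached = 0
    context = 0  # tokens already processed on earlier turns
    for turn in range(turns):
        new = per_turn + (prefix if turn == 0 else 0)
        uncached += new    # tokens seen for the first time on this turn
        cached += context  # full prior context replayed from cache
        context += new
    return uncached, cached


# Hypothetical numbers: same turn count, prefixes that differ by 6k tokens.
small = session_input_tokens(prefix=10_000, per_turn=500, turns=20)
large = session_input_tokens(prefix=16_000, per_turn=500, turns=20)

print("uncached delta:", large[0] - small[0])  # 6000
print("cached delta:", large[1] - small[1])    # 114000
```

Under these assumptions the uncached delta stays at the 6k prefix difference, while the cached delta grows with the number of turns: the extra prefix is replayed on each of the 19 later turns. Extra setup turns would widen the gap further.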
diff --git a/docs/TOOLS-CLI.md b/docs/TOOLS-CLI.md
@@ -18,7 +18,7 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
 
 - `build` - Build for device.
 - `clean` - Clean build products.
-- `discover-projects` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
+- `discover-projects` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
 - `get-app-bundle-id` - Extract bundle id from .app.
 - `get-app-path` - Get device built app path.
 - `install` - Install app on device.
@@ -38,7 +38,7 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
 
 - `boot` - Defined in Simulator Management workflow.
 - `build` - Build for iOS sim (compile-only, no launch).
-- `build-and-run` - Build and run iOS sim (preferred for run/launch intent).
+- `build-and-run` - Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
 - `clean` - Defined in iOS Device Development workflow.
 - `discover-projects` - Defined in iOS Device Development workflow.
 - `get-app-bundle-id` - Defined in iOS Device Development workflow.
@@ -130,10 +130,10 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
 ### Simulator Management (`simulator-management`)
 **Purpose**: Tools for managing simulators from booting, opening simulators, listing simulators, stopping simulators, erasing simulator content and settings, and setting simulator environment options like location, network, statusbar and appearance. (8 tools)
 
-- `boot` - Boot iOS simulator.
+- `boot` - Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
 - `erase` - Erase simulator.
 - `list` - List iOS simulators.
-- `open` - Open Simulator app.
+- `open` - Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
 - `reset-location` - Reset sim location.
 - `set-appearance` - Set sim appearance.
 - `set-location` - Set sim location.
@@ -189,4 +189,4 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
 
 ---
 
-*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-22T18:16:55.247Z UTC*
+*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-24T11:32:32.907Z UTC*
diff --git a/docs/TOOLS.md b/docs/TOOLS.md
@@ -16,7 +16,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
 
 - `build_device` - Build for device.
 - `clean` - Clean build products.
-- `discover_projs` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
+- `discover_projs` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
 - `get_app_bundle_id` - Extract bundle id from .app.
 - `get_device_app_path` - Get device built app path.
 - `install_app_device` - Install app on device.
@@ -35,7 +35,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
 **Purpose**: Complete iOS development workflow for both .xcodeproj and .xcworkspace files targeting simulators. (21 tools)
 
 - `boot_sim` - Defined in Simulator Management workflow.
-- `build_run_sim` - Build and run iOS sim (preferred for run/launch intent).
+- `build_run_sim` - Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
 - `build_sim` - Build for iOS sim (compile-only, no launch).
 - `clean` - Defined in iOS Device Development workflow.
 - `discover_projs` - Defined in iOS Device Development workflow.
@@ -130,7 +130,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
 
 - `session_clear_defaults` - Clear session defaults for the active profile or a specified profile.
 - `session_set_defaults` - Set session defaults for the active profile, or for a specified profile and make it active.
-- `session_show_defaults` - Show the current active defaults.
+- `session_show_defaults` - Show current active defaults (use when defaults are unknown or need verification).
 - `session_use_defaults_profile` - Switch the active session defaults profile.
 - `sync_xcode_defaults` - Sync session defaults (scheme, simulator) from Xcode's current IDE selection.
 
@@ -139,10 +139,10 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
 ### Simulator Management (`simulator-management`)
 **Purpose**: Tools for managing simulators from booting, opening simulators, listing simulators, stopping simulators, erasing simulator content and settings, and setting simulator environment options like location, network, statusbar and appearance. (8 tools)
 
-- `boot_sim` - Boot iOS simulator.
+- `boot_sim` - Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
 - `erase_sims` - Erase simulator.
 - `list_sims` - List iOS simulators.
-- `open_sim` - Open Simulator app.
+- `open_sim` - Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
 - `reset_sim_location` - Reset sim location.
 - `set_sim_appearance` - Set sim appearance.
 - `set_sim_location` - Set sim location.
@@ -205,4 +205,4 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
 
 ---
 
-*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-22T18:16:55.247Z UTC*
+*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-24T11:32:32.907Z UTC*
diff --git a/manifests/tools/boot_sim.yaml b/manifests/tools/boot_sim.yaml
@@ -3,7 +3,7 @@ module: mcp/tools/simulator/boot_sim
 names:
   mcp: boot_sim
   cli: boot
-description: Boot iOS simulator.
+description: Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
 annotations:
   title: Boot Simulator
   destructiveHint: true
diff --git a/manifests/tools/build_run_sim.yaml b/manifests/tools/build_run_sim.yaml
@@ -3,7 +3,7 @@ module: mcp/tools/simulator/build_run_sim
 names:
   mcp: build_run_sim
   cli: build-and-run
-description: Build and run iOS sim (preferred for run/launch intent).
+description: Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
 predicates:
   - hideWhenXcodeAgentMode
 annotations:
diff --git a/manifests/tools/discover_projs.yaml b/manifests/tools/discover_projs.yaml
@@ -3,7 +3,14 @@ module: mcp/tools/project-discovery/discover_projs
 names:
   mcp: discover_projs
   cli: discover-projects
-description: Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
+description: Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
 annotations:
   title: Discover Projects
   readOnlyHint: true
+nextSteps:
+  - label: Save discovered project/workspace as session defaults
+    toolId: session_set_defaults
+    priority: 1
+  - label: Build and run once defaults are set
+    toolId: build_run_sim
+    priority: 2
diff --git a/manifests/tools/open_sim.yaml b/manifests/tools/open_sim.yaml
@@ -3,12 +3,12 @@ module: mcp/tools/simulator/open_sim
 names:
   mcp: open_sim
   cli: open
-description: Open Simulator app.
+description: Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
 annotations:
   title: Open Simulator
   destructiveHint: true
 nextSteps:
-  - label: Boot a simulator if needed
+  - label: Boot a simulator for manual workflows
     toolId: boot_sim
     params:
       simulatorId: UUID_FROM_LIST_SIMS
diff --git a/manifests/tools/session_show_defaults.yaml b/manifests/tools/session_show_defaults.yaml
@@ -3,7 +3,7 @@ module: mcp/tools/session-management/session_show_defaults
 names:
   mcp: session_show_defaults
   cli: show-defaults
-description: Show the current active defaults.
+description: Show current active defaults (use when defaults are unknown or need verification).
 annotations:
   title: Show Session Defaults
   readOnlyHint: true
diff --git a/skills/xcodebuildmcp-cli/SKILL.md b/skills/xcodebuildmcp-cli/SKILL.md
@@ -33,11 +33,11 @@ Notes:
 
 ### Build And Run On Simulator
 
-If your intent is to run the app in Simulator, use `build-and-run` directly. It already performs the build step.
+If your intent is to run the app in Simulator, use `build-and-run` directly. It already performs the build step and handles simulator boot/open behavior.
 Do not run `build` first unless the user explicitly requests both commands.
-
-1. List simulators and pick a device name or UDID.
-2. Build and run.
+If defaults already include project/workspace + scheme + simulator, call `build-and-run` directly.
+Only run discovery commands when project/workspace details are unknown after checking defaults.
+Never run project discovery speculatively or in parallel with `show-defaults`.
 
 If app and project details are not known:
 ```bash
@@ -154,7 +154,7 @@ To see all SwiftPM tools, view SwiftPM help:
 xcodebuildmcp swift-package --help
 ```
 
-### Project Discovery
+### Project Discovery (only when project/workspace is unknown)
 
 ```bash
 xcodebuildmcp project-discovery discover-projects --workspace-root .
diff --git a/skills/xcodebuildmcp/SKILL.md b/skills/xcodebuildmcp/SKILL.md
@@ -11,7 +11,11 @@ If a capability is missing, assume your tool list may be hiding tools (search/pr
 
 ## Default Tool Choice (Simulator)
 
-- If intent includes run/launch/open in Simulator, use `build_run_sim` as the default.
+- If intent includes running or launching the app in Simulator, use `build_run_sim` as the default.
+- If defaults already include project/workspace + scheme + simulator, call `build_run_sim` directly (often with empty arguments).
+- Call `discover_projs` only when project/workspace is unknown after checking defaults.
+- Never call `discover_projs` speculatively or in parallel with `session_show_defaults`.
+- Do not call `boot_sim` or `open_sim` as preflight for `build_run_sim`; use them for manual/non-build flows only.
 - If intent is compile-only feedback (no launch), use `build_sim`.
 - Do not call `build_sim` and then `build_run_sim` in sequence unless the user explicitly asks for both.
 - If the app is already built and you need launch only without rebuilding, use `install_app_sim` + `launch_app_sim` (or `launch_app_logs_sim`).
@@ -20,7 +24,7 @@ If a capability is missing, assume your tool list may be hiding tools (search/pr
 
 ### Session defaults
 
-Before you call any other tools, you **must** call `session_show_defaults` to show the current defaults, then fill in any missing defaults. You may need discovery/list tools first to obtain valid values.
+Use `session_show_defaults` when defaults are unknown (new session/context switch) or when you need to verify active values. Do not treat it as a mandatory first call when defaults are already known.
 
 - `session_show_defaults`
   - Show the current active defaults (including the active profile name).
@@ -47,11 +51,11 @@ Before you call any other tools, you **must** call `session_show_defaults` to sh
 ### Simulator
 
 - `boot_sim`
-  - Boot iOS simulator.
+  - Boot iOS simulator for manual/non-build flows; not required before `build_run_sim`.
 - `list_sims`
   - List iOS simulators.
 - `open_sim`
-  - Open Simulator app.
+  - Open Simulator app for visibility/manual workflows; not required before `build_run_sim`.
 - `build_sim`
   - Build for iOS sim.
 - `build_run_sim`
diff --git a/src/cli/commands/__tests__/init.test.ts b/src/cli/commands/__tests__/init.test.ts
diff --git a/src/cli/commands/init.ts b/src/cli/commands/init.ts
diff --git a/src/cli/yargs-app.ts b/src/cli/yargs-app.ts
diff --git a/src/server/server.ts b/src/server/server.ts