Commit 7116ee1

committed
WIP
1 parent 1bcc988 commit 7116ee1

15 files changed

Lines changed: 361 additions & 29 deletions

File tree

AGENTS.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -51,3 +51,4 @@ Use these sections under `## [Unreleased]`:
5151
## **CRITICAL** Tool Usage Rules **CRITICAL**
5252
- NEVER use sed/cat to read a file or a range of a file. Always use the native read tool.
5353
- You MUST read every file you modify in full before editing.
54+
- If using XcodeBuildMCP, first find and read the installed XcodeBuildMCP skill before calling XcodeBuildMCP tools.
Lines changed: 166 additions & 0 deletions
@@ -0,0 +1,166 @@
1+
# XcodeBuildMCP Eval Deep Dive — Token Usage, Performance, and Next Steps
2+
3+
Date: February 24, 2026
4+
Author: Codex assistant (session with Cam)
5+
Run analyzed: `/Users/cameroncooke/Developer/mcp_evals/runs/20260120_225600`
6+
7+
## Executive Summary
8+
9+
The large token delta reported for XcodeBuildMCP vs `shell_primed` is real, but its root cause is often misinterpreted.
10+
11+
- The largest deltas are dominated by **cached input tokens** (replayed context across turns), not uncached “new reasoning” tokens.
12+
- This behavior is **not unique to XcodeBuildMCP**. It is a general multi-turn/tool-use property.
13+
- In this eval setup, XcodeBuildMCP incurs extra overhead due to:
14+
1. More setup/discovery turns,
15+
2. Larger tool surface/context,
16+
3. More replay cycles of prior context.
17+
- Therefore, the reported “+100k tokens” is mostly a **session-level usage accounting effect**, not direct evidence that the model exhausted context window capacity.
18+
19+
That said, XcodeBuildMCP still underperformed `shell_primed` on wall time and error rates in this dataset, so there are real workflow optimization opportunities.
20+
21+
---
22+
23+
## What We Verified in the Data
24+
25+
### 1) Token delta exists across agents, not just one
26+
27+
Comparing `mcp_unprimed_v2` vs `shell_primed` (mean per run, non-baseline):
28+
29+
- **Codex**: total input `+120,793` tokens
30+
- uncached `-8,453`
31+
- cached `+129,246`
32+
- **Claude Opus**: total input `+84,040`
33+
- uncached `+111`
34+
- cached `+83,929`
35+
- **Claude Sonnet**: total input `+132,568`
36+
- uncached `+228`
37+
- cached `+132,340`
38+
39+
Interpretation: the dominant effect is replayed cached context.
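As a quick sanity check on the figures above, the following sketch recomputes each total from its uncached and cached parts and reports how much of the delta is cached. The numbers are copied from the list above; the dictionary layout and the `cached_share` helper are illustrative names, not part of the eval harness.

```python
# Mean per-run input-token deltas (mcp_unprimed_v2 minus shell_primed),
# copied from the figures reported above.
deltas = {
    "Codex": {"total": 120_793, "uncached": -8_453, "cached": 129_246},
    "Claude Opus": {"total": 84_040, "uncached": 111, "cached": 83_929},
    "Claude Sonnet": {"total": 132_568, "uncached": 228, "cached": 132_340},
}


def cached_share(row):
    """Fraction of the total input delta attributable to cached tokens."""
    return row["cached"] / row["total"]


for agent, row in deltas.items():
    # The uncached/cached split should add back up to the reported total.
    assert row["uncached"] + row["cached"] == row["total"]
    print(f"{agent}: cached share of delta = {cached_share(row):.1%}")
```

For Codex the cached share exceeds 100% because the uncached delta is negative; for both Claude agents it is just under 100%.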
40+
41+
### 2) MCP path has more setup/tool turns
42+
43+
For MCP scenarios, a large percentage of calls are setup/discovery:
44+
- `session-set-defaults`, `list_schemes`, `list_sims`, `discover_projs` account for most MCP calls.
45+
- This increases turn count before build/test execution starts.
46+
47+
### 3) More turns + bigger replayed context => large cumulative usage
48+
49+
This is the key mechanics point:
50+
- Context window occupancy at a single turn is not “compounded”.
51+
- But **total token usage across the session is compounded** because each turn reprocesses large prior prefixes (mostly cached).
52+
53+
So a 4k–10k larger reusable prefix can translate to very large cumulative token deltas over many turns.
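A minimal sketch of that arithmetic, assuming the extra prefix is re-read in full on every turn. The turn counts below are hypothetical and were not measured in this run; the function names are illustrative.

```python
def cumulative_prefix_overhead(extra_prefix_tokens: int, turns: int) -> int:
    """Extra input tokens summed over a session when every turn
    re-reads a prefix that is `extra_prefix_tokens` larger."""
    return extra_prefix_tokens * turns


def cumulative_input_tokens(prefix_tokens: int, added_per_turn: int, turns: int) -> int:
    """Total input tokens when turn k re-reads the fixed prefix plus
    everything appended by the k earlier turns (k = 0, 1, ...)."""
    return sum(prefix_tokens + added_per_turn * k for k in range(turns))


# Hypothetical turn counts, for illustration only.
print(cumulative_prefix_overhead(4_000, 20))   # 80000
print(cumulative_prefix_overhead(10_000, 15))  # 150000
```

Under these assumptions the per-turn context window grows only by the extra prefix, while the session-level total grows by that prefix multiplied by the number of turns.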
54+
55+
### 4) Performance and reliability gaps vs `shell_primed`
56+
57+
Across agents, `mcp_unprimed_v2` generally trails `shell_primed` on:
58+
- wall time,
59+
- tool errors,
60+
- time-to-first-build.
61+
62+
Success rates are often comparable, but MCP gets there with more overhead in this run configuration.
63+
64+
---
65+
66+
## Clarification: Token Usage vs Context Window Usage
67+
68+
A key confusion to correct in reporting:
69+
70+
- **High cached token usage does not automatically mean the context window was “wasted” or exhausted.**
71+
- Cached tokens are primarily an inference accounting/cost metric across turns.
72+
- Context pressure is a separate per-turn issue (depends on what is currently in-window, truncation/summarization, etc.).
73+
74+
This distinction should be explicit in future writeups.
75+
76+
---
77+
78+
## Why the Current Comparison Is Not Fully Fair
79+
80+
`shell_primed` is given deterministic build parameters in the prompt.
81+
82+
`mcp_unprimed` / `mcp_unprimed_v2` are not equivalently pre-seeded; they discover and set defaults during the run.
83+
84+
That asymmetry structurally biases MCP toward extra turns and replay overhead.
85+
86+
---
87+
88+
## Planned Next Eval (Recommended)
89+
90+
Add a new scenario:
91+
92+
### `mcp_persisted_defaults`
93+
94+
Use production XcodeBuildMCP session-default persistence and pre-seed defaults prior to trial start.
95+
96+
Suggested scenario matrix:
97+
1. `shell_primed`
98+
2. `mcp_unprimed_v2`
99+
3. `mcp_persisted_defaults` (new)
100+
101+
Report these metrics prominently:
102+
- success rate,
103+
- wall time,
104+
- tool errors,
105+
- time-to-first-build,
106+
- MCP call mix (setup vs execution),
107+
- uncached input tokens,
108+
- cached input tokens,
109+
- billed cost,
110+
- cold-equivalent cost.
111+
112+
The primary efficiency metric for cross-agent/model comparisons should be **uncached input tokens** (plus wall time), with cached totals clearly labeled as replay/accounting-heavy.
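One way to keep billed and cold-equivalent cost apart is sketched below. It assumes that "cold-equivalent" means pricing every input token at the uncached rate, and that cached tokens are billed at a discounted fraction of that rate. Both definitions, the rate, and the discount are assumptions for illustration, not values taken from this eval or from any provider's price list.

```python
def billed_cost(uncached: int, cached: int,
                rate_per_mtok: float, cached_discount: float) -> float:
    """Input cost when cached tokens are billed at a fraction
    (`cached_discount`) of the uncached per-million-token rate."""
    return (uncached + cached * cached_discount) * rate_per_mtok / 1_000_000


def cold_equivalent_cost(uncached: int, cached: int, rate_per_mtok: float) -> float:
    """Input cost if every token had been processed uncached (assumed definition)."""
    return (uncached + cached) * rate_per_mtok / 1_000_000


# Hypothetical rate and discount, for illustration only.
RATE = 2.0       # currency units per million input tokens
DISCOUNT = 0.1   # cached tokens billed at 10% of the uncached rate

print(billed_cost(1_000_000, 1_000_000, RATE, DISCOUNT))
print(cold_equivalent_cost(1_000_000, 1_000_000, RATE))
```

Reporting both numbers side by side makes it visible how much of a session's apparent cost comes from replayed, cached context.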
113+
114+
---
115+
116+
## Opportunities to Improve XcodeBuildMCP
117+
118+
1. **Collapse setup turns**
119+
- Provide/encourage a single bootstrap call (discover + defaults + selected target/sim) where possible.
120+
121+
2. **Lean response mode for agent workflows**
122+
- Reduce verbose “next steps” / repeated boilerplate in tool outputs for eval/agent mode.
123+
124+
3. **Task-scoped tool exposure**
125+
- Reduce visible tool surface when task scope is known (fewer irrelevant tools).
126+
127+
4. **Persisted defaults first-class UX**
128+
- Make persisted defaults the default recommended flow for repeat sessions and eval harnesses.
129+
130+
5. **Error payload quality**
131+
- Continue improving MCP error clarity to reduce retry churn and malformed follow-up calls.
132+
133+
6. **Deterministic fast path docs/prompts**
134+
- Publish a concise “low-turn recipe” for common build/test/install/launch flows to reduce exploratory calls.
135+
136+
---
137+
138+
## Shortcomings to Address in the Blog Post
139+
140+
1. **Conflating cumulative token usage with context-window pressure**
141+
- Clarify that high cached totals are mostly replay accounting across turns.
142+
143+
2. **Understating scenario asymmetry**
144+
- Explicitly call out that `shell_primed` had stronger deterministic priming than MCP scenarios.
145+
146+
3. **Insufficient emphasis on uncached vs cached split**
147+
- Show uncached deltas separately; this changes interpretation of “waste”.
148+
149+
4. **Need stronger caveat about provider/accounting differences**
150+
- Keep billed vs cold-equivalent vs uncached separated and explained.
151+
152+
5. **Frame the result as a workflow/tooling optimization target, not an MCP category verdict**
153+
- Current result is about this tool surface + run shape + harness design, not all MCP usage.
154+
155+
---
156+
157+
## Final Position
158+
159+
The current data does not support “MCP inherently wastes context window.”
160+
161+
It does support:
162+
- MCP workflow shape in this eval caused more turns and larger replayed context,
163+
- this inflated cumulative token usage (mostly cached),
164+
- and increased wall-time/error overhead versus a strongly primed shell baseline.
165+
166+
The next decisive test is `mcp_persisted_defaults` under matched priming conditions.

docs/TOOLS-CLI.md

Lines changed: 5 additions & 5 deletions
@@ -18,7 +18,7 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
1818

1919
- `build` - Build for device.
2020
- `clean` - Clean build products.
21-
- `discover-projects` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
21+
- `discover-projects` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
2222
- `get-app-bundle-id` - Extract bundle id from .app.
2323
- `get-app-path` - Get device built app path.
2424
- `install` - Install app on device.
@@ -38,7 +38,7 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
3838

3939
- `boot` - Defined in Simulator Management workflow.
4040
- `build` - Build for iOS sim (compile-only, no launch).
41-
- `build-and-run` - Build and run iOS sim (preferred for run/launch intent).
41+
- `build-and-run` - Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
4242
- `clean` - Defined in iOS Device Development workflow.
4343
- `discover-projects` - Defined in iOS Device Development workflow.
4444
- `get-app-bundle-id` - Defined in iOS Device Development workflow.
@@ -130,10 +130,10 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
130130
### Simulator Management (`simulator-management`)
131131
**Purpose**: Tools for managing simulators from booting, opening simulators, listing simulators, stopping simulators, erasing simulator content and settings, and setting simulator environment options like location, network, statusbar and appearance. (8 tools)
132132

133-
- `boot` - Boot iOS simulator.
133+
- `boot` - Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
134134
- `erase` - Erase simulator.
135135
- `list` - List iOS simulators.
136-
- `open` - Open Simulator app.
136+
- `open` - Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
137137
- `reset-location` - Reset sim location.
138138
- `set-appearance` - Set sim appearance.
139139
- `set-location` - Set sim location.
@@ -189,4 +189,4 @@ XcodeBuildMCP provides 73 canonical tools organized into 13 workflow groups.
189189

190190
---
191191

192-
*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-22T18:16:55.247Z UTC*
192+
*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-24T11:32:32.907Z UTC*

docs/TOOLS.md

Lines changed: 6 additions & 6 deletions
@@ -16,7 +16,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
1616

1717
- `build_device` - Build for device.
1818
- `clean` - Clean build products.
19-
- `discover_projs` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
19+
- `discover_projs` - Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
2020
- `get_app_bundle_id` - Extract bundle id from .app.
2121
- `get_device_app_path` - Get device built app path.
2222
- `install_app_device` - Install app on device.
@@ -35,7 +35,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
3535
**Purpose**: Complete iOS development workflow for both .xcodeproj and .xcworkspace files targeting simulators. (21 tools)
3636

3737
- `boot_sim` - Defined in Simulator Management workflow.
38-
- `build_run_sim` - Build and run iOS sim (preferred for run/launch intent).
38+
- `build_run_sim` - Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
3939
- `build_sim` - Build for iOS sim (compile-only, no launch).
4040
- `clean` - Defined in iOS Device Development workflow.
4141
- `discover_projs` - Defined in iOS Device Development workflow.
@@ -130,7 +130,7 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
130130

131131
- `session_clear_defaults` - Clear session defaults for the active profile or a specified profile.
132132
- `session_set_defaults` - Set session defaults for the active profile, or for a specified profile and make it active.
133-
- `session_show_defaults` - Show the current active defaults.
133+
- `session_show_defaults` - Show current active defaults (use when defaults are unknown or need verification).
134134
- `session_use_defaults_profile` - Switch the active session defaults profile.
135135
- `sync_xcode_defaults` - Sync session defaults (scheme, simulator) from Xcode's current IDE selection.
136136

@@ -139,10 +139,10 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
139139
### Simulator Management (`simulator-management`)
140140
**Purpose**: Tools for managing simulators from booting, opening simulators, listing simulators, stopping simulators, erasing simulator content and settings, and setting simulator environment options like location, network, statusbar and appearance. (8 tools)
141141

142-
- `boot_sim` - Boot iOS simulator.
142+
- `boot_sim` - Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
143143
- `erase_sims` - Erase simulator.
144144
- `list_sims` - List iOS simulators.
145-
- `open_sim` - Open Simulator app.
145+
- `open_sim` - Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
146146
- `reset_sim_location` - Reset sim location.
147147
- `set_sim_appearance` - Set sim appearance.
148148
- `set_sim_location` - Set sim location.
@@ -205,4 +205,4 @@ This document lists MCP tool names as exposed to MCP clients. XcodeBuildMCP prov
205205

206206
---
207207

208-
*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-22T18:16:55.247Z UTC*
208+
*This documentation is automatically generated by `scripts/update-tools-docs.ts` from the tools manifest. Last updated: 2026-02-24T11:32:32.907Z UTC*

manifests/tools/boot_sim.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ module: mcp/tools/simulator/boot_sim
33
names:
44
mcp: boot_sim
55
cli: boot
6-
description: Boot iOS simulator.
6+
description: Boot iOS simulator for manual/non-build flows. Not required before simulator build-and-run (build_run_sim).
77
annotations:
88
title: Boot Simulator
99
destructiveHint: true

manifests/tools/build_run_sim.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ module: mcp/tools/simulator/build_run_sim
33
names:
44
mcp: build_run_sim
55
cli: build-and-run
6-
description: Build and run iOS sim (preferred for run/launch intent).
6+
description: Build, install, and launch on iOS Simulator; boots simulator and attempts to open Simulator.app as needed. Preferred single-step run tool when defaults are set.
77
predicates:
88
- hideWhenXcodeAgentMode
99
annotations:

manifests/tools/discover_projs.yaml

Lines changed: 8 additions & 1 deletion
@@ -3,7 +3,14 @@ module: mcp/tools/project-discovery/discover_projs
33
names:
44
mcp: discover_projs
55
cli: discover-projects
6-
description: Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files.
6+
description: Scans a directory (defaults to workspace root) to find Xcode project (.xcodeproj) and workspace (.xcworkspace) files. Use when project/workspace path is unknown.
77
annotations:
88
title: Discover Projects
99
readOnlyHint: true
10+
nextSteps:
11+
- label: Save discovered project/workspace as session defaults
12+
toolId: session_set_defaults
13+
priority: 1
14+
- label: Build and run once defaults are set
15+
toolId: build_run_sim
16+
priority: 2

manifests/tools/open_sim.yaml

Lines changed: 2 additions & 2 deletions
@@ -3,12 +3,12 @@ module: mcp/tools/simulator/open_sim
33
names:
44
mcp: open_sim
55
cli: open
6-
description: Open Simulator app.
6+
description: Open Simulator.app for visibility/manual workflows. Not required before simulator build-and-run (build_run_sim).
77
annotations:
88
title: Open Simulator
99
destructiveHint: true
1010
nextSteps:
11-
- label: Boot a simulator if needed
11+
- label: Boot a simulator for manual workflows
1212
toolId: boot_sim
1313
params:
1414
simulatorId: UUID_FROM_LIST_SIMS

manifests/tools/session_show_defaults.yaml

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ module: mcp/tools/session-management/session_show_defaults
33
names:
44
mcp: session_show_defaults
55
cli: show-defaults
6-
description: Show the current active defaults.
6+
description: Show current active defaults (use when defaults are unknown or need verification).
77
annotations:
88
title: Show Session Defaults
99
readOnlyHint: true

skills/xcodebuildmcp-cli/SKILL.md

Lines changed: 5 additions & 5 deletions
@@ -33,11 +33,11 @@ Notes:
3333

3434
### Build And Run On Simulator
3535

36-
If your intent is to run the app in Simulator, use `build-and-run` directly. It already performs the build step.
36+
If your intent is to run the app in Simulator, use `build-and-run` directly. It already performs the build step and handles simulator boot/open behavior.
3737
Do not run `build` first unless the user explicitly requests both commands.
38-
39-
1. List simulators and pick a device name or UDID.
40-
2. Build and run.
38+
If defaults already include project/workspace + scheme + simulator, call `build-and-run` directly.
39+
Only run discovery commands when project/workspace details are unknown after checking defaults.
40+
Never run project discovery speculatively or in parallel with `show-defaults`.
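The decision order described above can be sketched as a small function. This is an illustration only: the defaults keys (`projectPath`, `workspacePath`, `scheme`, `simulatorId`, `simulatorName`) are assumed names for whatever `show-defaults` reports, and `complete-defaults` is a placeholder label for "fill in the missing defaults", not a real CLI command.

```python
from typing import Mapping, Optional


def next_action(defaults: Mapping[str, Optional[str]]) -> str:
    """Pick the next step after checking defaults (key names are assumptions)."""
    has_project = bool(defaults.get("projectPath") or defaults.get("workspacePath"))
    has_scheme = bool(defaults.get("scheme"))
    has_simulator = bool(defaults.get("simulatorId") or defaults.get("simulatorName"))

    if not has_project:
        # Discovery is only justified once defaults are known to lack a project/workspace.
        return "discover-projects"
    if has_scheme and has_simulator:
        # Everything needed is present: go straight to the single-step run command.
        return "build-and-run"
    # Placeholder label: scheme or simulator still has to be set.
    return "complete-defaults"


print(next_action({"workspacePath": "App.xcworkspace", "scheme": "App", "simulatorName": "iPhone"}))
print(next_action({}))
```

The point of the ordering is that discovery is reached only after defaults have been inspected, never speculatively.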
4141

4242
If app and project details are not known:
4343
```bash
@@ -154,7 +154,7 @@ To see all SwiftPM tools, view SwiftPM help:
154154
xcodebuildmcp swift-package --help
155155
```
156156

157-
### Project Discovery
157+
### Project Discovery (only when project/workspace is unknown)
158158

159159
```bash
160160
xcodebuildmcp project-discovery discover-projects --workspace-root .
