Summary
In hooks/PromptProcessing.hook.ts, mode/tier classification shares a single claude inference call with tab-title + session-name generation. That combined call uses a large (~3,000-word) system prompt on the standard (sonnet) level. Because the call is slow and occasionally returns unparseable JSON, classification fails on a large fraction of prompts — and every failure falls to the conservative fail-safe (ALGORITHM E3), which defeats the point of the classifier.
Evidence
From one install, ~1,700 logged classifications in MEMORY/OBSERVABILITY/prompt-processing.jsonl:
- ~25% of classifications fell to source: fail-safe.
- Breakdown of the failures: ~82% timeouts (25s), ~15% JSON-parse failures, and the rest non-zero exits or PATH problems.
- Classifier latency p50 ≈ 14s, p90 ≈ 22s, p95 ≈ 24s — right at the 25s timeout cliff, which is why the tail times out.
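For anyone who wants to check their own install, the numbers above can be reproduced with something like the sketch below. It assumes each JSONL line carries a `source` field and a latency field; the name `latency_ms` is a guess for illustration, so adjust it to whatever the log actually records. Percentiles use the nearest-rank method.

```typescript
// Hypothetical entry shape: `source` appears in the log per this issue,
// `latency_ms` is an assumed field name.
type Entry = { source: string; latency_ms: number };

// Nearest-rank percentile: smallest value with at least p% of samples at or below it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize the raw text of a JSONL log: fail-safe share plus latency percentiles.
function summarize(jsonl: string) {
  const entries = jsonl
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Entry);
  const latencies = entries.map((e) => e.latency_ms);
  const failSafe = entries.filter((e) => e.source === "fail-safe").length;
  return {
    failSafeRate: entries.length > 0 ? failSafe / entries.length : 0,
    p50: percentile(latencies, 50),
    p90: percentile(latencies, 90),
    p95: percentile(latencies, 95),
  };
}
```

Pass it the contents of MEMORY/OBSERVABILITY/prompt-processing.jsonl as a string; a p95 close to the timeout value is the signal that the tail is being cut off.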
Root cause
One heavy call does three jobs. The cost and latency of naming — the non-critical part, which already has deterministic fallbacks — is coupled to classification, the executor-gating part. So a slow or unparseable naming response forces classification to fail-safe.
Fix that worked
Split mode/tier into its own dedicated call with a tiny (~120-word) classify-only prompt on the fast (haiku) level, run in parallel with the naming call:
- Classification latency dropped to ~6–15s, and the call rarely fails.
- A slow/unparseable naming response can no longer force a fail-safe.
- Post-change the fail-safe rate dropped to single digits.
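Roughly, the decoupling looks like the sketch below. This is not the actual diff: `processPrompt`, `withTimeout`, the callback signatures, and the exact shape of the fail-safe value are all placeholder names assumed for illustration. The point is only that each call gets its own timeout and its own fallback, so a naming failure never touches the classification result.

```typescript
// Hypothetical types; the real hook's shapes are not shown in this issue.
type Classification = { mode: string; tier: string; source: "inference" | "fail-safe" };
type Naming = { tabTitle: string; sessionName: string };

// Assumed representation of the conservative fail-safe (ALGORITHM E3).
const FAIL_SAFE: Classification = { mode: "ALGORITHM", tier: "E3", source: "fail-safe" };

// Reject if the promise has not settled within `ms` milliseconds.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
    p.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); },
    );
  });
}

// Run classification and naming in parallel; each falls back independently.
async function processPrompt(
  prompt: string,
  classify: (prompt: string) => Promise<{ mode: string; tier: string }>, // small prompt, fast level
  name: (prompt: string) => Promise<Naming>,                             // large prompt, standard level
  fallbackName: (prompt: string) => Naming,                              // deterministic fallback
  timeoutMs = 25_000,
): Promise<{ classification: Classification; naming: Naming }> {
  const [c, n] = await Promise.allSettled([
    withTimeout(classify(prompt), timeoutMs),
    withTimeout(name(prompt), timeoutMs),
  ]);
  return {
    classification: c.status === "fulfilled" ? { ...c.value, source: "inference" } : FAIL_SAFE,
    naming: n.status === "fulfilled" ? n.value : fallbackName(prompt),
  };
}
```

Using `Promise.allSettled` rather than `Promise.all` is what keeps the two outcomes independent: a rejected naming promise is reported alongside the classification result and does not short-circuit it.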
Two correctness fixes pair naturally with it: hardening the --json parser in Inference.ts (see #1323) and scrubbing leftover CLAUDE_CODE_* env vars before the child spawn (see #931). #1158 (slash-prefixed prompts) is a separate fail-safe trigger this doesn't address.
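To make the two companion fixes concrete, here is a minimal sketch of what they could look like. Neither function is taken from #1323 or #931; `parseJsonLoose` and `scrubEnv` are hypothetical names, and the recovery strategy (strict parse first, then the outermost brace-delimited span) is only one plausible way to tolerate fenced or prefixed model output.

```typescript
// Return a copy of the environment without any CLAUDE_CODE_* variables,
// suitable for passing as the child process environment.
function scrubEnv(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  return Object.fromEntries(
    Object.entries(env).filter(([key]) => !key.startsWith("CLAUDE_CODE_")),
  );
}

// Parse model output as JSON. Try a strict parse first; if that fails,
// retry on the span from the first "{" to the last "}". Returns null on failure.
function parseJsonLoose(raw: string): unknown {
  const text = raw.trim();
  try {
    return JSON.parse(text);
  } catch {
    // fall through to the lenient attempt
  }
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) return null;
  try {
    return JSON.parse(text.slice(start, end + 1));
  } catch {
    return null;
  }
}
```

A caller would treat a `null` result as a parse failure and still validate the parsed object's fields before trusting it, since the lenient path can accept JSON that is well-formed but not the expected shape.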
Happy to share the diff if it's useful.