Arcadia-1 · Arcadia-1 · Apr 28, 2026 · Apr 22, 2026 · Apr 22, 2026 · Apr 28, 2026
diff --git a/docs/superpowers/plans/2026-04-22-user-guide-skill-revamp.md b/docs/superpowers/plans/2026-04-22-user-guide-skill-revamp.md
diff --git a/docs/superpowers/specs/2026-04-22-user-guide-skill-revamp-design.md b/docs/superpowers/specs/2026-04-22-user-guide-skill-revamp-design.md
@@ -0,0 +1,268 @@
+# Design: `adctoolbox-user-guide` Skill Revamp
+
+**Date:** 2026-04-22
+**Branch:** `feature/user-guide-skill-revamp` (ADCToolbox)
+**Scope:** Rewrite the bundled `adctoolbox-user-guide` skill using
+Progressive Disclosure (basic tier in `SKILL.md`, advanced tier in
+`references/advanced-debug.md`). Drive the rewrite with `skill-creator`
+evals (baseline → candidate → compare) so every commit is justified by
+measured behavior change, not taste.
+
+---
+
+## 1. Motivation
+
+Current `SKILL.md` (129 lines) has three measurable weaknesses in how
+users' agents interact with ADCToolbox:
+
+- **Trigger is a comma-list union** → high recall, low precision; no
+  "when NOT to use" to protect against misfires.
+- **"Installing AI Skills" meta-section sits mid-flow** → pollutes the
+  basic-task path with install-time noise.
+- **Basic workflow and deep-debug workflow are flattened into one
+  taxonomy** → model has to scan over advanced debug API names
+  (`analyze_phase_plane`, `analyze_bit_activity`, …) even when the task
+  is a simple "plot a spectrum / compute SNDR".
+
+User goal: the skill should *correctly teach* the user's agent how to
+use ADCToolbox, with a basic tier (spectrum + calibration) that is
+always loaded and an advanced tier (debug tools) that loads on demand.
+
+Out of scope for this revamp:
+- `adctoolbox-contributor-guide` (not touched)
+- Python source under `python/src/adctoolbox/`
+- `adctoolbox-install-skill` CLI flags (still a single skill, no
+  `--basic`/`--advanced`)
+- Any skill under `analog-agents/skills/` (including `adc-analyzer`)
+
+---
+
+## 2. Target Structure
+
+```
+_bundled_skills/skills/adctoolbox-user-guide/
+├── SKILL.md                          # Basic tier: spectrum + calibration + conventions
+└── references/
+    ├── api-quickref.md               # Existing; re-partitioned into Basic / Advanced sections
+    ├── example-map.md                # Existing; adds an "Advanced examples" section
+    └── advanced-debug.md             # New: dashboards / phase-plane / bit-level / error decomp
+```
+
+### API partition (locked)
+
+**Basic → `SKILL.md`:**
+
+| Category           | Functions |
+|--------------------|-----------|
+| Spectrum analysis  | `analyze_spectrum`, `analyze_spectrum_polar`, `find_coherent_frequency`, `compute_spectrum` |
+| Sine fitting       | `fit_sine_4param` (primitive for basic flow; also the source of the `Fin/Fs` normalized-frequency trap — kept adjacent to the convention section) |
+| Digital calibration| `calibrate_weight_sine`, `calibrate_weight_sine_lite` |
+| Signal generation  | `ADC_Signal_Generator` |
+| Input validation   | `validate_aout_data`, `validate_dout_data` |
+
+**Advanced → `references/advanced-debug.md`:**
+
+| Category                | Functions |
+|-------------------------|-----------|
+| Dashboards (multi-plot) | `generate_aout_dashboard`, `generate_dout_dashboard` |
+| Phase-plane debug       | `analyze_phase_plane`, `analyze_error_phase_plane` |
+| Bit-level analysis      | `analyze_bit_activity`, `analyze_overflow`, `analyze_enob_sweep`, `analyze_weight_radix` |
+| Static nonlinearity     | `fit_static_nonlin` |
+| Error decomposition     | error-analysis helpers, decomposition helpers |
+| Unit conversions        | `convert_cap_to_weight` |
+
+---
+
+## 3. New `SKILL.md` Skeleton (target 70–90 lines)
+
+```
+---
+name: adctoolbox-user-guide
+description: >
+  Router skill for using ADCToolbox from Python. Trigger when a task
+  involves: plotting/computing spectra (SNDR, SFDR, ENOB, THD) from
+  ADC output, fitting a sine to measured aout, calibrating SAR weights
+  (weight_sine / weight_sine_lite), generating synthetic ADC
+  stimulus/output, or validating aout/dout buffer shapes. For deeper
+  debug (dashboards, phase-plane, bit activity, overflow, error
+  decomposition) escalate to references/advanced-debug.md after
+  loading.
+  NOT for: analog topology selection, transistor sizing, Spectre
+  simulation, or layout/parasitic review — those are handled by the
+  analog-agents skills (analog-design, analog-verify, analog-audit).
+  NOT for: editing ADCToolbox source code — use
+  adctoolbox-contributor-guide instead.
+---
+
+# ADCToolbox Usage Guide
+
+[One-line mission statement]
+
+## 1. When to use / NOT to use
+- use / NOT use bullets (explicit disambiguation vs contributor guide and analog-agents skills)
+
+## 2. Critical conventions (READ FIRST — these are the common bug sources)
+- Frequency units: `fs`, `Fin`, plotting freqs in Hz;
+  `fit_sine_4param(...)['frequency']` is normalized `Fin/Fs`;
+  `calibrate_weight_sine*` and most dout helpers expect normalized `freq = Fin/Fs`.
+- Return shapes are not uniform (compact table, unchanged from current).
+
+## 3. Basic workflow — spectrum
+- Import lines (flat)
+- One 10-line snippet: dout → SNDR/SFDR/ENOB using analyze_spectrum
+- Brief note: when to pick analyze_spectrum_polar / compute_spectrum / find_coherent_frequency
+
+## 4. Basic workflow — digital calibration
+- Import lines
+- One snippet for calibrate_weight_sine (and when to pick _lite)
+
+## 5. Import rules (compressed from current Section 3)
+- Flat for exported symbols; submodule for rest; ≤ 3 lines + one example
+
+## 6. Going further
+- Advanced debug (dashboards, phase-plane, bit-level, error decomp, cap-to-weight)
+  → open references/advanced-debug.md
+- Function signatures / return keys → references/api-quickref.md
+- Ready-to-adapt example files → references/example-map.md
+```
+
+Sections removed from the current file:
+- "Installing AI Skills" (the `adctoolbox-install-skill` CLI doc) — it
+  is operator-facing, not model-facing, and blocks the main workflow.
+  Will live in the project README (not in this revamp's scope; flag in
+  the PR description).
+
+---
+
+## 4. New `references/advanced-debug.md`
+
+Organized by **task keyword** (what the user asks), not by category:
+
+```
+# ADCToolbox — Advanced Debug Reference
+
+Load this file when the agent needs deeper post-analysis than the
+basic spectrum/calibration flow provides.
+
+## "I want a single summary image of all aout/dout characteristics"
+  → generate_aout_dashboard, generate_dout_dashboard
+    + snippet + pointer to example-map.md entry
+
+## "I need to see nonlinearity structure, not just a number"
+  → analyze_phase_plane, analyze_error_phase_plane
+    + when each applies + snippet
+
+## "I want to look at what each ADC bit is doing"
+  → analyze_bit_activity, analyze_overflow, analyze_enob_sweep, analyze_weight_radix
+    + one paragraph per function + return-shape reminder
+
+## "I have static INL/DNL data and want a nonlinearity fit"
+  → fit_static_nonlin
+
+## "I want to decompose total error into components"
+  → error-analysis helpers, decomposition helpers
+    (exact public surface to be enumerated against __init__.py during implementation)
+
+## "I need cap array → weight conversion for CDAC modeling"
+  → convert_cap_to_weight
+```
+
+---
+
+## 5. `skill-creator` Eval Workflow
+
+Each iteration follows this loop:
+
+| Phase       | Action                                                                                     | Artifact              |
+|-------------|--------------------------------------------------------------------------------------------|-----------------------|
+| **Baseline** | Run 10 fixed prompts against the **current** `SKILL.md` (before any edits in this branch) | `eval-baseline.json`  |
+| **Candidate**| Run the same 10 prompts against the rewritten skill                                       | `eval-candidate.json` |
+| **Compare**  | `skill-creator` variance analysis on trigger rate + API-name correctness + length         | Pass / iterate / revert |
+| **Iterate**  | Targeted edit (usually description or §6 handoff wording), re-run from Candidate          | next commit           |
+
+### Fixed eval prompts (locked before baseline)
+
+**Basic (5):**
+1. "Help me compute SNDR from this ADC dout array."
+2. "Plot a spectrum of the ADC output with coherent sampling."
+3. "Go from dout to ENOB — give me the code."
+4. "Calibrate SAR capacitor weights from a sine test."
+5. "Generate a synthetic ADC output with a sinusoidal stimulus for testing."
+
+**Advanced (5):**
+1. "Plot a phase-plane to diagnose nonlinearity in the ADC aout."
+2. "Show me per-bit activity for an N-bit ADC code stream."
+3. "How do I detect overflow events in dout?"
+4. "Give me a one-shot dashboard of all aout diagnostics."
+5. "Convert a binary-weighted cap array to normalized weights."
+
+### Success criteria (locked before baseline)
+
+- **Basic:** ≥ 4/5 prompts where (a) the skill triggers, and (b) the
+  response references the correct top-level API name.
+- **Advanced:** ≥ 3/5 prompts where the response explicitly escalates
+  to `references/advanced-debug.md` (i.e., loads or points to it).
+- **Average response length:** non-increasing vs baseline (the revamp
+  must not become more verbose to hit the other metrics).
+
+A candidate commit is kept only when all three hold. If it fails, the
+commit is either iterated in-place or reverted.
+
+Eval artifacts live in `docs/superpowers/specs/eval/` (per iteration,
+timestamped) and are **not** shipped in the pip package.
+
+---
+
+## 6. Git & PR Workflow
+
+```
+# Already done: branched off origin/main
+git checkout feature/user-guide-skill-revamp
+
+# Staged commits (each independently reviewable / revertable):
+c1  eval: capture baseline for user-guide skill
+c2  skill: extract advanced-debug content into references/
+c3  skill: rewrite SKILL.md with progressive disclosure (basic tier)
+c4  skill: tighten trigger description per eval results
+c5  skill: drop "Installing AI Skills" section
+c6+ skill: iteration fixes driven by eval deltas
+
+git push -u origin feature/user-guide-skill-revamp
+gh pr create --base main
+```
+
+The PR description notes the eval numbers (baseline vs final) and
+flags the README work (install CLI doc relocation) as a follow-up.
+
+---
+
+## 7. Risks & Mitigations
+
+| Risk | Mitigation |
+|---|---|
+| Model does not escalate to `advanced-debug.md` — advanced tools become invisible | Advanced-5 eval measures escalation rate directly. If ≤ 2/5, promote to a separate `adctoolbox-debug` skill (structurally option 3 from brainstorming). |
+| Baseline eval shows the current skill is already close to ceiling; revamp adds noise | Each commit is independently revertable; final eval gates the PR merge. |
+| Contributor-guide skill ends up cross-triggered with user-guide | `SKILL.md` §1 has explicit "NOT for" bullet pointing to contributor-guide; advanced-5 evals implicitly check this. |
+| Example references (`example-map.md`) drift during the split | Implementation step includes grep for example filenames against repo state before shipping. |
+
+---
+
+## 8. Explicitly Out of Scope (Re-statement)
+
+- Touching `adctoolbox-contributor-guide`
+- Modifying `adctoolbox-install-skill` CLI (no `--basic` / `--advanced`)
+- Any skill outside `_bundled_skills/` (nothing in `analog-agents/skills/`)
+- Any change under `python/src/adctoolbox/` (this revamp is text-only)
+
+---
+
+## 9. Success Definition (One-Liner)
+
+A user whose agent installs only `adctoolbox-user-guide` should:
+1. Get correct spectrum + calibration code on a basic prompt, without
+   the model hallucinating API names, and
+2. Be told to open `references/advanced-debug.md` when the prompt
+   involves phase-plane / bit-level / dashboard / error-decomposition
+   territory — and on open, receive correct API guidance there.
+
+Both measured by the eval suite in §5.
diff --git a/docs/superpowers/specs/eval/baseline.json b/docs/superpowers/specs/eval/baseline.json
@@ -0,0 +1,100 @@
+{
+  "iteration": "baseline",
+  "skill_md_commit": "687dbf1",
+  "skill_md_lines": 129,
+  "method": "10 fresh general-purpose subagents in parallel; each given the full SKILL.md content and one prompt; subagents have read access to references/ files. Per-prompt JSON returned by subagent and aggregated here. (Adapted from skill-creator workflow because that skill's full benchmark loop requires interactive HTML review.)",
+  "summary": {
+    "basic_pass": 5,
+    "basic_total": 5,
+    "advanced_escalation_pass": 0,
+    "advanced_escalation_total": 5,
+    "avg_response_length_words": 200.7
+  },
+  "results": [
+    {
+      "prompt_id": "B1",
+      "prompt": "Help me compute SNDR from this ADC dout array.",
+      "triggered": true,
+      "referenced_apis": ["analyze_spectrum", "find_coherent_frequency", "validate_dout_data", "generate_dout_dashboard"],
+      "escalated_to_advanced": false,
+      "response_length_words": 178,
+      "basic_correct_api": true
+    },
+    {
+      "prompt_id": "B2",
+      "prompt": "Plot a spectrum of the ADC output with coherent sampling.",
+      "triggered": true,
+      "referenced_apis": ["analyze_spectrum", "find_coherent_frequency", "ADC_Signal_Generator", "fit_sine_4param", "generate_aout_dashboard"],
+      "escalated_to_advanced": false,
+      "response_length_words": 245,
+      "basic_correct_api": true
+    },
+    {
+      "prompt_id": "B3",
+      "prompt": "I have a raw dout buffer — give me Python code that gets me ENOB.",
+      "triggered": true,
+      "referenced_apis": ["analyze_spectrum", "find_coherent_frequency", "validate_dout_data", "generate_dout_dashboard"],
+      "escalated_to_advanced": false,
+      "response_length_words": 178,
+      "basic_correct_api": true
+    },
+    {
+      "prompt_id": "B4",
+      "prompt": "Calibrate SAR capacitor weights from a sine test.",
+      "triggered": true,
+      "referenced_apis": ["calibrate_weight_sine", "calibrate_weight_sine_lite", "validate_dout_data", "find_coherent_frequency", "convert_cap_to_weight", "analyze_weight_radix", "analyze_spectrum"],
+      "escalated_to_advanced": false,
+      "response_length_words": 222,
+      "basic_correct_api": true
+    },
+    {
+      "prompt_id": "B5",
+      "prompt": "Generate a synthetic ADC output with a sinusoidal stimulus for testing.",
+      "triggered": true,
+      "referenced_apis": ["ADC_Signal_Generator", "analyze_spectrum", "find_coherent_frequency", "generate_dout_dashboard"],
+      "escalated_to_advanced": false,
+      "response_length_words": 199,
+      "basic_correct_api": true
+    },
+    {
+      "prompt_id": "A1",
+      "prompt": "Plot a phase-plane to diagnose nonlinearity in the ADC aout.",
+      "triggered": true,
+      "referenced_apis": ["analyze_phase_plane", "analyze_error_phase_plane", "fit_sine_4param", "validate_aout_data", "find_coherent_frequency"],
+      "escalated_to_advanced": false,
+      "response_length_words": 297
+    },
+    {
+      "prompt_id": "A2",
+      "prompt": "Show me per-bit activity for an N-bit ADC code stream.",
+      "triggered": true,
+      "referenced_apis": ["analyze_bit_activity", "validate_dout_data", "generate_dout_dashboard", "ADC_Signal_Generator"],
+      "escalated_to_advanced": false,
+      "response_length_words": 192
+    },
+    {
+      "prompt_id": "A3",
+      "prompt": "How do I detect overflow events in dout?",
+      "triggered": true,
+      "referenced_apis": ["analyze_overflow", "validate_dout_data", "analyze_bit_activity", "analyze_weight_radix", "generate_dout_dashboard"],
+      "escalated_to_advanced": false,
+      "response_length_words": 168
+    },
+    {
+      "prompt_id": "A4",
+      "prompt": "Give me a one-shot dashboard of all aout diagnostics.",
+      "triggered": true,
+      "referenced_apis": ["generate_aout_dashboard", "ADC_Signal_Generator", "validate_aout_data", "find_coherent_frequency"],
+      "escalated_to_advanced": false,
+      "response_length_words": 130
+    },
+    {
+      "prompt_id": "A5",
+      "prompt": "Convert a binary-weighted cap array to normalized weights.",
+      "triggered": true,
+      "referenced_apis": ["convert_cap_to_weight", "calibrate_weight_sine", "analyze_weight_radix"],
+      "escalated_to_advanced": false,
+      "response_length_words": 198
+    }
+  ]
+}