A user-oriented guide to optimizing bundle-plugins and individual skills with Bundles Forge. Covers quick start, scope detection, the diagnose → delegate → verify pipeline, 7 optimization targets, A/B evaluation, and feedback iteration.
Optimizing is the orchestrator for iterative improvement in the hub-and-spoke model. It diagnoses what needs to change, delegates substantive content edits to bundles-forge:authoring, and verifies outcomes by invoking bundles-forge:auditing (one-way: auditing reports only; it does not call back into optimizing).
Unlike auditing (which assesses only), optimizing drives improvement from the hub — fixing descriptions, reducing tokens, tightening workflow chains, and processing user feedback — routing SKILL.md and agent content work to authoring where appropriate.
Core principle: Optimize for the agent's experience. Every improvement should make skills easier to discover, faster to load, and clearer to follow.
Canonical source: The full execution protocol (scope detection, target details, A/B eval steps, feedback validation) lives in `skills/optimizing/SKILL.md`. This guide helps you understand what each target does, how to interpret results, and when to use which approach.
The fastest way to improve an existing bundle-plugin or skill:
- Run an audit (optional but recommended): `bundles-forge:auditing` produces a diagnostic report that optimizing can consume directly
- Invoke `bundles-forge:optimizing` (or ask the agent to optimize your project or skill)
- The agent auto-detects scope (project root or single skill) and selects applicable targets
- Review proposed changes: the agent presents its improvement plan before applying
- Verify: the agent runs `bundles-forge:auditing` after changes to confirm improvement
Optimizing accepts local paths, GitHub URLs, and zip/tar.gz files as input. You don't need to clone a repo first.
Optimizing auto-detects your scope from the path you provide:
| Scope | Detection | Mode | Applicable Targets |
|---|---|---|---|
| Project root | Has `skills/` + `package.json` | Project optimization | All 7 targets + feedback |
| Single skill directory | Has `SKILL.md`, no `skills/` subdirectory | Skill optimization | Core targets (1-3) + feedback |
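The detection rules in the table can be sketched as a small function. This is an illustrative reading of the table, not the skill's actual implementation; the function name `detect_scope` and the `"unknown"` fallback are assumptions.

```python
from pathlib import Path


def detect_scope(path: str) -> str:
    """Classify a local directory as 'project', 'skill', or 'unknown'.

    Hypothetical helper: mirrors the detection table, nothing more.
    """
    root = Path(path)
    has_skills_dir = (root / "skills").is_dir()

    # Project root: has skills/ and package.json
    if has_skills_dir and (root / "package.json").is_file():
        return "project"

    # Single skill directory: has SKILL.md and no skills/ subdirectory
    if (root / "SKILL.md").is_file() and not has_skills_dir:
        return "skill"

    # Assumption: anything else is left for the agent or user to resolve
    return "unknown"
```

A `"project"` result would map to all 7 targets plus feedback, and `"skill"` to targets 1-3 plus feedback.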
Are you optimizing an entire project or a single skill?
├─ Entire project → Project optimization (all 7 targets)
│ ├─ Have an audit report? → Feed it as input for prioritized optimization
│ └─ No audit report? → Optimizing runs its own diagnosis
└─ Single skill → Skill optimization (targets 1-3 + feedback)
   ├─ Skill triggered but produced wrong results? → Feedback iteration
   └─ Skill needs engineering improvement? → Targets 1-3
| Input | Action |
|---|---|
| Local directory path | Use directly |
| GitHub repo URL | Shallow clone to temp directory |
| GitHub subdirectory URL | Clone repo, extract subdirectory |
| Zip/tar.gz file path | Extract to temp directory |
Remote sources (GitHub URLs, archives) are cloned or downloaded without executing hooks or scripts — the skill scans for risks before running any project code.
If download fails, the skill reports the error and suggests alternatives (provide a local path or zip file).
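To make the input table concrete, here is a minimal sketch of how a source string might be sorted into the four input kinds. The function name, the return labels, and the URL patterns are assumptions for illustration; the real resolution logic lives in the skill.

```python
import re


def classify_input(source: str) -> str:
    """Guess which row of the input table a source string belongs to."""
    # Archives are checked first so that archive URLs are not mistaken for repos
    if source.lower().endswith((".zip", ".tar.gz")):
        return "archive"

    # GitHub subdirectory URL, e.g. github.com/user/repo/tree/main/skills/my-skill
    if re.match(r"^(https?://)?github\.com/[^/]+/[^/]+/tree/", source):
        return "github-subdirectory"

    # GitHub repo URL, e.g. https://github.com/user/repo
    if re.match(r"^(https?://)?github\.com/[^/]+/[^/]+/?$", source):
        return "github-repo"

    # Everything else is treated as a local directory path
    return "local-path"
```

Only the classification is shown. Cloning, extraction, and the pre-run risk scan are deliberately left out.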
Optimizing can consume reports from prior audits. A common pattern is audit first, then optimize based on findings — but remember that bundles-forge:auditing is diagnostic only: it does not invoke optimizing. You run optimizing when you want fixes, or another orchestrator (for example bundles-forge:releasing) sequences auditing and optimizing in its pipeline.
| Input | Source | Use |
|---|---|---|
| `audit-report` | `bundles-forge:auditing` (full project) | Per-skill breakdowns for all 7 targets |
| `skill-report` | `bundles-forge:auditing` (skill mode) | Focused 4-category report for skill optimization |
| `workflow-report` | `bundles-forge:auditing` (workflow mode) | W1-W9 findings for Target 3 |
| `user-feedback` | Direct from user | Behavioral feedback for the iteration process |
Optimizing follows a one-way verification pattern toward auditing, not a mutual skill cycle. Auditing never invokes optimizing; optimizing calls auditing when it needs a verification report after changes.
optimizing diagnoses → delegates content edits to authoring → verifies via auditing
Important: If verification still shows issues, present them to the user for a manual decision — do not loop indefinitely. Further passes are a new user- or orchestrator-driven run of optimizing (or authoring), not auditing "calling back" into optimizing.
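The no-loop rule can be pictured as a single pass with exactly one verification call. The sketch below is only a model of the control flow: `run_single_pass` and its three callbacks are hypothetical stand-ins for the diagnosing, authoring, and auditing steps.

```python
from typing import Callable


def run_single_pass(
    diagnose: Callable[[], list],
    delegate: Callable[[list], None],
    verify: Callable[[], list],
) -> dict:
    """Run diagnose -> delegate -> verify once and never loop."""
    plan = diagnose()      # optimizing decides what needs to change
    delegate(plan)         # content edits are handed to authoring
    remaining = verify()   # one audit call; auditing only reports back

    # Remaining issues are escalated to the user instead of triggering another pass
    status = "clean" if not remaining else "needs-user-decision"
    return {"status": status, "remaining": remaining}
```

Any further pass would be a fresh call made by the user or an orchestrator, which keeps the relationship with auditing one-way.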
- Optionally run `bundles-forge:auditing` (full, skill, or workflow mode) to produce a diagnostic report as input.
- Review the report and prioritize critical findings.
- Run `bundles-forge:optimizing` with the audit report (or your goals). It diagnoses, delegates content work to `bundles-forge:authoring` as needed, applies non-content optimizations per its protocol, and invokes `bundles-forge:auditing` for post-change verification (optimizing triggers this step; auditing does not auto-start optimizing).
- Review the verification report. If issues remain, decide manually whether to run another optimizing or authoring pass.
Select targets based on audit findings or user request — don't run all 7 sequentially:
| Finding / Signal | Target |
|---|---|
| Q-findings (description anti-patterns, frontmatter issues) | Target 1, 2 |
| W-findings (workflow integrity issues) | Target 3 |
| Platform gaps identified | Invoke bundles-forge:scaffolding directly |
| Security findings (SC/AG checks) | Target 4 |
| User requests adding/replacing/reorganizing skills | Target 5 |
| Component signals (userConfig, MCP, LSP needs) | Target 6 |
| Deprecated skills, renamed skills, platform removal | Target 7 |
| User behavioral feedback about skill quality | Feedback Iteration |
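As a rough illustration of the routing table, the sketch below maps finding IDs to targets by their letter prefix. The ID shapes (`Q6`, `W5`, `SC3`) and the names `ROUTES` and `route_findings` are assumptions; only the prefix-to-target pairs come from the table.

```python
# Prefix -> targets, taken from the routing table (Q, W, and SC/AG rows only)
ROUTES = {
    "Q": ["Target 1", "Target 2"],
    "W": ["Target 3"],
    "SC": ["Target 4"],
    "AG": ["Target 4"],
}


def route_findings(finding_ids: list[str]) -> list[str]:
    """Return the targets implied by a list of finding IDs, without duplicates."""
    targets: list[str] = []
    for finding_id in finding_ids:
        # Assumption: an ID is a letter prefix followed by digits, e.g. "W5"
        prefix = finding_id.rstrip("0123456789")
        for target in ROUTES.get(prefix, []):
            if target not in targets:
                targets.append(target)
    return targets
```

Signals that are not finding IDs (user requests, component signals, behavioral feedback) are not modeled here and still need to be routed by judgment.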
After routing to a target, the agent classifies the optimization action. This determines whether to patch an existing skill or create something new:
| Type | When | What Happens |
|---|---|---|
| FIX | Skill has a defect — outdated instructions, broken references, ineffective steps | Repair in place. The skill's core goal and scope do not change |
| DERIVED | Skill works but needs enhancement or specialization for a new context | A variant or improved version is created. The original remains available |
| CAPTURED | A workflow gap exists — no skill covers a needed capability | A new skill is created from scratch to fill the gap |
The agent explicitly states its classification and rationale before making changes. When a change touches ## Outputs or ## Integration, the agent maps all downstream skills that consume those artifacts and includes their updates in the same pass — preventing breakage discovered only at verification time.
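One way to read the FIX / DERIVED / CAPTURED table is as a two-question decision. The sketch below encodes that reading; the enum, the function name, and the boolean inputs are hypothetical, and in practice the agent makes this call with a stated rationale rather than from two flags.

```python
from enum import Enum


class Action(Enum):
    FIX = "repair in place"
    DERIVED = "create a variant, keep the original"
    CAPTURED = "create a new skill for an uncovered gap"


def classify_action(skill_exists: bool, has_defect: bool) -> Action:
    """Map the two questions implied by the table onto an action type."""
    if not skill_exists:
        # No skill covers the needed capability
        return Action.CAPTURED
    if has_defect:
        # Outdated instructions, broken references, ineffective steps
        return Action.FIX
    # The skill works but needs enhancement or specialization
    return Action.DERIVED
```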
Beyond automated linter checks (bundles-forge audit-skill), the agent assesses each skill across four qualitative dimensions:
| Dimension | What It Tells You |
|---|---|
| Trigger confidence | Can realistic user prompts correctly trigger this skill? Low confidence points to Target 1 |
| Execution clarity | Once triggered, can an agent follow the steps without ambiguity? Vague instructions or implicit assumptions indicate a FIX is needed |
| End-to-end completeness | Does the full flow from trigger to output have gaps? Missing handoffs or undefined artifacts point to Target 3 or a CAPTURED action |
| Degradation signals | Has this skill stopped working in practice? Recurring audit findings or user reports of wrong output signal an urgent FIX |
When assessment or audit findings reveal structural gaps — not just broken connections but missing capabilities — the agent considers whether a new skill should be created (CAPTURED) rather than patching existing ones. Common gap signals include W2 (unreachable skill), dead zones in the workflow graph, and repeated manual work that no existing skill covers.
These targets apply to both project optimization and single-skill optimization.
The highest-impact optimization. Descriptions are the primary mechanism for skill discovery — when a user says something, the agent matches intent against description fields.
The critical rule: Descriptions must state triggering conditions, not workflow summaries.
| | Bad (workflow summary) | Good (triggering conditions) |
|---|---|---|
| Pattern | "Use for auditing — scans structure, checks manifests, scores categories, generates report" | "Use when reviewing a bundle-plugin for structural issues, version drift, or before release" |
| Problem | Agent follows the description shortcut instead of reading the full SKILL.md | Agent reads the full skill for execution details |
| Result | Skipped steps, incomplete execution | Full execution as designed |
Additional rules:
- Always start with "Use when..."
- Keep under 250 characters (truncated in skill listings beyond this)
- Include concrete symptoms, situations, and contexts
- Never mention the skill's internal steps
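The two mechanical rules above (the "Use when..." prefix and the 250-character limit) are easy to check by hand or in a script. The sketch below is a hypothetical stand-in for that kind of check, not the linter's code; detecting workflow-summary wording needs judgment and is not attempted.

```python
def check_description(description: str) -> list[str]:
    """Return a list of mechanical problems found in a skill description."""
    if not description.strip():
        return ["missing description"]

    issues = []
    if not description.lstrip().startswith("Use when"):
        issues.append('does not start with "Use when"')
    if len(description) > 250:
        # Longer descriptions are truncated in skill listings
        issues.append("longer than 250 characters")
    return issues
```

For example, the "good" description from the table returns an empty list, while the "bad" one is flagged for its prefix.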
Run the linter to catch mechanical issues:
bundles-forge audit-skill <target-dir>
Description-specific checks are Q3-Q7: missing description (Q3), "Use when..." prefix (Q5), workflow summary anti-pattern (Q6), and length >250 characters (Q7). The full lint suite covers Q1-Q15 and X1-X3. See Quick Reference for the complete list.
For behavioral quality (does the right prompt trigger the right skill?), use A/B eval.
Covers both token efficiency and layer assignment — reducing what agents load and ensuring content lives at the right level.
Every token in a frequently-loaded skill costs context budget across every session. This matters most for the bootstrap skill (loaded every session) and commonly-triggered skills.
Targets:
- SKILL.md body < 500 lines
- Bootstrap skill (`using-*`) < 200 lines
- Move heavy reference material to `references/`
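The line targets above can be checked with a few lines of code. This is a minimal sketch under the assumption that the budget depends only on whether the skill name starts with `using-`; both function names are made up for illustration.

```python
def line_budget(skill_name: str) -> int:
    """Bootstrap skills (using-*) get the tighter 200-line budget."""
    return 200 if skill_name.startswith("using-") else 500


def over_budget(skill_name: str, skill_md_text: str) -> bool:
    """True when the SKILL.md text is not strictly under its line budget."""
    return len(skill_md_text.splitlines()) >= line_budget(skill_name)
```

A skill that trips this check is a candidate for moving reference material out of the SKILL.md body.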
Techniques:
| Technique | Example |
|---|---|
| Cross-reference instead of repeating | See bundles-forge:authoring instead of duplicating rules |
| One excellent example over three mediocre | Remove redundant examples that teach the same concept |
| Move flag docs to --help | Reference bundles-forge audit-skill --help instead of listing all flags |
| Eliminate intra-project redundancy | Don't repeat what's in another skill's references/ |
The three-level loading system ensures minimal context usage:
| Level | When Loaded | Budget |
|---|---|---|
| Metadata (name + description) | Always in context | ~100 words |
| SKILL.md body | When skill triggers | < 500 lines |
| Reference files (`references/`) | On demand | Unlimited |
When to extract to references/:
- SKILL.md approaching 500 lines
- Tables or checklists that are only needed during execution (not for understanding the skill's purpose)
- Template content that the agent copies verbatim
Consumes workflow audit findings to identify and fix workflow issues. The workflow audit has two layers:
- Script-automated (W1-W9): Static graph analysis and semantic checks, run via `bundles-forge audit-workflow`
- Evaluator-only (W10-W11): Chain evaluation and behavioral verification, requires `evaluator` agent dispatch
If no workflow report is available:
bundles-forge audit-workflow <target-dir>
bundles-forge audit-workflow --focus-skills skill-a,skill-b <root>
Fix priority guide:
| Finding | What It Means | How to Fix |
|---|---|---|
| W1 (undeclared cycle) | Two skills call each other but the loop isn't declared | Add <!-- cycle:a,b --> in ## Integration if intentional, or restructure |
| W2 (unreachable skill) | Skill exists but nothing chains to it | Add to bootstrap routing, or declare Called by: user directly |
| W3/W4 (missing I/O) | Terminal skill has no `## Outputs`, or referenced skill has no `## Inputs` | Add the section with artifact IDs |
| W5 (artifact ID mismatch) | Upstream `## Outputs` and downstream `## Inputs` use different names | Align the backtick artifact IDs |
| W9 (placeholder sections) | Inputs/Outputs exist but are empty or generic | Write meaningful semantic descriptions |
| W10 (asymmetric integration) | Skill A says it calls B, but B doesn't say it's called by A | Add the missing **Called by:** declaration |
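The asymmetric-integration row describes a simple graph property: every "calls" edge should have a matching "called by" edge. The sketch below shows that check on plain dictionaries. The data shapes and the function name are assumptions, and parsing the `## Integration` sections is out of scope.

```python
def find_asymmetric(
    calls: dict[str, set[str]],
    called_by: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs where the callee lacks a 'Called by' entry."""
    missing = []
    for caller, callees in sorted(calls.items()):
        for callee in sorted(callees):
            # The callee should list the caller among its declared callers
            if caller not in called_by.get(callee, set()):
                missing.append((caller, callee))
    return missing
```

Each returned pair points at the skill that needs the missing **Called by:** declaration.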
In single-skill mode, only W9 (placeholder sections) and W10 (asymmetric integration) apply — the rest require project-wide graph analysis.
These targets are skipped in single-skill mode. They require project-wide context.
Fix security findings from bundles-forge:auditing Category 10. Common fixes:
| Finding | Fix |
|---|---|
| Hook script makes network calls | Remove or justify with comments |
| OpenCode plugin has excessive capabilities | Scope to declared needs |
| Agent prompt lacks scope constraints | Add explicit boundaries |
| SKILL.md contains encoded/obfuscated content | Strip or replace with plain text |
Structural changes to the project: adding skills, replacing skills, reorganizing workflow chains, or converting skills to subagents. This was previously part of blueprinting (Scenario D) but belongs in optimizing because it operates on existing projects without producing a design document.
| User Says | Action |
|---|---|
| "Add a new skill to my project" | Target 5a — add skill, wire into workflow |
| "Replace this skill with a better one" | Target 5b — replace and update references |
| "The workflow chain needs reorganizing" | Target 5c — restructure execution paths |
| "This skill should be a subagent instead" | Target 5d — convert to read-only agent |
| "My project needs better X capability" | Feedback process → may lead to Target 5a |
The most common restructuring operation. The process:
- Read the existing project: map skills, workflow graph, bootstrap routing
- Inventory new skills: source, structure, frontmatter quality
- Check compatibility against the existing project (naming, responsibilities, conventions)
- For third-party skills, follow the shared integration reference (`references/third-party-integration.md`) covering license, security audit, and integration intent
- Design insertion points: where do new skills connect?
- Apply: copy, adapt, update Integration sections
- Verify: focused workflow audit with `--focus-skills`
Same compatibility analysis as adding, plus:
- Map all references to the old skill
- Update cross-references, Integration sections, and routing table
- Verify with workflow audit
When the execution chain has inefficiencies:
- Map the current graph and identify bottlenecks or unnecessary handoffs
- Propose new chain (present to user)
- Update Integration sections and routing
- Verify with Chain A/B Eval
Candidates for conversion:
- Execution produces verbose temporary context (search results, file contents, logs) that subsequent steps don't need
- Skills that only inspect/validate without modifying files
- Skills that produce structured reports (self-contained output)
- Skills that could run in parallel with other work (optional bonus)
Conversion extracts the execution protocol into agents/<role>.md with fallback logic for when subagents are unavailable. After conversion, dispatch the evaluator agent with test prompts to confirm the new agent correctly executes the former skill's responsibilities, then run bundles-forge:auditing to verify dispatch/fallback logic.
Add, adjust, or migrate optional plugin components based on evolving project needs. This target handles the gap between initial scaffolding and the components a project needs as it matures.
| Signal | Component | Action |
|---|---|---|
| Skills hardcode API keys/endpoints as `${VAR}` env vars | `userConfig` | Migrate to `userConfig` for automatic user prompting |
| Audit finds MCP servers without `userConfig`-backed auth | `userConfig` | Add `userConfig` fields with `sensitive: true` |
| Skills reference external SaaS APIs with no integration | `.mcp.json` or `bin/` | Add MCP server or CLI (consult decision tree) |
| Skills involve language-specific code intelligence | `.lsp.json` | Add LSP server config |
| Users request custom output formats | `output-styles/` | Add output style definitions |
| Plugin MCP server has npm dependencies | `${CLAUDE_PLUGIN_DATA}` | Add `SessionStart` dependency install hook |
| Plugin uses `../` paths or writes to `${CLAUDE_PLUGIN_ROOT}` | Path migration | Fix to use relative `./` paths and `${CLAUDE_PLUGIN_DATA}` |
- Diagnose: identify signals from audit reports, user feedback, or direct inspection
- Decide: consult `skills/scaffolding/references/external-integration.md` for the full decision tree (CLI vs MCP, userConfig schema, PLUGIN_DATA patterns, LSP fields, output-styles format)
- Execute: invoke `bundles-forge:scaffolding` using its "Adding Optional Components" flow
- Verify: run `bundles-forge:auditing` to confirm structural integrity and security compliance (especially for new MCP servers and userConfig sensitive values)
Coordinate the deprecation, renaming, splitting, or merging of skills. This target ensures all references remain consistent across the project during structural changes.
Deprecation — mark a skill as deprecated without removing it:
- Add `deprecated: true` and `superseded-by: <project>:<replacement>` to the skill's frontmatter
- Prepend the description with a deprecation notice: `"Use when... (deprecated — use <replacement> instead)"`
- Update the bootstrap routing table to note the deprecation
- Update cross-references in other skills' `## Integration` sections
Renaming — change a skill's name while preserving all connections:
- Rename the directory: `skills/old-name/` → `skills/new-name/`
- Update frontmatter `name` field
- Update all cross-references (`<project>:old-name` → `<project>:new-name`) across all SKILL.md, Integration sections, and documentation
- Update bootstrap routing table
- Run `bundles-forge audit-docs` to catch any missed references
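The cross-reference step in the renaming list is where partial matches can bite: renaming `audit` must not touch `auditing`. Here is a minimal sketch of a boundary-aware replacement, assuming skill names consist of word characters and hyphens; `rename_references` is a hypothetical helper, not part of the CLI.

```python
import re


def rename_references(text: str, project: str, old: str, new: str) -> str:
    """Rewrite '<project>:<old>' to '<project>:<new>' without touching longer names."""
    pattern = re.compile(
        rf"(?<![\w-]){re.escape(project)}:{re.escape(old)}(?![\w-])"
    )
    # A function replacement avoids backslash interpretation in the new name
    return pattern.sub(lambda _match: f"{project}:{new}", text)
```

Running `bundles-forge audit-docs` afterwards remains the way to catch references a text substitution misses.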
Splitting — divide a skill into multiple focused skills:
- Design the new skill boundaries (reuse `bundles-forge:blueprinting` scenario B)
- Invoke `bundles-forge:scaffolding` for new skill directories
- Invoke `bundles-forge:authoring` to write each new skill's content
- Update all references to the original skill
- Deprecate the original (or remove if all functionality is covered)
- Run `bundles-forge:auditing` in workflow mode to verify chain integrity
Merging — combine multiple skills into one:
- Design the merged skill (reuse `bundles-forge:blueprinting` scenario C)
- Invoke `bundles-forge:authoring` to write the merged content
- Deprecate the source skills
- Update all cross-references and routing
- Run `bundles-forge:auditing` in workflow mode
Platform cleanup — after any structural change:
- Remove deprecated skill references from platform manifests
- Update `.version-bump.json` if manifest paths changed
- Run `bundles-forge:testing` to verify component discovery
A/B eval is the core quality assurance mechanism for description changes and feedback-driven improvements. It compares original vs optimized versions side-by-side.
1. Copy the skill to a working version (<skill-name>-optimized/)
2. Apply changes to the copy only (never overwrite the original first)
3. Create 5+ realistic test prompts that should trigger this skill
4. Dispatch two evaluator agents in parallel:
   - Evaluator A: "original" label → test with original skill
   - Evaluator B: "optimized" label → test with optimized skill
5. Compare results → present to user
6. User decides: adopt optimized version or discard
| Metric | What It Tells You |
|---|---|
| Trigger rate | How many prompts correctly activated the skill? |
| False negatives | Did the optimized description miss cases the original caught? |
| False positives | Did either version trigger on prompts meant for other skills? |
| Step accuracy | Did the agent follow all steps, or take shortcuts? |
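The first three metrics in the table can be computed from per-prompt results. The sketch below assumes each evaluator run is reduced to a record of whether the skill should have triggered and whether it did; the `PromptResult` type and `summarize` function are hypothetical, and step accuracy is left to the evaluator's qualitative review.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    should_trigger: bool  # was this prompt meant for the skill under test?
    triggered: bool       # did the skill actually activate?


def summarize(results: list[PromptResult]) -> dict:
    """Compute trigger rate, false negatives, and false positives for one version."""
    expected = [r for r in results if r.should_trigger]
    hits = [r for r in expected if r.triggered]
    return {
        "trigger_rate": len(hits) / len(expected) if expected else 0.0,
        "false_negatives": [r.prompt for r in expected if not r.triggered],
        "false_positives": [
            r.prompt for r in results if r.triggered and not r.should_trigger
        ],
    }
```

Calling `summarize` once for the original and once for the optimized version gives the side-by-side numbers to present to the user.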
| Situation | Skip? | Rationale |
|---|---|---|
| Purely additive change (new trigger phrases, no modifications) | Yes | Simple verification pass is sufficient |
| Structural fix (missing section, broken reference) | Yes | Not a behavioral change |
| Description rewrite changing existing triggers | No | Must verify no regressions |
| Feedback-driven behavior change | No | Must compare old vs new behavior |
For workflow transitions (not individual descriptions), use chain evaluation:
- Define a realistic end-to-end scenario
- Dispatch evaluator with "chain" label and ordered skill list
- Review transition quality ratings at each handoff
- Focus on "broken" handoffs — these indicate missing artifacts or unclear instructions
Use chain eval after: modifying Inputs/Outputs, adding skills to a chain, or when workflow audit findings indicate issues.
When subagent dispatch is unavailable, two options:
| Fallback | How | Trade-off |
|---|---|---|
| Sequential inline | Follow `agents/evaluator.md` protocol inline, randomize order | Slower, possible ordering bias |
| Skip A/B | Apply change directly with simple verification | Faster, no comparison data |
The user chooses which fallback to use.
A cross-cutting concern available in both project and skill optimization modes. When a user reports that a skill triggered but produced wrong results, the feedback process provides structured iteration.
| User Says | Action |
|---|---|
| "This skill triggered but produced wrong results" | Feedback iteration |
| "The steps are in the wrong order" | Feedback iteration |
| "Description format doesn't follow conventions" | Optimization targets 1-2 |
| "Token budget exceeded across the project" | Optimization target 2 (project mode) |
Before applying any feedback, each item goes through validation:
| Question | Purpose | Red Flag |
|---|---|---|
| Goal alignment: Does this serve the skill's core goal? | Prevents scope drift | "This would turn the skill into something different" |
| Necessity: Is there an actual defect, or just a style preference? | Prevents unnecessary churn | "The skill works fine, I just prefer a different format" |
| Side effects: Could this introduce complexity or regression? | Prevents creep | "This adds 50 lines to handle a rare edge case" |
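The three questions can be recorded per feedback item so that red flags are explicit before anything is proposed. This is a sketch of the bookkeeping only: the answers are judgments made by the agent and the user, and the `FeedbackItem` type and `red_flags` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeedbackItem:
    text: str
    serves_core_goal: bool     # goal alignment
    fixes_actual_defect: bool  # necessity (defect, not style preference)
    risks_regression: bool     # side effects (complexity or regression)


def red_flags(item: FeedbackItem) -> list[str]:
    """List which of the three validation questions the item fails."""
    flags = []
    if not item.serves_core_goal:
        flags.append("goal alignment")
    if not item.fixes_actual_defect:
        flags.append("necessity")
    if item.risks_regression:
        flags.append("side effects")
    return flags
```

An item with no red flags still goes into the improvement plan for user confirmation; passing validation does not mean it is applied automatically.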
Receive feedback
→ Identify target skill
→ If external skill: fork with forked- prefix
→ Read skill, understand core goal
→ Validate each item (3-question framework)
→ Present improvement plan → USER CONFIRMS
→ Copy to working version
→ Apply changes to copy
→ A/B eval (original vs optimized)
→ User decides: adopt or discard
→ Optimizing invokes auditing for post-change verification
Rules:
- Never apply feedback without user confirmation
- For external skills, always fork first (add provenance header)
- At most one verification pass after changes; if issues remain, escalate to the user — do not auto-loop. Optimizing triggers re-audit when following this skill's protocol; auditing does not auto-trigger optimizing.
| Mistake | What Goes Wrong | How to Avoid |
|---|---|---|
| Trying to optimize everything at once | Unfocused changes that are hard to verify | Pick one target, measure, improve, verify — then move to the next |
| Rewriting descriptions as workflow summaries | Agent shortcuts the description instead of reading the full SKILL.md | State triggering conditions ("Use when reviewing..."), not steps ("Scans structure, checks manifests...") |
| Ignoring the bootstrap skill's token budget | The bootstrap skill loads every session, so bloat costs context everywhere | Keep using-* under 200 lines — this is the highest-ROI token optimization |
| Applying user feedback without validation | Style preferences masquerade as defect reports, leading to unnecessary churn | Run every feedback item through the 3-question validation framework before accepting |
| Expanding a skill's scope during any optimization | A skill slowly drifts from its original responsibility | Optimization should improve how well a skill fulfills its goal, not shift what the goal is. Verify after every change: does this skill still do the same thing? |
| Running all 7 targets on a single skill | Targets 4-7 require project context and produce no useful results at skill scope | Let scope auto-detection handle it — single skills only get targets 1-3 |
| Rewriting entire SKILL.md instead of surgical edits | Large diffs increase regression risk and make review harder | Specify section-level changes. A FIX to one heading should not trigger a full rewrite — minimize diff surface |
| Adding third-party skills without security audit | Imported content may contain encoded prompts, excessive tool access, or network calls | Always run bundles-forge:auditing on imported skills — see references/third-party-integration.md |
| Adding skills without updating Integration sections | The workflow graph becomes inconsistent, causing W10 (asymmetric integration) findings | Every new skill connection needs symmetric **Calls:** and **Called by:** declarations |
| Skipping A/B eval for description rewrites | A description that improves one trigger may break another | Always A/B eval when modifying existing trigger phrases — additive-only changes can skip |
Q: What's the difference between auditing and optimizing?
Auditing is pure diagnostics — it checks, scores, and reports. It never modifies files or calls optimizing. Optimizing is the improvement driver — it reads audit reports (or your goals), diagnoses what to fix, delegates content changes to authoring, and verifies results by calling auditing. Think of it as: auditing tells you what's wrong, optimizing fixes it.
Q: When should I use optimizing vs authoring directly?
Use optimizing when you need diagnosis — when you don't know exactly what to fix, or you want a structured improvement process with A/B evaluation and verification. Use authoring directly when you already know exactly what content to write or change (e.g., "rewrite this description to X").
Q: Do I need to run an audit before optimizing?
No, but it's recommended. Optimizing can run its own diagnosis, but feeding it an audit report gives it a prioritized list of findings to work through. The common pattern is: audit → review report → optimize based on findings.
Q: Which targets apply when optimizing a single skill?
Targets 1-3 (description triggering, content optimization, workflow chain integrity) plus feedback iteration. Targets 4-7 are skipped because they require project-wide context. Within Target 3, only W9 (placeholder sections) and W10 (asymmetric integration) apply at skill scope.
Q: What if the verification audit still shows issues after optimization?
The agent presents remaining issues to you for a manual decision — it does not loop automatically. You can choose to run another optimizing pass, invoke authoring directly for specific fixes, or accept the current state. This prevents infinite optimize-audit cycles.
Q: Can I optimize a project hosted on GitHub without cloning it first?
Yes. Pass a GitHub URL directly — the skill performs a shallow clone automatically. This also works for subdirectory URLs (e.g., github.com/user/repo/tree/main/skills/my-skill) and archive URLs (.zip/.tar.gz).
bundles-forge audit-skill <path> # Quality lint (Q1-Q15, X1-X3)
bundles-forge audit-skill <skill-dir> # Single skill audit (4 categories)
bundles-forge audit-workflow <path> # Workflow audit (W1-W9, script-automated)
bundles-forge audit-workflow --focus-skills a,b <path> # Focused workflow audit
bundles-forge audit-security <path> # Security scan (7 surfaces)
W10-W11 (chain evaluation and behavioral verification) require evaluator agent dispatch and are not produced by the script.
| Target | Project | Skill |
|---|---|---|
| 1. Description Triggering | Full | Full |
| 2. Content Optimization | Full | Full |
| 3. Workflow Chain Integrity | Full | Partial (W9/W10 only) |
| 4. Security Remediation | Full | Skip |
| 5. Skill & Workflow Restructuring | Full | Skip |
| 6. Optional Component Management | Full | Skip |
| 7. Deprecation and Migration | Full | Skip |
| Feedback Iteration | Full | Full |