Optimizing Guide

A user-oriented guide to optimizing bundle-plugins and individual skills with Bundles Forge. Covers quick start, scope detection, the diagnose → delegate → verify pipeline, 7 optimization targets, A/B evaluation, and feedback iteration.

Overview

Optimizing is the orchestrator for iterative improvement in the hub-and-spoke model. It diagnoses what needs to change, delegates substantive content edits to bundles-forge:authoring, and verifies outcomes by invoking bundles-forge:auditing (one-way: auditing reports only; it does not call back into optimizing).

Unlike auditing (which assesses only), optimizing drives improvement from the hub — fixing descriptions, reducing tokens, tightening workflow chains, and processing user feedback — routing SKILL.md and agent content work to authoring where appropriate.

Core principle: Optimize for the agent's experience. Every improvement should make skills easier to discover, faster to load, and clearer to follow.

Canonical source: The full execution protocol (scope detection, target details, A/B eval steps, feedback validation) lives in skills/optimizing/SKILL.md. This guide helps you understand what each target does, how to interpret results, and when to use which approach.

Quick Start

The fastest way to improve an existing bundle-plugin or skill:

Run an audit (optional but recommended) — bundles-forge:auditing produces a diagnostic report that optimizing can consume directly
Invoke bundles-forge:optimizing (or ask the agent to optimize your project or skill)
The agent auto-detects scope — project root or single skill — and selects applicable targets
Review proposed changes — the agent presents its improvement plan before applying
Verify — the agent runs bundles-forge:auditing after changes to confirm improvement

Optimizing accepts local paths, GitHub URLs, and zip/tar.gz files as input. You don't need to clone a repo first.

Choosing Your Path

Optimizing auto-detects your scope from the path you provide:

Scope	Detection	Mode	Applicable Targets
Project root	Has `skills/` + `package.json`	Project optimization	All 7 targets + feedback
Single skill directory	Has `SKILL.md`, no `skills/` subdirectory	Skill optimization	Core targets (1-3) + feedback
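
The detection rules in the table can be sketched in a few lines of Python. This is a minimal illustration that assumes a local directory has already been resolved; the function name `detect_scope` and the `"unknown"` fallback are hypothetical and not part of Bundles Forge.

```python
from pathlib import Path


def detect_scope(target: Path) -> str:
    """Return 'project', 'skill', or 'unknown' for a local directory."""
    has_skills_dir = (target / "skills").is_dir()
    has_package_json = (target / "package.json").is_file()
    has_skill_md = (target / "SKILL.md").is_file()

    # Project root: has skills/ and package.json
    if has_skills_dir and has_package_json:
        return "project"
    # Single skill: has SKILL.md and no skills/ subdirectory
    if has_skill_md and not has_skills_dir:
        return "skill"
    # Hypothetical fallback for paths that match neither row
    return "unknown"
```

The real detection lives in skills/optimizing/SKILL.md and may apply additional checks.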

Decision Flowchart

Are you optimizing an entire project or a single skill?
  ├─ Entire project → Project optimization (all 7 targets)
  │    ├─ Have an audit report? → Feed it as input for prioritized optimization
  │    └─ No audit report? → Optimizing runs its own diagnosis
  └─ Single skill → Skill optimization (targets 1-3 + feedback)
       ├─ Skill triggered but produced wrong results? → Feedback iteration
       └─ Skill needs engineering improvement? → Targets 1-3

Supported Inputs

Input	Action
Local directory path	Use directly
GitHub repo URL	Shallow clone to temp directory
GitHub subdirectory URL	Clone repo, extract subdirectory
Zip/tar.gz file path	Extract to temp directory

Remote sources (GitHub URLs, archives) are cloned or downloaded without executing hooks or scripts — the skill scans for risks before running any project code.

If download fails, the skill reports the error and suggests alternatives (provide a local path or zip file).
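
As a rough illustration of how the four input kinds can be told apart, the sketch below classifies an input string. The function name, the returned labels, and the assumption that subdirectory URLs follow the `/<owner>/<repo>/tree/<ref>/<path>` shape are all hypothetical; the skill's actual parsing may differ.

```python
from urllib.parse import urlparse


def classify_input(source: str) -> str:
    """Map an input string to one of the four rows in the Supported Inputs table."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc == "github.com":
        parts = [p for p in parsed.path.split("/") if p]
        # /<owner>/<repo>/tree/<ref>/<subdir...> points inside a repo
        if len(parts) > 4 and parts[2] == "tree":
            return "github-subdirectory"
        return "github-repo"
    if source.endswith((".zip", ".tar.gz")):
        return "archive"
    # Anything else is treated as a local directory path
    return "local-directory"
```

For example, `classify_input("https://github.com/user/repo/tree/main/skills/my-skill")` falls in the subdirectory row, while `classify_input("./my-project")` is used directly.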

Input Sources

Optimizing can consume reports from prior audits. A common pattern is audit first, then optimize based on findings — but remember that bundles-forge:auditing is diagnostic only: it does not invoke optimizing. You run optimizing when you want fixes, or another orchestrator (for example bundles-forge:releasing) sequences auditing and optimizing in its pipeline.

Input	Source	Use
`audit-report`	`bundles-forge:auditing` (full project)	Per-skill breakdowns for all 7 targets
`skill-report`	`bundles-forge:auditing` (skill mode)	Focused 4-category report for skill optimization
`workflow-report`	`bundles-forge:auditing` (workflow mode)	W1-W9 findings for Target 3
`user-feedback`	Direct from user	Behavioral feedback for the iteration process

The Pipeline: Diagnose → Delegate → Verify

Verification is one-way: optimizing calls auditing when it needs a verification report after changes, and auditing never invokes optimizing. The two skills do not form a mutual cycle.

optimizing diagnoses → delegates content edits to authoring → verifies via auditing

Important: If verification still shows issues, present them to the user for a manual decision — do not loop indefinitely. Further passes are a new user- or orchestrator-driven run of optimizing (or authoring), not auditing "calling back" into optimizing.

Recommended Workflow

Optionally run bundles-forge:auditing (full, skill, or workflow mode) to produce a diagnostic report as input.
Review the report — prioritize critical findings.
Run bundles-forge:optimizing with the audit report (or your goals). It diagnoses, delegates content work to bundles-forge:authoring as needed, applies non-content optimizations per its protocol, and invokes bundles-forge:auditing for post-change verification (optimizing triggers this step — auditing does not auto-start optimizing).
Review the verification report — if issues remain, decide manually whether to run another optimizing or authoring pass.

Target Routing

Select targets based on audit findings or user request — don't run all 7 sequentially:

Finding / Signal	Target
Q-findings (description anti-patterns, frontmatter issues)	Target 1, 2
W-findings (workflow integrity issues)	Target 3
Platform gaps identified	Invoke `bundles-forge:scaffolding` directly
Security findings (SC/AG checks)	Target 4
User requests adding/replacing/reorganizing skills	Target 5
Component signals (userConfig, MCP, LSP needs)	Target 6
Deprecated skills, renamed skills, platform removal	Target 7
User behavioral feedback about skill quality	Feedback Iteration
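
The prefix-based rows of the routing table lend themselves to a small lookup. The sketch below is illustrative only: it assumes finding IDs look like `Q6`, `W5`, or `SC1` (a letter prefix followed by a number), and the names `PREFIX_TO_TARGETS` and `route_findings` are hypothetical. Signals that are not finding IDs (user requests, component signals, feedback) still need the agent's judgment.

```python
import re

# Finding-ID prefix -> targets, following the routing table above.
PREFIX_TO_TARGETS = {
    "Q": ["Target 1", "Target 2"],
    "W": ["Target 3"],
    "SC": ["Target 4"],
    "AG": ["Target 4"],
}


def route_findings(finding_ids: list[str]) -> list[str]:
    """Return the de-duplicated targets for a list of finding IDs."""
    targets: list[str] = []
    for finding_id in finding_ids:
        match = re.match(r"[A-Z]+", finding_id)
        prefix = match.group(0) if match else ""
        for target in PREFIX_TO_TARGETS.get(prefix, []):
            if target not in targets:
                targets.append(target)
    return targets
```

Routing by prefix keeps a run focused: a report containing only Q-findings never pulls in Targets 3-7.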

Optimization Action Classification

After routing to a target, the agent classifies the optimization action. This determines whether to patch an existing skill or create something new:

Type	When	What Happens
FIX	Skill has a defect — outdated instructions, broken references, ineffective steps	Repair in place. The skill's core goal and scope do not change
DERIVED	Skill works but needs enhancement or specialization for a new context	A variant or improved version is created. The original remains available
CAPTURED	A workflow gap exists — no skill covers a needed capability	A new skill is created from scratch to fill the gap

The agent explicitly states its classification and rationale before making changes. When a change touches ## Outputs or ## Integration, the agent maps all downstream skills that consume those artifacts and includes their updates in the same pass — preventing breakage discovered only at verification time.

Skill Health Assessment

Beyond automated linter checks (bundles-forge audit-skill), the agent assesses each skill across four qualitative dimensions:

Dimension	What It Tells You
Trigger confidence	Can realistic user prompts correctly trigger this skill? Low confidence points to Target 1
Execution clarity	Once triggered, can an agent follow the steps without ambiguity? Vague instructions or implicit assumptions indicate a FIX is needed
End-to-end completeness	Does the full flow from trigger to output have gaps? Missing handoffs or undefined artifacts point to Target 3 or a CAPTURED action
Degradation signals	Has this skill stopped working in practice? Recurring audit findings or user reports of wrong output signal an urgent FIX

When assessment or audit findings reveal structural gaps — not just broken connections but missing capabilities — the agent considers whether a new skill should be created (CAPTURED) rather than patching existing ones. Common gap signals include W2 (unreachable skill), dead zones in the workflow graph, and repeated manual work that no existing skill covers.

Core Targets (1-3)

These targets apply to both project optimization and single-skill optimization.

Target 1: Skill Description Triggering

The highest-impact optimization. Descriptions are the primary mechanism for skill discovery: when a user makes a request, the agent matches the intent against each skill's description field.

The critical rule: Descriptions must state triggering conditions, not workflow summaries.

	Bad (workflow summary)	Good (triggering conditions)
Pattern	"Use for auditing — scans structure, checks manifests, scores categories, generates report"	"Use when reviewing a bundle-plugin for structural issues, version drift, or before release"
Problem	Agent follows the description shortcut instead of reading the full SKILL.md	Agent reads the full skill for execution details
Result	Skipped steps, incomplete execution	Full execution as designed

Additional rules:

Always start with "Use when..."
Keep under 250 characters (truncated in skill listings beyond this)
Include concrete symptoms, situations, and contexts
Never mention the skill's internal steps
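
The two mechanical rules (the "Use when" prefix and the 250-character limit) are easy to check in code. The sketch below is a hypothetical helper, not the linter's implementation; the rules about concrete symptoms and internal steps need human or evaluator judgment and are not covered.

```python
def check_description(description: str) -> list[str]:
    """Return rule violations for a skill description (empty list means it passes)."""
    problems = []
    if not description.strip():
        problems.append("missing description")
        return problems
    if not description.lstrip().startswith("Use when"):
        problems.append('does not start with "Use when"')
    if len(description) > 250:
        problems.append(f"{len(description)} characters (limit is 250)")
    return problems
```

Run against the examples in the table above, the "Good" description passes both checks, while the "Bad" one fails on the prefix.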

How to Verify Description Quality

Run the linter to catch mechanical issues:

bundles-forge audit-skill <target-dir>

Description-specific checks fall in the Q3-Q7 range, including missing description (Q3), "Use when..." prefix (Q5), workflow summary anti-pattern (Q6), and length >250 characters (Q7). The full lint suite covers Q1-Q15 and X1-X3; see Quick Reference for the complete list.

For behavioral quality (does the right prompt trigger the right skill?), use A/B eval.

Target 2: Content Optimization

Covers both token efficiency and layer assignment — reducing what agents load and ensuring content lives at the right level.

Token Budget

Every token in a frequently-loaded skill costs context budget across every session. This matters most for the bootstrap skill (loaded every session) and commonly-triggered skills.

Targets:

SKILL.md body < 500 lines
Bootstrap skill (using-*) < 200 lines
Move heavy reference material to references/
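
A quick way to see which skills are near these limits is to count lines per SKILL.md. The sketch below is a simplified, hypothetical helper: it assumes skills live in `<root>/<skill-name>/SKILL.md`, treats any directory named `using-*` as the bootstrap skill, and counts the whole file (frontmatter included) rather than only the body.

```python
from pathlib import Path

DEFAULT_LIMIT = 500    # SKILL.md body
BOOTSTRAP_LIMIT = 200  # bootstrap skill (directory named using-*)


def over_budget(skills_root: Path) -> list[tuple[str, int, int]]:
    """List (skill name, line count, limit) for every SKILL.md at or over its limit."""
    offenders = []
    for skill_md in sorted(skills_root.glob("*/SKILL.md")):
        name = skill_md.parent.name
        limit = BOOTSTRAP_LIMIT if name.startswith("using-") else DEFAULT_LIMIT
        line_count = len(skill_md.read_text(encoding="utf-8").splitlines())
        if line_count >= limit:
            offenders.append((name, line_count, limit))
    return offenders
```

Anything this reports is a candidate for extraction to references/.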

Techniques:

Technique	Example
Cross-reference instead of repeating	`See bundles-forge:authoring` instead of duplicating rules
One excellent example over three mediocre	Remove redundant examples that teach the same concept
Move flag docs to --help	Reference `bundles-forge audit-skill --help` instead of listing all flags
Eliminate intra-project redundancy	Don't repeat what's in another skill's `references/`

Layer Assignment

The three-level loading system ensures minimal context usage:

Level	When Loaded	Budget
Metadata (name + description)	Always in context	~100 words
SKILL.md body	When skill triggers	< 500 lines
Reference files (`references/`)	On demand	Unlimited

When to extract to references/:

SKILL.md approaching 500 lines
Tables or checklists that are only needed during execution (not for understanding the skill's purpose)
Template content that the agent copies verbatim

Target 3: Workflow Chain Integrity

Consumes workflow audit findings to identify and fix workflow issues. The workflow audit has two layers:

Script-automated (W1-W9): Static graph analysis and semantic checks — run via bundles-forge audit-workflow
Evaluator-only (W10-W11): Chain evaluation and behavioral verification — requires evaluator agent dispatch

If no workflow report is available:

bundles-forge audit-workflow <target-dir>
bundles-forge audit-workflow --focus-skills skill-a,skill-b <root>

Fix priority guide:

Finding	What It Means	How to Fix
W1 (undeclared cycle)	Two skills call each other but the loop isn't declared	Add `<!-- cycle:a,b -->` in `## Integration` if intentional, or restructure
W2 (unreachable skill)	Skill exists but nothing chains to it	Add to bootstrap routing, or declare `Called by: user directly`
W3/W4 (missing I/O)	Terminal skill has no `## Outputs`, or referenced skill has no `## Inputs`	Add the section with artifact IDs
W5 (artifact ID mismatch)	Upstream `## Outputs` and downstream `## Inputs` use different names	Align the backtick artifact IDs
W9 (placeholder sections)	Inputs/Outputs exist but are empty or generic	Write meaningful semantic descriptions
W10 (asymmetric integration)	Skill A says it calls B, but B doesn't say it's called by A	Add the missing `Called by:` declaration

In single-skill mode, only W9 (placeholder sections) and W10 (asymmetric integration) apply — the rest require project-wide graph analysis.
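
To make the asymmetric-integration row concrete, the sketch below compares `Calls:` and `Called by:` declarations that have already been parsed into dictionaries. The data shape and the function name `find_asymmetric` are assumptions for illustration; the actual check is performed by the workflow audit.

```python
def find_asymmetric(
    calls: dict[str, set[str]],
    called_by: dict[str, set[str]],
) -> list[tuple[str, str]]:
    """Return (caller, callee) pairs where the callee lacks the matching 'Called by:' entry."""
    missing = []
    for caller, callees in sorted(calls.items()):
        for callee in sorted(callees):
            # The callee must list the caller under 'Called by:' for the edge to be symmetric
            if caller not in called_by.get(callee, set()):
                missing.append((caller, callee))
    return missing
```

Each returned pair corresponds to one missing `Called by:` declaration to add in the callee's ## Integration section.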

Project-Only Targets (4-7)

These targets are skipped in single-skill mode. They require project-wide context.

Target 4: Security Remediation

Fix security findings from bundles-forge:auditing Category 10. Common fixes:

Finding	Fix
Hook script makes network calls	Remove or justify with comments
OpenCode plugin has excessive capabilities	Scope to declared needs
Agent prompt lacks scope constraints	Add explicit boundaries
SKILL.md contains encoded/obfuscated content	Strip or replace with plain text

Target 5: Skill & Workflow Restructuring

Structural changes to the project: adding skills, replacing skills, reorganizing workflow chains, or converting skills to subagents. This was previously part of blueprinting (Scenario D) but belongs in optimizing because it operates on existing projects without producing a design document.

When to Use

User Says	Action
"Add a new skill to my project"	Target 5a — add skill, wire into workflow
"Replace this skill with a better one"	Target 5b — replace and update references
"The workflow chain needs reorganizing"	Target 5c — restructure execution paths
"This skill should be a subagent instead"	Target 5d — convert to read-only agent
"My project needs better X capability"	Feedback process → may lead to Target 5a

Adding Skills (5a)

The most common restructuring operation. The process:

Read the existing project — map skills, workflow graph, bootstrap routing
Inventory new skills — source, structure, frontmatter quality
Check compatibility against the existing project (naming, responsibilities, conventions)
For third-party skills — follow the shared integration reference (references/third-party-integration.md) covering license, security audit, and integration intent
Design insertion points — where do new skills connect?
Apply — copy, adapt, update Integration sections
Verify — focused workflow audit with --focus-skills

Replacing Skills (5b)

Same compatibility analysis as adding, plus:

Map all references to the old skill
Update cross-references, Integration sections, and routing table
Verify with workflow audit

Reorganizing Workflows (5c)

When the execution chain has inefficiencies:

Map the current graph and identify bottlenecks or unnecessary handoffs
Propose new chain (present to user)
Update Integration sections and routing
Verify with Chain A/B Eval

Skill-to-Agent Conversion (5d)

Candidates for conversion:

Execution produces verbose temporary context (search results, file contents, logs) that subsequent steps don't need
Skills that only inspect/validate without modifying files
Skills that produce structured reports (self-contained output)
Skills that could run in parallel with other work (optional bonus)

Conversion extracts the execution protocol into agents/<role>.md with fallback logic for when subagents are unavailable. After conversion, dispatch the evaluator agent with test prompts to confirm the new agent correctly executes the former skill's responsibilities, then run bundles-forge:auditing to verify dispatch/fallback logic.

Target 6: Optional Component Management

Add, adjust, or migrate optional plugin components based on evolving project needs. This target handles the gap between initial scaffolding and the components a project needs as it matures.

When to Use

Signal	Component	Action
Skills hardcode API keys/endpoints as `${VAR}` env vars	`userConfig`	Migrate to `userConfig` for automatic user prompting
Audit finds MCP servers without `userConfig`-backed auth	`userConfig`	Add `userConfig` fields with `sensitive: true`
Skills reference external SaaS APIs with no integration	`.mcp.json` or `bin/`	Add MCP server or CLI — consult decision tree
Skills involve language-specific code intelligence	`.lsp.json`	Add LSP server config
Users request custom output formats	`output-styles/`	Add output style definitions
Plugin MCP server has npm dependencies	`${CLAUDE_PLUGIN_DATA}`	Add SessionStart dependency install hook
Plugin uses `../` paths or writes to `${CLAUDE_PLUGIN_ROOT}`	Path migration	Fix to use relative `./` paths and `${CLAUDE_PLUGIN_DATA}`

How It Works

Diagnose — identify signals from audit reports, user feedback, or direct inspection
Decide — consult skills/scaffolding/references/external-integration.md for the full decision tree (CLI vs MCP, userConfig schema, PLUGIN_DATA patterns, LSP fields, output-styles format)
Execute — invoke bundles-forge:scaffolding using its "Adding Optional Components" flow
Verify — run bundles-forge:auditing to confirm structural integrity and security compliance (especially for new MCP servers and userConfig sensitive values)

Target 7: Deprecation and Migration

Coordinate the deprecation, renaming, splitting, or merging of skills. This target ensures all references remain consistent across the project during structural changes.

Deprecation — mark a skill as deprecated without removing it:

Add deprecated: true and superseded-by: <project>:<replacement> to the skill's frontmatter
Add a deprecation notice to the description: "Use when... (deprecated — use <replacement> instead)"
Update the bootstrap routing table to note the deprecation
Update cross-references in other skills' ## Integration sections

Renaming — change a skill's name while preserving all connections:

Rename the directory: skills/old-name/ → skills/new-name/
Update frontmatter name field
Update all cross-references (<project>:old-name → <project>:new-name) across all SKILL.md, Integration sections, and documentation
Update bootstrap routing table
Run bundles-forge audit-docs to catch any missed references
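
Before running the docs audit, a quick search for leftover `<project>:old-name` references can show how much remains to update. The sketch below is a hypothetical helper that scans Markdown files only; it complements `bundles-forge audit-docs` rather than replacing it.

```python
import re
from pathlib import Path


def find_stale_references(root: Path, project: str, old_name: str) -> list[tuple[Path, int]]:
    """Return (file, line number) for each line still mentioning <project>:<old_name>."""
    needle = f"{project}:{old_name}"
    # The lookahead avoids matching longer names such as <project>:<old_name>-v2
    pattern = re.compile(re.escape(needle) + r"(?![\w-])")
    hits = []
    for md_file in sorted(root.rglob("*.md")):
        lines = md_file.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if pattern.search(line):
                hits.append((md_file, number))
    return hits
```

An empty result after the rename suggests the cross-references in Markdown files are consistent; references in other file types would need a separate pass.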

Splitting — divide a skill into multiple focused skills:

Design the new skill boundaries (reuse bundles-forge:blueprinting scenario B)
Invoke bundles-forge:scaffolding for new skill directories
Invoke bundles-forge:authoring to write each new skill's content
Update all references to the original skill
Deprecate the original (or remove if all functionality is covered)
Run bundles-forge:auditing in workflow mode to verify chain integrity

Merging — combine multiple skills into one:

Design the merged skill (reuse bundles-forge:blueprinting scenario C)
Invoke bundles-forge:authoring to write the merged content
Deprecate the source skills
Update all cross-references and routing
Run bundles-forge:auditing in workflow mode

Platform cleanup — after any structural change:

Remove deprecated skill references from platform manifests
Update .version-bump.json if manifest paths changed
Run bundles-forge:testing to verify component discovery

A/B Evaluation

A/B eval is the core quality assurance mechanism for description changes and feedback-driven improvements. It compares original vs optimized versions side-by-side.

How It Works

1. Copy the skill to a working version (<skill-name>-optimized/)
2. Apply changes to the copy only (never overwrite the original first)
3. Create 5+ realistic test prompts that should trigger this skill
4. Dispatch two evaluator agents in parallel:
   - Evaluator A: "original" label → test with original skill
   - Evaluator B: "optimized" label → test with optimized skill
5. Compare results → present to user
6. User decides: adopt optimized version or discard
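
Steps 1 and 2 amount to working on a sibling copy so the original stays untouched. A minimal sketch, assuming the skill is a plain directory on disk; the function name `make_working_copy` is hypothetical.

```python
import shutil
from pathlib import Path


def make_working_copy(skill_dir: Path) -> Path:
    """Copy <skill-name>/ to a sibling <skill-name>-optimized/ and return the new path."""
    working = skill_dir.with_name(skill_dir.name + "-optimized")
    if working.exists():
        # Refuse to overwrite an earlier working copy silently
        raise FileExistsError(f"{working} already exists; remove it or pick another name")
    shutil.copytree(skill_dir, working)
    return working
```

All edits then go to the returned path, and the original directory remains the baseline for Evaluator A.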

What to Compare

Metric	What It Tells You
Trigger rate	How many prompts correctly activated the skill?
False negatives	Did the optimized description miss cases the original caught?
False positives	Did either version trigger on prompts meant for other skills?
Step accuracy	Did the agent follow all steps, or take shortcuts?
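
The first three metrics can be computed from per-prompt results. The sketch below assumes each evaluator run is reduced to a list of records with an expected and an actual trigger outcome; the `PromptResult` shape and both function names are hypothetical and do not reflect the evaluator's real report format. Step accuracy requires reading the evaluator's notes and is not covered.

```python
from dataclasses import dataclass


@dataclass
class PromptResult:
    prompt: str
    should_trigger: bool  # was this prompt meant for the skill under test?
    triggered: bool       # did the skill actually activate?


def summarize(results: list[PromptResult]) -> dict:
    """Compute trigger rate plus missed and false-positive counts for one version."""
    expected = [r for r in results if r.should_trigger]
    hits = sum(r.triggered for r in expected)
    return {
        "trigger_rate": hits / len(expected) if expected else 0.0,
        "missed": len(expected) - hits,
        "false_positives": sum(r.triggered for r in results if not r.should_trigger),
    }


def regressions(original: list[PromptResult], optimized: list[PromptResult]) -> list[str]:
    """Prompts the original triggered on correctly but the optimized version missed."""
    caught = {r.prompt for r in original if r.should_trigger and r.triggered}
    return [r.prompt for r in optimized if r.prompt in caught and not r.triggered]
```

A non-empty `regressions` list corresponds to the "False negatives" row: cases the original caught that the optimized description lost.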

When to Skip A/B

Situation	Skip?	Rationale
Purely additive change (new trigger phrases, no modifications)	Yes	Simple verification pass is sufficient
Structural fix (missing section, broken reference)	Yes	Not a behavioral change
Description rewrite changing existing triggers	No	Must verify no regressions
Feedback-driven behavior change	No	Must compare old vs new behavior

Chain A/B Eval

For workflow transitions (not individual descriptions), use chain evaluation:

Define a realistic end-to-end scenario
Dispatch evaluator with "chain" label and ordered skill list
Review transition quality ratings at each handoff
Focus on "broken" handoffs — these indicate missing artifacts or unclear instructions

Use chain eval after: modifying Inputs/Outputs, adding skills to a chain, or when workflow audit findings indicate issues.

Subagent Fallback

When subagent dispatch is unavailable, there are two options:

Fallback	How	Trade-off
Sequential inline	Follow `agents/evaluator.md` protocol inline, randomize order	Slower, possible ordering bias
Skip A/B	Apply change directly with simple verification	Faster, no comparison data

The user chooses which fallback to use.

Feedback Iteration

A cross-cutting concern available in both project and skill optimization modes. When a user reports that a skill triggered but produced wrong results, the feedback process provides structured iteration.

Feedback Classification

User Says	Action
"This skill triggered but produced wrong results"	Feedback iteration
"The steps are in the wrong order"	Feedback iteration
"Description format doesn't follow conventions"	Optimization targets 1-2
"Token budget exceeded across the project"	Optimization target 2 (project mode)

The 3-Question Validation Framework

Before applying any feedback, each item goes through validation:

Question	Purpose	Red Flag
Goal alignment: Does this serve the skill's core goal?	Prevents scope drift	"This would turn the skill into something different"
Necessity: Is there an actual defect, or just a style preference?	Prevents unnecessary churn	"The skill works fine, I just prefer a different format"
Side effects: Could this introduce complexity or regression?	Prevents creep	"This adds 50 lines to handle a rare edge case"
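
One way to picture the framework is as three yes/no answers recorded per feedback item. The sketch below is illustrative: the `FeedbackItem` fields and the rule that an item is accepted only when all three answers are favorable are assumptions, and passing validation still leaves the final decision to the user.

```python
from dataclasses import dataclass


@dataclass
class FeedbackItem:
    summary: str
    serves_core_goal: bool      # goal alignment
    fixes_actual_defect: bool   # necessity
    free_of_side_effects: bool  # side effects


def validate(item: FeedbackItem) -> tuple[bool, list[str]]:
    """Accept an item only if it passes all three questions; otherwise list the failures."""
    failures = []
    if not item.serves_core_goal:
        failures.append("goal alignment")
    if not item.fixes_actual_defect:
        failures.append("necessity")
    if not item.free_of_side_effects:
        failures.append("side effects")
    return (not failures, failures)
```

Recording which question failed makes it easier to explain to the user why an item was left out of the improvement plan.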

Feedback Process Flow

Receive feedback
  → Identify target skill
  → If external skill: fork with forked- prefix
  → Read skill, understand core goal
  → Validate each item (3-question framework)
  → Present improvement plan → USER CONFIRMS
  → Copy to working version
  → Apply changes to copy
  → A/B eval (original vs optimized)
  → User decides: adopt or discard
  → Optimizing invokes auditing for post-change verification

Rules:

Never apply feedback without user confirmation
For external skills, always fork first (add provenance header)
At most one verification pass after changes; if issues remain, escalate to the user — do not auto-loop. Optimizing triggers re-audit when following this skill's protocol; auditing does not auto-trigger optimizing.

Common Mistakes

Mistake	What Goes Wrong	How to Avoid
Trying to optimize everything at once	Unfocused changes that are hard to verify	Pick one target, measure, improve, verify — then move to the next
Rewriting descriptions as workflow summaries	Agent shortcuts the description instead of reading the full SKILL.md	State triggering conditions ("Use when reviewing..."), not steps ("Scans structure, checks manifests...")
Ignoring the bootstrap skill's token budget	The bootstrap skill loads every session, so bloat costs context everywhere	Keep `using-*` under 200 lines — this is the highest-ROI token optimization
Applying user feedback without validation	Style preferences masquerade as defect reports, leading to unnecessary churn	Run every feedback item through the 3-question validation framework before accepting
Expanding a skill's scope during any optimization	A skill slowly drifts from its original responsibility	Optimization should improve how well a skill fulfills its goal, not shift what the goal is. Verify after every change: does this skill still do the same thing?
Running all 7 targets on a single skill	Targets 4-7 require project context and produce no useful results at skill scope	Let scope auto-detection handle it — single skills only get targets 1-3
Rewriting entire SKILL.md instead of surgical edits	Large diffs increase regression risk and make review harder	Specify section-level changes. A FIX to one heading should not trigger a full rewrite — minimize diff surface
Adding third-party skills without security audit	Imported content may contain encoded prompts, excessive tool access, or network calls	Always run `bundles-forge:auditing` on imported skills — see `references/third-party-integration.md`
Adding skills without updating Integration sections	The workflow graph becomes inconsistent, causing W10 (asymmetric integration) findings	Every new skill connection needs symmetric `Calls:` and `Called by:` declarations
Skipping A/B eval for description rewrites	A description that improves one trigger may break another	Always A/B eval when modifying existing trigger phrases — additive-only changes can skip

FAQ

Q: What's the difference between auditing and optimizing?

Auditing is pure diagnostics — it checks, scores, and reports. It never modifies files or calls optimizing. Optimizing is the improvement driver — it reads audit reports (or your goals), diagnoses what to fix, delegates content changes to authoring, and verifies results by calling auditing. Think of it as: auditing tells you what's wrong, optimizing fixes it.

Q: When should I use optimizing vs authoring directly?

Use optimizing when you need diagnosis — when you don't know exactly what to fix, or you want a structured improvement process with A/B evaluation and verification. Use authoring directly when you already know exactly what content to write or change (e.g., "rewrite this description to X").

Q: Do I need to run an audit before optimizing?

No, but it's recommended. Optimizing can run its own diagnosis, but feeding it an audit report gives it a prioritized list of findings to work through. The common pattern is: audit → review report → optimize based on findings.

Q: Which targets apply when optimizing a single skill?

Targets 1-3 (description triggering, content optimization, workflow chain integrity) plus feedback iteration. Targets 4-7 are skipped because they require project-wide context. Within Target 3, only W9 (placeholder sections) and W10 (asymmetric integration) apply at skill scope.

Q: What if the verification audit still shows issues after optimization?

The agent presents remaining issues to you for a manual decision — it does not loop automatically. You can choose to run another optimizing pass, invoke authoring directly for specific fixes, or accept the current state. This prevents infinite optimize-audit cycles.

Q: Can I optimize a project hosted on GitHub without cloning it first?

Yes. Pass a GitHub URL directly — the skill performs a shallow clone automatically. This also works for subdirectory URLs (e.g., github.com/user/repo/tree/main/skills/my-skill) and archive URLs (.zip/.tar.gz).

Quick Reference

Scripts

bundles-forge audit-skill <path>                        # Quality lint (Q1-Q15, X1-X3)
bundles-forge audit-skill <skill-dir>                   # Single skill audit (4 categories)
bundles-forge audit-workflow <path>                      # Workflow audit (W1-W9, script-automated)
bundles-forge audit-workflow --focus-skills a,b <path>   # Focused workflow audit
bundles-forge audit-security <path>                       # Security scan (7 surfaces)

W10-W11 (chain evaluation and behavioral verification) require evaluator agent dispatch and are not produced by the script.

Target Applicability by Scope

Target	Project	Skill
1. Description Triggering	Full	Full
2. Content Optimization	Full	Full
3. Workflow Chain Integrity	Full	Partial (W9/W10 only)
4. Security Remediation	Full	Skip
5. Skill & Workflow Restructuring	Full	Skip
6. Optional Component Management	Full	Skip
7. Deprecation and Migration	Full	Skip
Feedback Iteration	Full	Full
