Problem
Harden's Step 4 says "Stop. Deliver." but doesn't define what the final output looks like. The model infers inconsistently — sometimes a detailed summary, sometimes just "Done." This makes it impossible to:
- Verify harden did thorough work vs. gave up early
- Compare audit runs across different deliverables
- Feed structured output to an eval tracking tool for performance benchmarking
Proposed Solution
Add a mandatory output template to harden's Step 4. Every audit run ends with a consistent format:
Audit complete: [N] passes to convergence
Found: [N] 🔴, [N] 🟡, [N] 🟢
Fixed: [N] 🔴, [N] 🟡 — [summary of resolution]
Remaining: [N] 🟢 (below threshold)
Key changes: [1-2 sentence summary of most impactful fixes]
Why This Matters
This is the foundation for measurable harden performance. Without structured output, you can't compare runs, track regressions when adding new features, or benchmark whether a new audit dimension actually improved quality. Enables future eval tracking companion tool.
Likely Affected Files/Modules
SKILL.md — Step 4 modification, output template definition
Acceptance Criteria
Problem
Harden's Step 4 says "Stop. Deliver." but doesn't define what the final output looks like. The model infers inconsistently — sometimes a detailed summary, sometimes just "Done." This makes it impossible to:
Proposed Solution
Add a mandatory output template to harden's Step 4. Every audit run ends with a consistent format:
Why This Matters
This is the foundation for measurable harden performance. Without structured output, you can't compare runs, track regressions when adding new features, or benchmark whether a new audit dimension actually improved quality. Enables future eval tracking companion tool.
Likely Affected Files/Modules
SKILL.md— Step 4 modification, output template definitionAcceptance Criteria