Idea
A lightweight, developer-only skill that reads harden's structured convergence summary and logs its key fields: date, deliverable type, passes to convergence, severity counts, and hits/misses. The log makes it possible to track harden's performance over time and to benchmark whether new features improve or degrade audit quality.
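A minimal sketch of what one log entry could look like, assuming an append-only JSON Lines file. Every name here (`EvalRecord`, its fields, `append_record`, `load_records`) is hypothetical: harden's summary format does not exist yet, and the storage format is still an open question.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class EvalRecord:
    # Hypothetical field names mirroring the fields listed in the idea;
    # the real names depend on harden's eventual summary format.
    date: str                        # ISO date of the run
    deliverable_type: str
    passes_to_convergence: int
    severity_counts: dict[str, int]  # severity label -> number of findings
    hits: int
    misses: int


def append_record(log_path: Path, record: EvalRecord) -> None:
    """Append one record to the log as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(record), sort_keys=True) + "\n")


def load_records(log_path: Path) -> list[EvalRecord]:
    """Read every non-blank line of the log back into EvalRecord objects."""
    with log_path.open(encoding="utf-8") as f:
        return [EvalRecord(**json.loads(line)) for line in f if line.strip()]
```

An append-only file keeps each run's entry independent of earlier ones, which suits a log that starts from a handful of manual evals.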
Scope
- Dev tool only — not part of harden's user-facing package
- Users install harden and use it. Developers use this to measure quality.
- Kept explicitly separate from harden to prevent scope creep and platform incompatibility
- Original idea notes: "only worth building after 10+ manual eval logs prove the data is useful"
Dependencies
- Blocked by: harden#6 (structured convergence summary) — needs parseable output format to exist first
Key Design Questions
- Storage format for eval logs (JSON lines? SQLite? markdown table?)
- What metrics to track beyond basic counts
- How to correlate eval results with harden SKILL.md versions
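One possible answer to the version-correlation question, offered only as a sketch: store a short content hash of SKILL.md with each eval log entry, so results can be grouped by the exact file that produced them. The function name `skill_version_id` and the 12-character prefix are assumptions; a git commit hash would be an alternative if the file is always under version control.

```python
import hashlib
from pathlib import Path


def skill_version_id(skill_path: Path, length: int = 12) -> str:
    """Return a short SHA-256 prefix of the file's bytes.

    Hypothetical helper: identical file contents always give the same id,
    and any edit to the file gives a different one.
    """
    digest = hashlib.sha256(skill_path.read_bytes()).hexdigest()
    return digest[:length]
```

A content hash also covers uncommitted local edits, which a commit hash alone would not capture.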