
Eval tracking companion skill — log structured audit output #7

@mickeylorenzini

Description


Idea

A lightweight, developer-only skill that reads the structured convergence summary harden emits and logs one record per run: date, deliverable type, number of passes to convergence, severity counts, and hits/misses. This makes it possible to track harden's performance over time and to benchmark whether new features improve or degrade audit quality.
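A minimal sketch of what one logged record could look like, assuming harden#6 ends up exposing fields like the ones listed above. The class name `EvalRecord`, its field names, the `hit_rate` helper, and the reading of "hits" as real issues harden flagged and "misses" as real issues it failed to flag are all hypothetical, since no summary format has been fixed yet.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field


@dataclass
class EvalRecord:
    """One logged harden run. Hypothetical schema: harden#6 has not fixed the format yet."""

    run_date: str  # ISO 8601 date of the run, e.g. "2024-05-01"
    deliverable_type: str  # free-form label for what was audited
    passes_to_convergence: int  # audit passes before the summary reported convergence
    severity_counts: dict[str, int] = field(default_factory=dict)  # findings per severity level
    hits: int = 0  # assumption: real issues harden flagged
    misses: int = 0  # assumption: real issues harden failed to flag

    def hit_rate(self) -> float | None:
        """Share of known issues that harden caught; None if no issues were known."""
        known = self.hits + self.misses
        return self.hits / known if known else None

    def to_json_line(self) -> str:
        """Serialize as one JSON object on a single line (JSON Lines style)."""
        return json.dumps(asdict(self), sort_keys=True)


# Illustrative values only; not real eval data.
record = EvalRecord(
    run_date="2024-05-01",
    deliverable_type="design doc",
    passes_to_convergence=3,
    severity_counts={"high": 1, "medium": 2, "low": 4},
    hits=4,
    misses=1,
)
line = record.to_json_line()
```

A derived metric such as `hit_rate` is one candidate answer to "what to track beyond basic counts", but it only means something once hits and misses have an agreed definition.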

Scope

  • Dev tool only — not part of harden's user-facing package
  • Users install and run harden; developers use this tool to measure harden's audit quality.
  • Explicitly scoped as separate from harden, to prevent scope creep and platform incompatibilities
  • Original idea notes: "only worth building after 10+ manual eval logs prove the data is useful"

Dependencies

  • Blocked by: harden#6 (structured convergence summary); a parseable output format must exist first

Key Design Questions

  • Storage format for eval logs (JSON Lines? SQLite? Markdown table?)
  • What metrics to track beyond basic counts
  • How to correlate eval results with harden SKILL.md versions
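One possible answer to the storage and versioning questions, sketched under assumptions: records are appended to a JSON Lines file, and each one is tagged with a short SHA-256 hash of SKILL.md's contents as a stand-in for a version number (assuming SKILL.md carries no explicit version field). The function names, the `skill_version` key, and the 12-character hash prefix are hypothetical choices for illustration, not part of harden.

```python
from __future__ import annotations

import hashlib
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def skill_version(skill_md: Path) -> str:
    """Short content hash of SKILL.md, used as a version tag (hypothetical scheme)."""
    return hashlib.sha256(skill_md.read_bytes()).hexdigest()[:12]


def append_eval(log_path: Path, record: dict, skill_md: Path) -> None:
    """Append one eval record as a single JSON line, tagged with the SKILL.md version."""
    entry = {**record, "skill_version": skill_version(skill_md)}
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, sort_keys=True) + "\n")


def mean_passes_by_version(log_path: Path) -> dict[str, float]:
    """Average passes-to-convergence for each SKILL.md version seen in the log."""
    passes: dict[str, list[int]] = defaultdict(list)
    with log_path.open(encoding="utf-8") as f:
        for raw in f:
            if raw.strip():  # tolerate blank lines
                entry = json.loads(raw)
                passes[entry["skill_version"]].append(entry["passes_to_convergence"])
    return {version: sum(p) / len(p) for version, p in passes.items()}


# Illustrative run: two evals against one SKILL.md revision, one against an edited revision.
with tempfile.TemporaryDirectory() as tmp:
    skill_md = Path(tmp) / "SKILL.md"
    log = Path(tmp) / "evals.jsonl"
    skill_md.write_text("revision A of the skill instructions", encoding="utf-8")
    append_eval(log, {"passes_to_convergence": 2}, skill_md)
    append_eval(log, {"passes_to_convergence": 4}, skill_md)
    skill_md.write_text("revision B of the skill instructions", encoding="utf-8")
    append_eval(log, {"passes_to_convergence": 1}, skill_md)
    summary = mean_passes_by_version(log)
```

Hashing the file means any edit to SKILL.md, even a typo fix, starts a new version bucket; a git commit SHA or an explicit version string would group runs more coarsely. JSON Lines keeps appends trivial and diff-friendly, while SQLite would make cross-version queries easier once the log grows.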

Metadata


Assignees

No one assigned

    Labels

    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
