
feat: Eval & Grader Registry — design doc #13

@spboyer

Migrated from spboyer/waza#385

Summary

Design a shared eval and grader registry for waza, inspired by the OpenAI Evals community registry of 800+ evals but adapted for agent evaluation.

Context

According to docs/research/waza-vs-openai-evals.md (Row 10), the lack of a shared registry is waza's #1 competitive disadvantage. This epic covers the full design.

Sub-issues

  • #386 — Map OpenAI Evals format to waza graders
  • #387 — Design Go-module-style grader/eval references
  • #388 — Evaluate registry backend options (Git/OCI/Releases/federated)
  • #389 — Design composable eval construction
  • #390 — Grader plugin extensibility design (WASM/external)
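To make the Go-module-style references in #387 more concrete, here is a minimal sketch of how such a reference might be parsed. Everything in it is an assumption rather than a decided design: the `Ref` type, the `ParseRef` function, the `host/owner/repo[/subpath][@version]` syntax, defaulting a missing version to `latest`, and the example repository `github.com/acme/waza-graders` are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Ref is a HYPOTHETICAL parsed grader/eval reference, modeled loosely on
// Go module paths: <host>/<owner>/<repo>[/<subpath>][@<version>].
type Ref struct {
	Repo    string // e.g. "github.com/acme/waza-graders"
	Path    string // subdirectory inside the repo; empty if none
	Version string // tag or commit; "latest" when omitted
}

// ParseRef splits a reference string. ASSUMPTION: the repository is always
// the first three slash-separated segments, as on GitHub-style hosts.
func ParseRef(s string) (Ref, error) {
	src, version, found := strings.Cut(s, "@")
	if !found {
		version = "latest"
	} else if version == "" {
		return Ref{}, errors.New("empty version after @")
	}
	parts := strings.Split(src, "/")
	if len(parts) < 3 {
		return Ref{}, fmt.Errorf("reference %q needs at least host/owner/repo", s)
	}
	for _, p := range parts {
		if p == "" {
			return Ref{}, fmt.Errorf("reference %q has an empty path segment", s)
		}
	}
	return Ref{
		Repo:    strings.Join(parts[:3], "/"),
		Path:    strings.Join(parts[3:], "/"),
		Version: version,
	}, nil
}

func main() {
	r, err := ParseRef("github.com/acme/waza-graders/regex-match@v1.2.0")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", r)
}
```

Treating the first three segments as the repository mirrors GitHub-hosted Go modules; a real design would also need to handle other hosts and nested module roots, which is part of what #387 is meant to settle.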

Peter's Ideas (verbatim)

  • The registry of shared evals is interesting. Graders particularly.
  • OpenAI's are all in their repo as YAML files
  • Consuming their format could be interesting
  • Go module style: just point to a repo and that is your grader or eval
  • Being able to construct your eval from a set of known graders is interesting
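The last idea, building an eval from a set of known graders, could look roughly like the sketch below. The `Grader` type, the in-memory `registry`, the `Compose` function, the two example graders, and mean-score aggregation are all hypothetical choices made for illustration; they are not waza's actual grader API.

```go
package main

import (
	"fmt"
	"strings"
)

// Grader is a HYPOTHETICAL grader shape: it scores one agent output in [0, 1].
type Grader func(output string) float64

// registry stands in for a shared set of known graders, keyed by name.
// Both graders are illustrative only.
var registry = map[string]Grader{
	"non-empty": func(out string) float64 {
		if strings.TrimSpace(out) != "" {
			return 1
		}
		return 0
	},
	// Naive case-insensitive substring check, for illustration only.
	"mentions-go": func(out string) float64 {
		if strings.Contains(strings.ToLower(out), "go") {
			return 1
		}
		return 0
	},
}

// Eval is composed from registry graders; its score is their mean.
type Eval struct {
	Name    string
	graders []Grader
}

// Compose builds an Eval from grader names, failing on unknown names.
func Compose(name string, graderNames ...string) (*Eval, error) {
	if len(graderNames) == 0 {
		return nil, fmt.Errorf("eval %q needs at least one grader", name)
	}
	e := &Eval{Name: name}
	for _, g := range graderNames {
		grader, ok := registry[g]
		if !ok {
			return nil, fmt.Errorf("eval %q: unknown grader %q", name, g)
		}
		e.graders = append(e.graders, grader)
	}
	return e, nil
}

// Score runs every grader on the output and averages the results.
func (e *Eval) Score(output string) float64 {
	total := 0.0
	for _, g := range e.graders {
		total += g(output)
	}
	return total / float64(len(e.graders))
}

func main() {
	e, err := Compose("lang-check", "non-empty", "mentions-go")
	if err != nil {
		panic(err)
	}
	fmt.Println(e.Score("Use Go modules."))
	fmt.Println(e.Score("Use Rust."))
}
```

Failing fast on an unknown grader name keeps a typo from silently producing a weaker eval; weights or pass/fail thresholds would be possible extensions to explore under #389.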

Deliverable

Design document at docs/research/waza-eval-registry-design.md — design only, no implementation.

Non-goals (for now)

  • Implementation — this is design research only
  • A single JSON file as the registry: it needs to be more robust than that
  • Not building the actual CLI commands yet

Metadata

  • Labels: enhancement (New feature or request)
  • Assignees: none
  • Projects, milestone, relationships, branches, and pull requests: none yet