[043][Phase 5][US3] Prompt override surface + version discipline #754

@jwesleye

Description

Scope

US3 MVP — declarative prompt configuration: every judge-backed evaluator accepts custom template, few-shot examples, system prompt, output schema, use-reasoning flag, and feedback key without code change. Prompt version is always recorded in results.

Priority: P1 (MVP)

Tasks

  • T104 [P] [US3] Tests in eval/tests/us3_custom_prompt_test.rs: .with_prompt(custom) replaces built-in, few-shot injection order, version bump is explicit opt-in, missing variable is construction-time error
  • T105 [US3] Extend every judge-backed evaluator builder with .with_prompt(), .with_few_shot(), .with_system_prompt(), .with_output_schema(), .with_use_reasoning(), .with_feedback_key() — all route through JudgeEvaluatorConfig
  • T106 [US3] Verify prompt_version recorded in every EvalMetricResult::details (re-verification of T056 in context of all evaluators)
  • T107 [US3] Versioning smoke test — correctness_v0 (built-in) and correctness_v1 (custom) both resolvable; results distinguish them per-metric
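
As a rough illustration of T105, the builder surface could be sketched as below. This is a minimal sketch under stated assumptions, not the crate's actual API: the field layout of JudgeEvaluatorConfig, the PromptTemplate and FewShotExample types, the default values, and the config() accessor are all hypothetical. Only the builder method names, CorrectnessEvaluator, and the correctness_v0 / correctness_v1 version strings come from this issue.

```rust
// Hypothetical sketch: types and defaults are assumptions for illustration only.

#[derive(Debug, Clone, PartialEq)]
pub struct FewShotExample {
    pub input: String,
    pub output: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct PromptTemplate {
    pub version: String, // e.g. "correctness_v1"
    pub body: String,
}

/// Single place where every override ends up (assumed field layout).
#[derive(Debug, Clone, PartialEq)]
pub struct JudgeEvaluatorConfig {
    pub prompt: PromptTemplate,
    pub few_shot: Vec<FewShotExample>,
    pub system_prompt: Option<String>,
    pub output_schema: Option<String>,
    pub use_reasoning: bool,
    pub feedback_key: String,
}

pub struct CorrectnessEvaluator {
    config: JudgeEvaluatorConfig,
}

impl CorrectnessEvaluator {
    /// Starts from the built-in prompt; the defaults here are assumed.
    pub fn new() -> Self {
        Self {
            config: JudgeEvaluatorConfig {
                prompt: PromptTemplate {
                    version: "correctness_v0".to_string(),
                    body: "Question: {input}\nAnswer: {output}".to_string(),
                },
                few_shot: Vec::new(),
                system_prompt: None,
                output_schema: None,
                use_reasoning: false,
                feedback_key: "correctness".to_string(),
            },
        }
    }

    // Each builder method only mutates the shared config and returns self.
    pub fn with_prompt(mut self, prompt: PromptTemplate) -> Self {
        self.config.prompt = prompt;
        self
    }

    pub fn with_few_shot(mut self, examples: Vec<FewShotExample>) -> Self {
        self.config.few_shot = examples; // caller's order is kept as given
        self
    }

    pub fn with_system_prompt(mut self, text: impl Into<String>) -> Self {
        self.config.system_prompt = Some(text.into());
        self
    }

    pub fn with_output_schema(mut self, schema: impl Into<String>) -> Self {
        self.config.output_schema = Some(schema.into());
        self
    }

    pub fn with_use_reasoning(mut self, enabled: bool) -> Self {
        self.config.use_reasoning = enabled;
        self
    }

    pub fn with_feedback_key(mut self, key: impl Into<String>) -> Self {
        self.config.feedback_key = key.into();
        self
    }

    pub fn config(&self) -> &JudgeEvaluatorConfig {
        &self.config
    }
}
```

In this sketch a consumer would chain calls such as CorrectnessEvaluator::new().with_prompt(custom).with_few_shot(examples) and never touch evaluator code; because each method writes into the same config struct, other judge-backed evaluators could reuse the pattern unchanged.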

Acceptance

  • A consumer can override the built-in CorrectnessEvaluator prompt with a custom template + few-shot examples via builder methods only; no evaluator code is modified.
  • The custom template's version string appears in the resulting EvalMetricResult::details.
  • Bumping a prompt version (_v0 → _v1) is explicit; old version remains accessible.
  • Missing or misspelled template variables surface at evaluator construction time.
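
The acceptance criteria above could be exercised with something like the following sketch. Everything in it is an assumption made for illustration: the {name} placeholder syntax, the ValidatedPrompt, PromptRegistry, and PromptError types, and the details_for helper that stands in for however EvalMetricResult::details is really populated. It shows one possible way to reject bad template variables at construction time, keep old versions resolvable, and record the version string in a details map.

```rust
// Hypothetical sketch: placeholder syntax, type names, and the details map
// shape are assumptions, not the crate's actual design.
use std::collections::{BTreeMap, BTreeSet};

#[derive(Debug, PartialEq)]
pub enum PromptError {
    UnclosedPlaceholder,
    UnknownVariable(String),
    MissingVariable(String),
    VersionExists(String),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ValidatedPrompt {
    version: String,
    body: String,
}

/// Collects the names inside `{...}` placeholders in the template body.
fn placeholders(body: &str) -> Result<BTreeSet<String>, PromptError> {
    let mut found = BTreeSet::new();
    let mut rest = body;
    while let Some(open) = rest.find('{') {
        let after = &rest[open + 1..];
        let close = after.find('}').ok_or(PromptError::UnclosedPlaceholder)?;
        found.insert(after[..close].trim().to_string());
        rest = &after[close + 1..];
    }
    Ok(found)
}

impl ValidatedPrompt {
    /// Fails at construction if the template uses a variable the evaluator
    /// does not supply (e.g. a misspelling) or omits a required one.
    pub fn new(version: &str, body: &str, required: &[&str]) -> Result<Self, PromptError> {
        let used = placeholders(body)?;
        for name in &used {
            if !required.contains(&name.as_str()) {
                return Err(PromptError::UnknownVariable(name.clone()));
            }
        }
        for name in required {
            if !used.contains(*name) {
                return Err(PromptError::MissingVariable(name.to_string()));
            }
        }
        Ok(Self { version: version.to_string(), body: body.to_string() })
    }

    pub fn version(&self) -> &str {
        &self.version
    }

    pub fn body(&self) -> &str {
        &self.body
    }
}

/// Versions are only ever added, never overwritten, so a bump must be an
/// explicit new registration and the old version stays resolvable.
#[derive(Default)]
pub struct PromptRegistry {
    prompts: BTreeMap<String, ValidatedPrompt>,
}

impl PromptRegistry {
    pub fn register(&mut self, prompt: ValidatedPrompt) -> Result<(), PromptError> {
        if self.prompts.contains_key(prompt.version()) {
            return Err(PromptError::VersionExists(prompt.version().to_string()));
        }
        self.prompts.insert(prompt.version().to_string(), prompt);
        Ok(())
    }

    pub fn resolve(&self, version: &str) -> Option<&ValidatedPrompt> {
        self.prompts.get(version)
    }
}

/// Stand-in for populating a result's details with the prompt version.
pub fn details_for(prompt: &ValidatedPrompt) -> BTreeMap<String, String> {
    let mut details = BTreeMap::new();
    details.insert("prompt_version".to_string(), prompt.version().to_string());
    details
}
```

Refusing to overwrite an existing version is one design choice that makes a silent in-place edit of correctness_v0 impossible in this sketch; the real implementation may enforce explicit bumps differently.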

References

  • Spec FR-010, FR-011, Success criterion SC-005

Depends on

#750 (evaluators must exist to override).

Labels

enhancement, eval, in-progress, spec
