
Phase 1 · SAO feedback sensors — output quality evaluation beyond exit code #45

@Luis85

Description


Meta

type: DesignDecision
stage: draft
maturity: L1
created: 2026-05-10
inputs:
  - "Luis85/specorator specs/specorator-agent-orchestrator/design.md — success criteria"
  - "OpenAI harness engineering — feedback controls as sensors"
  - "Martin Fowler — computational vs. inferential sensors"
related: ["#43", "#44", "#46", "#21"]

Purpose. Design the feedback harness layer that validates agent output quality after execution: the "sensors" that prevent low-quality work from advancing to the next workflow stage.


Context

Harness engineering distinguishes:

  • Computational sensors: deterministic, fast (milliseconds to seconds), CPU-based. Examples: exit codes, file presence, schema checks, structural validation. Highly reliable.
  • Inferential sensors: semantic analysis via LLMs / AI judges. Slower and non-deterministic, but they enable rich quality judgments. A sketch of both kinds follows below.
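
To make the distinction concrete, here is a minimal TypeScript sketch of a shared sensor contract; the interface and field names are assumptions for illustration, not part of the SAO codebase.

```typescript
// Hypothetical common contract; identifiers are illustrative, not from the SAO codebase.
type SensorKind = "computational" | "inferential";

interface SensorResult {
  pass: boolean;
  detail?: string; // e.g. a missing file path, or a judge's rationale
}

interface Sensor {
  kind: SensorKind;
  name: string;
  // Computational sensors should resolve in milliseconds to seconds;
  // inferential sensors may take seconds to minutes and can be non-deterministic.
  evaluate(input: {
    exitCode: number;
    artifactPath: string;
    artifactText: string | null;
  }): Promise<SensorResult>;
}
```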

The SAO design doc currently defines success as: exit code 0 AND stage artifact present.

This is a minimal structural check (two computational sensors). A key finding from OpenAI's harness engineering post: agents are systematically bad at evaluating their own output, especially as the context window fills. External evaluation is required for production-grade harness reliability.
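
For reference, the current criterion reduces to two trivially small computational checks. A sketch, assuming Node.js and an artifact path already known to the runner:

```typescript
import { existsSync } from "node:fs";

// L1: exit code 0 (taken from the finished agent process).
function exitCodeOk(exitCode: number): boolean {
  return exitCode === 0;
}

// L2: stage artifact present on disk (path is stage-specific; assumed known to the runner).
function stageArtifactPresent(artifactPath: string): boolean {
  return existsSync(artifactPath);
}

// Current success criterion: both computational checks must pass.
function currentSuccessCriterion(exitCode: number, artifactPath: string): boolean {
  return exitCodeOk(exitCode) && stageArtifactPresent(artifactPath);
}
```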


Sensor hierarchy

| Level | Sensor | Type | Failure action |
| --- | --- | --- | --- |
| L1 | Exit code 0 | Computational | Retry with backoff |
| L2 | Stage artifact present | Computational | Retry with backoff |
| L3 | Artifact schema valid (required sections, notation) | Computational | Retry with backoff |
| L4 | Context-window guard (min content thresholds, sentinel sections) | Computational | Retry or CONTEXT_EXHAUSTION (→ #46) |
| L5 | LLM judge quality evaluation | Inferential | Retry or human review |
| L6 | Human review gate | Manual | Hold in pending-review until approved |

L1–L4 are in scope for V1. L5–L6 are design decisions for this issue.
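
One way to keep the hierarchy declarative is a table the runner consults whenever a sensor fails. The sketch below encodes the rows above with simplified failure-action names; the identifiers are illustrative, not from the SAO codebase.

```typescript
// Sensor hierarchy as a declarative lookup; failure-action names are simplified.
type SensorType = "computational" | "inferential" | "manual";

type FailureAction =
  | "retry-with-backoff"
  | "context-exhaustion"    // L4 (may also retry; see #46)
  | "retry-or-human-review" // L5
  | "hold-pending-review";  // L6

const sensorHierarchy: Record<string, { type: SensorType; onFail: FailureAction }> = {
  L1: { type: "computational", onFail: "retry-with-backoff" },
  L2: { type: "computational", onFail: "retry-with-backoff" },
  L3: { type: "computational", onFail: "retry-with-backoff" },
  L4: { type: "computational", onFail: "context-exhaustion" },
  L5: { type: "inferential", onFail: "retry-or-human-review" },
  L6: { type: "manual", onFail: "hold-pending-review" },
};
```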


Sensor integration with state machine

AgentRunner exits
    └─ L1 check (exit code)      ──fail──→ retry-queued
        └─ L2 check (artifact)   ──fail──→ retry-queued
            └─ L3 check (schema) ──fail──→ retry-queued
                └─ L4 check (guard) ─fail─→ CONTEXT_EXHAUSTION → released
                    └─ [L5 if enabled] ─fail─→ retry-queued or pending-review
                        └─ success → merge + stage advance (or L6 review gate)
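
The chain reads as an ordered short-circuit evaluation: sensors run in hierarchy order and the first failure determines the state transition. A simplified sketch, with the L4 and L5 retry branches collapsed for brevity; all names are assumptions, not the SAO data model.

```typescript
// Minimal shapes for the sketch; names are assumptions, not the SAO data model.
interface RunOutput {
  exitCode: number;
  artifactPath: string;
}

interface SensorCheck {
  level: "L1" | "L2" | "L3" | "L4" | "L5";
  evaluate(run: RunOutput): Promise<boolean>;
}

type NextState =
  | "retry-queued"
  | "released"          // via CONTEXT_EXHAUSTION (#46)
  | "pending-review"
  | "merge-and-advance";

// Ordered short-circuit evaluation: the first failing sensor decides the transition.
async function runSensorChain(
  sensors: SensorCheck[],     // L1..L4, plus L5 when enabled for the stage
  run: RunOutput,
  humanGateEnabled: boolean,  // L6
): Promise<NextState> {
  for (const sensor of sensors) {
    if (await sensor.evaluate(run)) continue;
    if (sensor.level === "L4") return "released";        // simplified: retry path omitted
    if (sensor.level === "L5") return "pending-review";  // or retry-queued, per open question 2
    return "retry-queued";                               // L1–L3: retry with backoff
  }
  return humanGateEnabled ? "pending-review" : "merge-and-advance";
}
```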

Open questions for this issue

  1. L5 in V1? Which stages, if any, warrant LLM judge evaluation in V1 (cost and latency are real)?
  2. Quality threshold: what constitutes "good enough" for automatic advancement via L5?
  3. L6 integration: how does the human review gate surface in the StatusSurface and fleet dashboard (specorator#168)?
  4. Retry vs. release decision: at what point does repeated sensor failure trigger released instead of retry-queued? (A retry-count limit is already specified; does the failing sensor's type affect this?)
  5. Sensor configurability: should stages declare their required sensor level in the template frontmatter (→ #44)? One possible shape is sketched below.
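
For question 5, one possible and purely hypothetical shape of such a per-stage declaration, expressed here as a TypeScript schema rather than a settled frontmatter format (field names are not decided; see #44):

```typescript
// Hypothetical stage-template frontmatter schema (open question 5; see #44).
interface StageTemplateFrontmatter {
  stage: string;
  requiredSensorLevel: "L1" | "L2" | "L3" | "L4" | "L5" | "L6"; // minimum gate to advance
  judgeRubric?: string; // only meaningful when L5 is required
}

// Example: a design stage that wants an LLM judge pass before advancing.
const designStage: StageTemplateFrontmatter = {
  stage: "design",
  requiredSensorLevel: "L5",
  judgeRubric: "rubrics/design-quality.md",
};
```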

Acceptance
