
Phase 1 · SAO context-window guard — premature-completion and exhaustion detection #46

@Luis85

Description


Meta

type: DesignDecision
stage: draft
maturity: L1
created: 2026-05-10
inputs:
  - "OpenAI harness engineering (2026-02) — agents wrap up prematurely as context fills"
  - "Luis85/specorator specs/specorator-agent-orchestrator/design.md — AgentRun token consumption"
  - "#45 — feedback sensor hierarchy"
related: ["#43", "#45", "#51"]

Purpose. Detect and recover from the known failure mode where agents wrap up tasks prematurely as their context window approaches its limit — before the stage artifact is genuinely complete.


The failure mode

OpenAI's harness engineering research exposed a systematic failure: agents wrap up tasks prematurely as the context window approaches its limit — not because the work is done, but because they sense the constraint. Exit code 0 does not mean success.

The SAO already requires artifact presence (L2 sensor). But artifact presence alone can be fooled: an agent that senses context pressure may generate a placeholder artifact that satisfies the structural check while leaving the real content a stub.

Manifestation patterns

| Pattern | Description |
| --- | --- |
| Graceful stub | Agent creates the artifact file but populates it with a minimal placeholder ("I've started the requirements...") |
| Context-cut summary | Agent produces a document that looks valid but is actually truncated reasoning with missing sections |
| False completion signal | Agent exits 0 with the artifact present, but the content does not satisfy the stage criteria |
| Hedged handoff | Agent appends "continuing in next session...", a strong signal of context exhaustion |

Detection approach (L4 sensor — see #45)

Structural checks (fast, deterministic)

```ts
interface ContextWindowGuardConfig {
  minLines: number;                     // from template frontmatter (→ #44)
  requiredSections: string[];           // from template frontmatter
  forbiddenPhrases?: string[];          // e.g. "continuing in next session", "to be continued"
  tokenUsageWarningThreshold?: number;  // fraction of model limit, e.g. 0.85
}
```
  1. Minimum line count: artifact must meet minLines from the template's successCriteria
  2. Required sections: all requiredSections must be present (heading-level check)
  3. Forbidden phrases: detect hedged-handoff language that signals premature wrap-up
  4. Token usage monitoring: if total tokens consumed exceeds tokenUsageWarningThreshold × modelLimit, flag regardless of structural checks
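A minimal sketch of checks 1–3 (check 4 needs token data and is sketched under "Token usage surface"). Function and result-shape names here are illustrative, not part of the spec; the heading check assumes markdown-style `#` headings:

```typescript
interface ContextWindowGuardConfig {
  minLines: number;
  requiredSections: string[];
  forbiddenPhrases?: string[];
  tokenUsageWarningThreshold?: number;
}

interface GuardResult {
  pass: boolean;
  violations: string[];
}

// Hypothetical implementation of the three structural checks.
function runStructuralChecks(
  artifact: string,
  config: ContextWindowGuardConfig
): GuardResult {
  const violations: string[] = [];
  const lines = artifact.split("\n");

  // 1. Minimum line count
  if (lines.length < config.minLines) {
    violations.push(`line count ${lines.length} < minLines ${config.minLines}`);
  }

  // 2. Required sections: heading-level check against markdown headings
  const headings = lines
    .filter((l) => /^#{1,6}\s/.test(l))
    .map((l) => l.replace(/^#{1,6}\s+/, "").trim().toLowerCase());
  for (const section of config.requiredSections) {
    if (!headings.includes(section.toLowerCase())) {
      violations.push(`missing required section: ${section}`);
    }
  }

  // 3. Forbidden phrases: hedged-handoff language, case-insensitive
  const lower = artifact.toLowerCase();
  for (const phrase of config.forbiddenPhrases ?? []) {
    if (lower.includes(phrase.toLowerCase())) {
      violations.push(`forbidden phrase present: "${phrase}"`);
    }
  }

  return { pass: violations.length === 0, violations };
}
```

Because all three checks are pure string operations, the guard stays fast and deterministic, and can run on every artifact write without an LLM call.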

Failure behaviour

| Trigger | Action |
| --- | --- |
| Structural check fails, retries remain | Retry-queued with `CONTEXT_GUARD_FAIL` reason |
| Structural check fails, retries exhausted | Released with `CONTEXT_EXHAUSTION` reason code |
| Token threshold exceeded (warning) | Log warning; surface in StatusSurface; proceed to other sensors |
| Token threshold exceeded + structural fail | Skip further retries; released with `CONTEXT_EXHAUSTION` |
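The failure behaviour can be expressed as a small decision function. The reason codes follow the table; the outcome shape and function name are assumptions for illustration:

```typescript
interface GuardOutcome {
  action: "retry-queue" | "release" | "proceed";
  reason?: "CONTEXT_GUARD_FAIL" | "CONTEXT_EXHAUSTION";
  warning?: "TOKEN_THRESHOLD";
}

// Hypothetical mapping of the failure-behaviour table above.
function decideGuardOutcome(
  structuralFail: boolean,
  retriesRemain: boolean,
  tokenThresholdExceeded: boolean
): GuardOutcome {
  if (structuralFail && tokenThresholdExceeded) {
    // Context is exhausted; retrying in the same window will not help.
    return { action: "release", reason: "CONTEXT_EXHAUSTION" };
  }
  if (structuralFail) {
    return retriesRemain
      ? { action: "retry-queue", reason: "CONTEXT_GUARD_FAIL" }
      : { action: "release", reason: "CONTEXT_EXHAUSTION" };
  }
  if (tokenThresholdExceeded) {
    // Warning only: log it, surface in StatusSurface, continue to other sensors.
    return { action: "proceed", warning: "TOKEN_THRESHOLD" };
  }
  return { action: "proceed" };
}
```

Note the ordering: the combined case (structural fail plus token threshold) must be tested first, since it overrides the remaining-retries path.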

Token usage surface

The SAO design doc already notes that AgentRun captures token consumption. This issue requires:

  • AgentRun.tokenUsage: { input: number; output: number; total: number } — populated from --output-format stream-json
  • AgentRun.contextExhaustionRisk: boolean — set when total > warningThreshold × modelLimit
  • Fleet dashboard (specorator#168) surfaces token risk indicator per active run
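A sketch of how `tokenUsage` and `contextExhaustionRisk` might be derived. The usage-event field names shown are an assumption; the actual shape depends on what `--output-format stream-json` emits:

```typescript
interface TokenUsage {
  input: number;
  output: number;
  total: number;
}

// Hypothetical shape of a per-message usage event from the stream-json output.
interface UsageEvent {
  input_tokens?: number;
  output_tokens?: number;
}

function summarizeTokenUsage(
  events: UsageEvent[],
  modelLimit: number,
  warningThreshold = 0.85
): { tokenUsage: TokenUsage; contextExhaustionRisk: boolean } {
  // Accumulate token counts across all usage events of the run.
  const input = events.reduce((sum, e) => sum + (e.input_tokens ?? 0), 0);
  const output = events.reduce((sum, e) => sum + (e.output_tokens ?? 0), 0);
  const total = input + output;
  return {
    tokenUsage: { input, output, total },
    // Risk flag: consumption crossed the configured fraction of the model limit.
    contextExhaustionRisk: total > warningThreshold * modelLimit,
  };
}
```

This keeps the risk flag a pure function of observed usage, so the fleet dashboard can recompute it from stored `AgentRun` data without re-reading the stream.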

Open questions

  1. Should forbiddenPhrases be global defaults or stage-configurable?
  2. Does detecting context exhaustion warrant a distinct agent_run.context_exhausted event (→ Phase 1 · Failure-event taxonomy — which *.failed events ship in V1 #21)?
  3. How does the retry prompt for a CONTEXT_GUARD_FAIL run differ from a plain failure? Should the retry template include a context summary of what the previous attempt produced?

Acceptance

Labels

roadmap:architecture (Phase 1: ratified architecture proposal, data model, and design decisions before code)
