Proposal: surgical regeneration as a third retry strategy

# Proposal: surgical regeneration in retry chain

## Problem

When a contract validation fails on **part** of a structured output (e.g. 3/15 array items violate a rule), the current `retry_policy { escalate(...) }` regenerates the **entire** output. Two consequences:

1. **Cost/latency** — full prompt+completion to fix a few bad items
2. **Structural bias persists** — same prompt + same model converges to the same bias on retry. Escalating to a stronger model partially helps but is expensive and slow.

**Real-world data point** (pdf_to_quiz, ADR-016, 2026-04-28):
- ~11 Sentry events / 48h: `Quiz delivered after contract retry exhaustion`
- Two consecutive `nano@low` attempts converge to ~5.5/15 mean imbalance — bias is structural, not sampling
- `mini@low` fallback reduces to ~2.7/15 — most quizzes still ship with ≥1 imbalanced question
- Validated experiment: regenerating only failing items with `nano@low` and a different prompt → 3/3 fixed, latency 14.8s vs ~60s for `mini@low`, cost ~$0.0003

## Shape of the idea

A new attempt mode inside the existing `retry_policy { escalate(...) }`, operating on a slice of the previous output instead of regenerating the whole thing:

```ruby
retry_policy do
  escalate(
    { model: "gpt-5-nano", reasoning_effort: "low" },
    { model: "gpt-5-mini", reasoning_effort: "low" },
    {
      model: "gpt-5-nano", reasoning_effort: "low",
      mode: :surgical,
      when: ->(result) { result.failures.size <= 4 },
      target: ->(result) { result.failures.map(&:path) },
      preserve: %i[correct_answer_index correct_answer_text],
      prompt: ->(slice, invariants) { ... }
    }
  )
end
```

**This is illustrative, not a final API.** The keyword names, and whether this is `mode: :surgical` or a separate top-level method, are all open. The DSL is the easy part.
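To make the `when:` / `target:` pair concrete, here is a minimal sketch of how the chain might decide what to do with an entry. Everything in it is an assumption for illustration: `PathFailure`, `AttemptResult`, `plan_attempt`, and the choice to skip a surgical entry whose `when:` returns false are hypothetical, not existing gem behavior.

```ruby
# Hypothetical value objects; the gem does not define these today.
PathFailure   = Struct.new(:path, :message, keyword_init: true)
AttemptResult = Struct.new(:failures, keyword_init: true)

# Decide how an escalate entry should run, given the previous attempt's result.
# Assumption: a surgical entry whose `when:` guard is false is skipped entirely.
def plan_attempt(entry, previous_result)
  return { kind: :full } unless entry[:mode] == :surgical
  return { kind: :skip } unless entry[:when].call(previous_result)

  { kind: :surgical, targets: entry[:target].call(previous_result) }
end

surgical_entry = {
  model: "gpt-5-nano", reasoning_effort: "low",
  mode: :surgical,
  when: ->(result) { result.failures.size <= 4 },
  target: ->(result) { result.failures.map(&:path) }
}

previous = AttemptResult.new(
  failures: [PathFailure.new(path: [:questions, 3, :options], message: "imbalanced")]
)

plan = plan_attempt(surgical_entry, previous)
```

With one failure on the previous attempt, `plan` carries `kind: :surgical` and the single failing path as its target. An entry without `mode: :surgical` falls through to a full regeneration, which matches how `escalate` entries behave today.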

## Hard requirements (must be solved before DSL has meaning)

### 1. Validators must report failure paths

Today validators return a boolean or a message. Surgical mode needs `result.failures` with a structured `path` (e.g. `[:questions, 3, :options]`) so that a `target` lambda has something to point at. This is a change to the validator API, not a new keyword. A backward-compatible default could be a "whole output" path, but that makes surgical attempts effectively no-ops for legacy validators.
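A minimal sketch of what path-reporting validators could look like, including the "whole output" fallback for legacy validators. The names `Failure`, `ValidationResult`, `validate_option_count`, and `wrap_legacy` are hypothetical, and the empty path `[]` standing for "whole output" is an assumption, not a decided convention.

```ruby
# Hypothetical structures; none of these exist in the gem today.
Failure = Struct.new(:path, :message, keyword_init: true)

ValidationResult = Struct.new(:failures, keyword_init: true) do
  def ok?
    failures.empty?
  end
end

# Example path-aware validator: every question must have `expected` options.
def validate_option_count(output, expected: 4)
  failures = output[:questions].each_with_index.filter_map do |question, index|
    next if question[:options].size == expected

    Failure.new(
      path: [:questions, index, :options],
      message: "expected #{expected} options, got #{question[:options].size}"
    )
  end
  ValidationResult.new(failures: failures)
end

# Adapter for a legacy boolean/message validator.
# Assumption: an empty path means "the whole output failed".
def wrap_legacy(passed, message)
  failures = passed ? [] : [Failure.new(path: [], message: message)]
  ValidationResult.new(failures: failures)
end
```

A `target` lambda can then read `result.failures.map(&:path)` directly. For a wrapped legacy validator the only available target is the whole output, which is why surgical attempts degrade to a full regeneration in that case.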

### 2. Splice & merge

Surgical output is a slice. The gem must merge it back into the previous attempt's output by path. The simplest case is arrays of objects keyed by index (the quiz case). The general case (arbitrary JSON paths, nested arrays, dict keys) is an order of magnitude more work and probably out of scope for v1.
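For the simple case only, the merge could be sketched as below. `splice` is a hypothetical helper, and it assumes each target has the shape `[array_key, index]` and that the surgical attempt returns one replacement item per targeted index.

```ruby
# Hypothetical helper: merge regenerated items back into the previous output.
# `replacements` maps [array_key, index] => replacement item.
def splice(previous, replacements)
  # Deep copy so the previous output stays untouched and can be restored later.
  merged = Marshal.load(Marshal.dump(previous))

  replacements.each do |(array_key, index), item|
    array = merged.fetch(array_key)
    in_range = index.is_a?(Integer) && index >= 0 && index < array.size
    raise IndexError, "no item at #{array_key}[#{index}]" unless in_range

    array[index] = item
  end

  merged
end

previous = { questions: [{ text: "q0" }, { text: "q1" }, { text: "q2" }] }
merged   = splice(previous, { [:questions, 1] => { text: "q1 regenerated" } })
```

Only index 1 changes in `merged`, and `previous` is left intact. Rejecting out-of-range indices keeps a malformed slice from silently growing the array.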

### 3. Preservation invariants as post-merge guard

`preserve:` is effectively an auto-validator that compares pre-merge and post-merge values at the named paths. On violation, roll back to the previous attempt's output (soft-deliver, no regression vs. current behavior). This must happen **after** the merge but **before** the surgical attempt is counted as a success.
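A minimal sketch of that guard, under the assumption that `preserve:` names fields on each targeted item. `preserved?` and `guard_merge` are hypothetical names, and the returned hash shape is illustrative only.

```ruby
# Hypothetical check: do the preserved fields match on every targeted item?
def preserved?(before, after, item_paths, fields)
  item_paths.all? do |path|
    old_item = before.dig(*path)
    new_item = after.dig(*path)
    next false unless old_item.is_a?(Hash) && new_item.is_a?(Hash)

    fields.all? { |field| old_item[field] == new_item[field] }
  end
end

# Runs after the merge, before the attempt is counted as a success.
# On violation, the previous output is returned unchanged (rollback).
def guard_merge(before, after, item_paths, fields)
  if preserved?(before, after, item_paths, fields)
    { output: after, surgical_success: true }
  else
    { output: before, surgical_success: false }
  end
end

before = { questions: [{ correct_answer_index: 2, options: %w[a b c] }] }
after  = { questions: [{ correct_answer_index: 0, options: %w[x y z] }] }

outcome = guard_merge(before, after, [[:questions, 0]], %i[correct_answer_index])
```

Here the regenerated item changed `correct_answer_index`, so `outcome[:output]` is the original `before` hash and the attempt is not counted as a success.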

## Other open questions

- **Trace shape.** A surgical attempt is conceptually one entry in `result.trace[:attempts]`, but both its input and its output are slices. Does the trace record the slice, the post-merge full output, or both?
- **`around_call` semantics.** The hook fires once per `run()` with the post-retry `Result` (existing invariant). Surgical attempts should not change this: they are still part of the retry chain, not separate calls.
- **Eval/optimizer interaction.** `compare_models` and `optimize` currently treat each `escalate` entry as a candidate. A `mode: :surgical` entry has different cost characteristics (smaller prompt, smaller output) and shouldn't be benchmarked the same way as a full attempt.

## Why file this now

Not asking for implementation. Logging the pattern because:
- It came from real production data (not speculation)
- The cost/latency win was experimentally validated on prod logs
- "Structural bias resistant to same-model retry" is a class of problem the retry-chain abstraction currently doesn't have a clean answer for
- The hard requirements (especially failure paths in validators) touch core API and are worth thinking about before they accumulate technical debt

Reference: ADR-016 in pdf_to_quiz (`doc/adr/016-surgical-regen-imbalanced-options.md`).

