Refinement Loop Pattern (Autoresearch-Inspired)

Version: 1.0 | Source: Karpathy's autoresearch, adapted for production Claude Code workflows

What This Is

An iterate-and-measure pattern for autonomous quality improvement. The agent modifies one artifact, scores it against binary criteria, keeps improvements, reverts regressions, and repeats until the target score is reached or the iteration or time budget runs out.

When to use: Any step that produces a scorable artifact. NOT for exploratory or creative work where "better" can't be measured.

The 7 Required Elements

Every refinement loop MUST define all 7 before starting iteration:

1. Goal

What "better" means, expressed as a number. Example: "keyword score >= 85%" or "checklist completeness = 6/6".

2. Scope

Exactly ONE file or section being modified. The evaluator, criteria, and all other files are READ-ONLY during the loop. This prevents scope creep and makes every iteration comparable.

3. Evaluation Protocol

Binary criteria (Y/N) are strongly preferred over subjective ratings. A checklist of 5-6 yes/no questions produces a score out of N that is deterministic and reproducible. Subjective 1-10 ratings drift across iterations.
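
As a rough illustration of binary scoring, the sketch below runs a checklist of yes/no checks and counts the passes. The three criteria and the names `CRITERIA` and `score_artifact` are hypothetical, chosen only for this example; a real loop would substitute its own checklist and, per the safety invariants, a separate evaluator.

```python
from typing import Callable

# A criterion is any function that answers Y/N (True/False) for a piece of text.
Criterion = Callable[[str], bool]

# Hypothetical checklist, for illustration only.
CRITERIA: dict[str, Criterion] = {
    "mentions a number": lambda text: any(ch.isdigit() for ch in text),
    "under 40 words": lambda text: len(text.split()) < 40,
    "avoids 'was done'": lambda text: "was done" not in text.lower(),
}


def score_artifact(text: str, criteria: dict[str, Criterion]) -> tuple[int, dict[str, bool]]:
    """Return (number of criteria passed, per-criterion verdicts)."""
    verdicts = {name: bool(check(text)) for name, check in criteria.items()}
    return sum(verdicts.values()), verdicts


passed, verdicts = score_artifact("Cut build time by 30% in two sprints.", CRITERIA)
print(f"{passed}/{len(CRITERIA)}")  # score out of N
```

Keeping the per-criterion verdicts alongside the total makes it easy to record which checks failed when describing an iteration in the log.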

4. Decision Rules

Keep if total score improves (even by 1 point)
Revert if score stays the same or drops
Crash (skip iteration) if the modification breaks structure or introduces errors
Plateau breaker: After 3 consecutive reverts, discard current approach and regenerate from scratch using only the criteria + failure history (not the current artifact text)
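
The four rules above can be sketched as a small decision function. `LoopState`, `decide`, and `plateau_reached` are hypothetical names. The sketch assumes a crashed iteration does not count toward the consecutive-revert total, which the rules do not specify, and it reverts on ties without applying the simplicity tiebreaker from element 7.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    best_score: int = 0
    consecutive_reverts: int = 0


def decide(state: LoopState, new_score: int, is_valid: bool = True) -> str:
    """Apply the decision rules and update the state in place."""
    if not is_valid:
        # Broken structure or errors: skip the iteration.
        # Assumption: a crash leaves the revert counter unchanged.
        return "CRASH"
    if new_score > state.best_score:
        state.best_score = new_score
        state.consecutive_reverts = 0
        return "KEEP"
    # Same or lower score: revert and count it toward the plateau.
    state.consecutive_reverts += 1
    return "REVERT"


def plateau_reached(state: LoopState, limit: int = 3) -> bool:
    """True when the plateau breaker should regenerate from scratch."""
    return state.consecutive_reverts >= limit
```

A caller would check `plateau_reached(state)` before generating the next candidate and reset `state.consecutive_reverts` to 0 after regenerating.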

5. Logging

Each iteration logs a structured record (append-only):

Iteration | Score | Delta | Verdict | Description
1         | 3/5   | +3    | KEEP    | Added quantified metric
2         | 3/5   | 0     | REVERT  | Vocabulary swap, no score change
3         | 4/5   | +1    | KEEP    | Led with outcome number

Log is written to refinement_log.md in the relevant project folder.
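
One way to write such records is sketched below, using append mode so earlier rows are never rewritten. The function name `append_log` and its parameters are hypothetical. The column layout follows the example above, without the alignment padding.

```python
from pathlib import Path

HEADER = "Iteration | Score | Delta | Verdict | Description\n"


def append_log(log_path: Path, iteration: int, score: int, total: int,
               delta: int, verdict: str, description: str) -> None:
    """Append one iteration record; write the header only for a new file."""
    is_new = not log_path.exists()
    # Signed delta for changes, plain "0" for no change, as in the example log.
    delta_text = f"{delta:+d}" if delta else "0"
    with log_path.open("a", encoding="utf-8") as log_file:
        if is_new:
            log_file.write(HEADER)
        log_file.write(
            f"{iteration} | {score}/{total} | {delta_text} | {verdict} | {description}\n"
        )
```

For example, `append_log(Path("refinement_log.md"), 1, 3, 5, 3, "KEEP", "Added quantified metric")` would add the first row of the log shown above.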

6. Autonomy Constraints

Time-box: Maximum duration before presenting results (default: 10 minutes)
Iteration cap: Maximum iterations (default: 8)
Never pause to ask during the loop -- finish iterations, then present results with the log
Re-read from disk each cycle -- the file on disk is truth, not conversation memory
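
A minimal sketch of how the two caps and the re-read rule might be enforced together. `run_bounded` and the `step` callback are hypothetical names. The defaults mirror the ones listed above (8 iterations, 10 minutes). The time check runs only between iterations, so a single slow step can still overrun the time-box.

```python
import time
from pathlib import Path
from typing import Callable


def run_bounded(artifact: Path, step: Callable[[str], None],
                max_iterations: int = 8, max_seconds: float = 600.0) -> int:
    """Run `step` until either cap is hit; return the number of completed iterations."""
    deadline = time.monotonic() + max_seconds
    completed = 0
    for _ in range(max_iterations):
        if time.monotonic() >= deadline:
            break  # time-box reached: stop and present results
        # Re-read from disk each cycle: the file is truth, not memory.
        text = artifact.read_text(encoding="utf-8")
        step(text)  # one generate/evaluate/decide cycle, with no prompts to the user
        completed += 1
    return completed
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot shorten or extend the time-box.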

7. Simplicity Tiebreaker

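If two versions score the same, keep the shorter or simpler one. This is the one exception to the revert-on-tie rule in Decision Rules: a +0 change that reduces complexity may be kept, while a +0 change that adds complexity is a regression.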
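
The tiebreaker can be sketched as a comparison on score first and size second. `pick_version` is a hypothetical helper, and character count stands in for "simpler" purely as an assumption; line count or word count would work the same way.

```python
def pick_version(current: str, current_score: int,
                 candidate: str, candidate_score: int) -> str:
    """Return the version to keep: higher score wins, shorter text breaks ties."""
    if candidate_score != current_score:
        return candidate if candidate_score > current_score else current
    # Equal scores: keep the candidate only if it is strictly shorter.
    # Assumption: character count is an acceptable proxy for simplicity.
    return candidate if len(candidate) < len(current) else current


kept = pick_version("Reduced latency by a lot, roughly 40%.", 4,
                    "Reduced latency by 40%.", 4)
print(kept)
```

Requiring the candidate to be strictly shorter means an equal-length rewrite is treated as no improvement and the current version stays.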

Safety Invariants

Non-negotiable for any refinement loop:

One file scope -- bounded blast radius
One metric -- deterministic decisions, no "feels better"
Git checkpoint -- every iteration starts from a known state
Separate generator and evaluator -- the model writing the modification must NOT score its own work
Time-boxed -- hard cap on iterations AND wall-clock time
Audit trail -- the log captures every decision for user review

What This Is NOT

NOT a new skill (it's a sub-step pattern inside existing skills)
NOT for first-draft generation (the artifact must already exist before the loop starts)
NOT for subjective quality (if you can't define binary criteria, don't use the loop)
NOT a replacement for human review (the loop improves the draft; the user still approves)

Example Application

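# Pseudo-code showing the loop structure.
# read, write, log, generate_improvement, regenerate_from_scratch, and
# ollama_evaluate are placeholders, not real library calls.
target = 5  # goal: checklist_score >= 5/6
iteration_cap = 8
artifact_path = "output/current_draft.md"

# Baseline: score the existing artifact before the first iteration
current_score = ollama_evaluate(read(artifact_path), criteria)  # returns N/6
consecutive_reverts = 0
failure_history = []

for i in range(1, iteration_cap + 1):
    if current_score >= target:
        break

    draft = read(artifact_path)  # re-read from disk each cycle

    if consecutive_reverts >= 3:
        # Plateau breaker: regenerate from criteria + failure history only
        modified = regenerate_from_scratch(criteria, failure_history)
        consecutive_reverts = 0
    else:
        # Generator: Claude modifies the artifact
        modified = generate_improvement(draft, criteria, failure_history)

    # Evaluator: Ollama (deepseek-r1:14b) scores against binary criteria
    score = ollama_evaluate(modified, criteria)  # returns N/6
    delta = score - current_score

    if delta > 0:
        write(artifact_path, modified)  # KEEP
        current_score = score
        consecutive_reverts = 0
        log(i, score, delta, "KEEP", "description")
    else:
        # REVERT: the file on disk is left untouched
        failure_history.append(modified)
        consecutive_reverts += 1
        log(i, score, delta, "REVERT", "no improvement")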
