What to build
Replace the stub edit from slice 7 with a real Agent loop. External behavior is unchanged ("assign Issue → PR appears"), but internally the orchestrator now drives a model with tool use to read code, edit files, run tests, and produce the PR.
This is HITL because prompt strategy, tool interface, and eval setup are product-defining and need design care. ADR-0006 fixes Claude as the default; this slice is where that comes alive.
The ModelClient interface lands here as a deep module: a single surface for model calls, with routing, retry, and BYOK lookup living behind it. No model SDK imports are allowed outside this module.
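One possible shape for that surface, as a minimal sketch rather than the agreed design. The method name `complete`, the `ModelResponse` fields, and `FakeModelClient` are all hypothetical names introduced for illustration; the only thing taken from this slice is that callers see one interface and never a model SDK.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ModelResponse:
    """One model turn (hypothetical shape): text, requested tool calls, token usage."""
    text: str
    tool_calls: list[dict] = field(default_factory=list)
    input_tokens: int = 0
    output_tokens: int = 0


class ModelClient(Protocol):
    """The single surface the orchestrator depends on.

    Routing, retry, and BYOK lookup would sit inside implementations of
    `complete`; nothing outside this module imports a model SDK.
    """

    def complete(self, system: str, messages: list[dict], tools: list[dict]) -> ModelResponse:
        ...


class FakeModelClient:
    """Scripted stand-in for tests: replays canned responses in order."""

    def __init__(self, scripted: list[ModelResponse]) -> None:
        self._scripted = list(scripted)
        self.calls: list[list[dict]] = []  # messages seen on each call

    def complete(self, system: str, messages: list[dict], tools: list[dict]) -> ModelResponse:
        self.calls.append(list(messages))
        return self._scripted.pop(0)
```

Because the interface is a structural `Protocol`, a scripted fake like this satisfies it without inheriting from anything, which keeps Agent-loop tests free of network calls.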
- Agent tools at MVP (capability boundary code-only per ADR-0002): read_file, list_files, write_file, run_shell (constrained to git, package manager, test runner), final_answer (signals readiness to open the PR).
- Prompt assembly: system prompt explains the Agent is a contributor on a Forge Repository; user prompt is the Issue title + body + repo tree summary.
- Token accounting per Run, persisted on the Run row.
- A small offline eval harness: a fixed set of toy Issues with expected outcomes (test passes, file edited, etc.) — used in CI to catch obvious regressions in prompt or tool changes.
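The pieces in the list above can be sketched together as a small loop, assuming a simplified model interface. Everything named here (`Turn`, `RunResult`, `make_tools`, `run_agent`, the `max_steps` budget, the prompt layout) is hypothetical; `run_shell` is left out, and `next_turn` stands in for whatever the model surface returns.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


@dataclass
class Turn:
    """One model reply (hypothetical shape): a tool name, its arguments, token usage."""
    tool: str
    args: dict
    input_tokens: int = 0
    output_tokens: int = 0


@dataclass
class RunResult:
    finished: bool
    summary: str
    input_tokens: int = 0   # running totals, to be persisted on the Run row
    output_tokens: int = 0
    transcript: list = field(default_factory=list)


def build_user_prompt(title: str, body: str, tree_summary: str) -> str:
    """User prompt = Issue title + body + repo tree summary."""
    return f"Issue: {title}\n\n{body}\n\nRepository tree:\n{tree_summary}"


def make_tools(root: Path) -> dict[str, Callable[..., str]]:
    """File tools confined to `root`; run_shell is omitted from this sketch."""
    root = root.resolve()

    def safe(rel: str) -> Path:
        path = (root / rel).resolve()
        if path != root and root not in path.parents:
            raise ValueError(f"path escapes repository: {rel}")
        return path

    def read_file(path: str) -> str:
        return safe(path).read_text()

    def write_file(path: str, content: str) -> str:
        target = safe(path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        return f"wrote {len(content)} characters to {path}"

    def list_files(path: str = ".") -> str:
        files = (p for p in safe(path).rglob("*") if p.is_file())
        return "\n".join(sorted(str(p.relative_to(root)) for p in files))

    return {"read_file": read_file, "write_file": write_file, "list_files": list_files}


def run_agent(next_turn: Callable[[list], Turn], tools: dict, user_prompt: str,
              max_steps: int = 20) -> RunResult:
    """Drive the model until it calls final_answer or the step budget runs out."""
    result = RunResult(finished=False, summary="")
    result.transcript.append(("user", user_prompt))
    for _ in range(max_steps):
        turn = next_turn(result.transcript)
        result.input_tokens += turn.input_tokens
        result.output_tokens += turn.output_tokens
        if turn.tool == "final_answer":
            result.finished = True
            result.summary = turn.args.get("summary", "")
            return result
        handler = tools.get(turn.tool)
        if handler is None:
            observation = f"error: unknown tool {turn.tool}"
        else:
            try:
                observation = handler(**turn.args)
            except Exception as exc:  # report tool failures back to the model
                observation = f"error: {exc}"
        result.transcript.append((turn.tool, observation))
    return result  # budget exhausted without final_answer
```

Tool errors are returned to the model as observations instead of aborting the Run, which is only one of the recovery policies this slice still has to choose between. The same scripted-turn approach could back the offline eval harness: replay a toy Issue, then assert on the files and the final result.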
Decisions to nail in this slice
- Tool schema and boundaries (what run_shell allows).
- Failure-recovery policy (model retries vs. Run-level retries).
- Eval set composition and how a regression blocks merges.
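To make the first decision concrete, here is one candidate boundary for run_shell, offered as a starting point for discussion and not a settled policy. The slice names only categories (git, package manager, test runner); the specific executables in `ALLOWED_EXECUTABLES`, the `parse_allowed` helper, and the timeout default are assumptions.

```python
import shlex
import subprocess

# Hypothetical allowlist: concrete executables are assumptions, chosen to
# stand in for "git, package manager, test runner".
ALLOWED_EXECUTABLES = {"git", "pip", "pytest"}


def parse_allowed(command: str) -> list[str]:
    """Split a command into argv and reject anything outside the allowlist."""
    argv = shlex.split(command)
    if not argv:
        raise ValueError("empty command")
    if argv[0] not in ALLOWED_EXECUTABLES:
        raise PermissionError(f"executable not allowed: {argv[0]}")
    return argv


def run_shell(command: str, cwd: str, timeout_s: float = 120.0) -> str:
    """Run an allowlisted command without a shell.

    With no shell involved, pipes, redirects, and `;` chaining reach the
    program as literal arguments instead of being interpreted.
    """
    argv = parse_allowed(command)
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_s)
    return f"exit={proc.returncode}\n{proc.stdout}{proc.stderr}"
```

An executable-level allowlist is coarse: package managers and git can themselves run arbitrary code (install scripts, hooks), so the decision likely also needs to cover allowed subcommands or sandboxing.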
Acceptance criteria
- [...] lib/foo.py") produces a PR that actually fixes the test on at least the eval set.
- [...] ModelClient; lint/CI rejects direct SDK imports outside that module.
- [...] ModelClient is used by tests covering Agent loop control flow.
Blocked by