Slice 8 — Real Agent loop: ModelClient + tool use replaces the fake PR

## What to build

Replace the stub edit from slice 7 with a real Agent loop. External behavior is unchanged ("assign Issue → PR appears"), but internally the orchestrator now drives a model with tool use to read code, edit files, run tests, and produce the PR.

This is HITL because prompt strategy, tool interface, and eval setup are product-defining and need design care. ADR-0006 fixes Claude as the default; this slice is where that comes alive.

- `ModelClient` interface lands here as a deep module: a single surface for model calls; routing/retry/BYOK lookup live behind it. **No model SDK imports outside this module.**
- Agent tools at MVP (capability boundary code-only per ADR-0002): `read_file`, `list_files`, `write_file`, `run_shell` (constrained to `git`, package manager, test runner), `final_answer` (signals readiness to open the PR).
- Prompt assembly: system prompt explains the Agent is a contributor on a Forge Repository; user prompt is the Issue title + body + repo tree summary.
- Token accounting per Run, persisted on the Run row.
- A small offline eval harness: a fixed set of toy Issues with expected outcomes (test passes, file edited, etc.) — used in CI to catch obvious regressions in prompt or tool changes.

### Decisions to nail in this slice

- Tool schema and boundaries (what `run_shell` allows).
- Failure-recovery policy (model retries vs. Run-level retries).
- Eval set composition and how a regression blocks merges.

## Acceptance criteria

- [ ] Assigning an Agent to a real Issue (e.g., "fix this failing test in `lib/foo.py`") produces a PR that actually fixes the test on at least the eval set.
- [ ] Token usage is recorded on the Run row and visible in logs.
- [ ] Tools cannot escape the Sandbox (no outbound internet, no host secrets).
- [ ] All model calls go through `ModelClient`; lint/CI rejects direct SDK imports outside that module.
- [ ] A recorded-response fake of `ModelClient` is used by tests covering Agent loop control flow.
- [ ] The eval harness runs in CI and fails the build on regression beyond a tolerated threshold.

## Blocked by

- #7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Slice 8 — Real Agent loop: ModelClient + tool use replaces the fake PR #8

What to build

Decisions to nail in this slice

Acceptance criteria

Blocked by

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Slice 8 — Real Agent loop: ModelClient + tool use replaces the fake PR #8

Description

What to build

Decisions to nail in this slice

Acceptance criteria

Blocked by

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions