Problem
Neither the Codex nor the OpenCode integration has an e2e test that drives a real agent CLI against a live model and asserts the UI badge lifecycle (waiting → thinking → tool_use → waiting). Our unit tests cover parsers in isolation; our fs-level tests cover watcher plumbing. Nothing covers "the user ran the actual binary and the right thing showed up in Kolu."
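The lifecycle assertion itself can be small. A sketch (names are illustrative, not Kolu's actual API): collapse the stream of observed badge states into its transition sequence, then compare against the expected lifecycle, so that polling noise (repeated samples of the same state) doesn't matter.

```python
def lifecycle(states):
    """Collapse consecutive duplicate badge states into the transition sequence."""
    out = []
    for s in states:
        if not out or out[-1] != s:
            out.append(s)
    return out

# Raw samples from polling the badge; repeats between transitions are expected.
observed = ["waiting", "waiting", "thinking", "thinking", "tool_use", "waiting"]
assert lifecycle(observed) == ["waiting", "thinking", "tool_use", "waiting"]
```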
This is the same gap that claude-code has partially covered via transcript-replay fixtures. It has already bitten us twice on codex-provider: the cumulative-`tokens_used` bug in 944f19d and the cached-token double-count bug in 431edd3 would both have been caught by an e2e assertion on the badge value.
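To illustrate the class of bug involved (a hedged sketch inferred from the bug names, not the actual codex-provider code): if the provider reports a cumulative `tokens_used` counter, the badge must show the per-turn delta, and cached tokens must not be added on top of a total that already includes them.

```python
def turn_tokens(prev_cumulative, curr_cumulative):
    """Per-turn token count from a cumulative counter.
    Displaying curr_cumulative directly is the 944f19d-class mistake."""
    return curr_cumulative - prev_cumulative

def total_tokens(input_tokens, output_tokens, cached_tokens, cached_included_in_input=True):
    """If the API already counts cached tokens inside input_tokens,
    adding them again double-counts (the 431edd3-class mistake)."""
    total = input_tokens + output_tokens
    if not cached_included_in_input:
        total += cached_tokens
    return total

assert turn_tokens(1200, 1500) == 300        # badge shows 300, not 1500
assert total_tokens(1000, 200, 800) == 1200  # cached tokens not re-added
```

An e2e run against a real binary would exercise exactly these code paths, since the badge value is the final observable output.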
Proposed shape
All-Nix, no API keys, deterministic:
- Ollama service (via `services.ollama` in a Nix test shell) with a small model pinned by hash, e.g. `qwen2.5-coder:0.5b` or similar. The model must support tool calls for the `tool_use` branch.
- Scripted agent session: spawn `codex --yolo` (or `opencode run`) in a scratch worktree, point it at the local Ollama endpoint via each tool's standard OpenAI-compatible base-URL env var, and feed it a canned prompt that exercises a pure thinking turn, a tool-using turn, and completion.
- Assertions: tail the agent's state files (SQLite / JSONL) the same way Kolu does, and assert the observed `CodexInfo`/`OpenCodeInfo` sequence matches the expected lifecycle. Runs under the same ambient watcher stack the server uses.
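The driver could look roughly like this (the env var name, JSONL field, and file layout are assumptions for illustration; Ollama's OpenAI-compatible endpoint at `/v1` is real): spawn the agent against local Ollama, then parse its JSONL state file the way Kolu's watcher would and extract the badge-relevant states.

```python
import json
import os
import subprocess

def states_from_jsonl(lines):
    """Extract the badge-relevant state from each JSONL event line.
    The 'state' field name stands in for whatever Kolu actually parses."""
    states = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if "state" in event:
            states.append(event["state"])
    return states

def run_agent(workdir, prompt):
    """Spawn the real agent CLI pointed at the local Ollama endpoint."""
    env = dict(os.environ,
               OPENAI_BASE_URL="http://127.0.0.1:11434/v1")  # Ollama's OpenAI-compat API
    subprocess.run(["codex", "--yolo", prompt],
                   cwd=workdir, env=env, check=True, timeout=300)
```

The test then asserts that the state sequence extracted from the file matches the expected lifecycle, without looking at model output text at all.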
Why Ollama
- No API keys in CI
- Deterministic-enough (we assert on lifecycle transitions, not content)
- Reusable across integrations — codex, opencode, and any future OpenAI-compatible agent
- Already in nixpkgs; zero new dependencies
Non-goals
- Asserting model output text
- Measuring latency
- Covering every agent setting — just the badge-relevant lifecycle
Related