Skip to content

e2e test for live agent detection (codex + opencode) via local Ollama #664

@srid

Description

@srid

Problem

Neither Codex nor OpenCode integrations have an e2e test that drives a real agent CLI against a live model and asserts the UI badge lifecycle (waitingthinkingtool_usewaiting). Our unit tests cover parsers in isolation; our fs-level tests cover watcher plumbing. Nothing covers "user ran the actual binary and the right thing showed up in Kolu."

This is the same gap claude-code has partially covered via transcript-replay fixtures. It has bitten us twice on codex-provider already (the cumulative-tokens_used bug in 944f19d and the cached-token double-count bug in 431edd3 would both have been caught by an e2e assertion on the badge value).

Proposed shape

All-Nix, no API keys, deterministic:

  1. Ollama service (via services.ollama in a Nix test shell) with a small model pinned by hash — e.g. qwen2.5-coder:0.5b or similar. Model must support tool calls for the tool_use branch.
  2. Scripted agent session: spawn codex --yolo (or opencode run) in a scratch worktree, point it at the local Ollama endpoint via each tool's standard OpenAI-compatible base-URL env, feed it a canned prompt that exercises: a pure thinking turn, a tool-using turn, and completion.
  3. Assertions: tail the agent's state files (SQLite / JSONL) the same way Kolu does, and assert the observed CodexInfo/OpenCodeInfo sequence matches the expected lifecycle. Runs under the same ambient watcher stack the server uses.

Why Ollama

  • No API keys in CI
  • Deterministic-enough (we assert on lifecycle transitions, not content)
  • Reusable across integrations — codex, opencode, and any future OpenAI-compatible agent
  • Already in nixpkgs; zero new dependencies

Non-goals

  • Asserting model output text
  • Measuring latency
  • Covering every agent setting — just the badge-relevant lifecycle

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions