39 changes: 39 additions & 0 deletions README.md
@@ -11,6 +11,7 @@ they already have.

- `/codex:review` for a normal read-only Codex review
- `/codex:adversarial-review` for a steerable challenge review
- `/codex:test` to delegate matching test updates for the current diff
- `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` to delegate work and manage background jobs

## Requirements
@@ -123,6 +124,38 @@ Examples:

This command is read-only. It does not fix code.

### `/codex:test`

Runs a strict test-writing workflow for the current code changes.

Use it when you want:

- Codex to write the matching tests for the current diff
- Claude to keep the implementation work while Codex writes the matching tests
- Codex to inspect the current diff, infer the repository test layout, and add the smallest sufficient test coverage
- Codex to avoid modifying production code by default

This command fails closed if it cannot collect the required repository context first. It looks for project guidance such as `CLAUDE.md`, `AGENTS.md`, or `README.md`, inspects the current diff, infers likely test targets, and then asks Codex to write the tests.

Examples:

```bash
/codex:test
/codex:test --base main
/codex:test --background
/codex:test --model gpt-5.4-mini --effort high
```

By default, Codex should:

- explain the author's apparent purpose for the change before editing
- summarize the touched production files and detected test locations
- update or create the matching tests
- avoid modifying production code by default

Implementation and maintenance notes for `/codex:test` live in
[`docs/codex-test-design.md`](./docs/codex-test-design.md).

### `/codex:rescue`

Hands a task to Codex through the `codex:codex-rescue` subagent.
@@ -229,6 +262,12 @@ When the review gate is enabled, the plugin uses a `Stop` hook to run a targeted
/codex:review
```

### Let Codex Add Tests

```bash
/codex:test
```

### Hand A Problem To Codex

```bash
91 changes: 91 additions & 0 deletions docs/codex-test-design.md
@@ -0,0 +1,91 @@
# `/codex:test` Design Notes

This note summarizes the stable design constraints behind `/codex:test`.
It is intended for maintainers who need to evolve the test-planning pipeline
without re-learning the failure modes from past review cycles.

A useful review question for this command is: "Could uncertainty here cause
`/codex:test` to write tests in the wrong place, or collect the wrong context?"
Most of the constraints below exist to keep the answer to that question "no."

## Core Principles

1. Fail closed when required context is missing.

`/codex:test` should stop rather than guess when it cannot gather enough
repository context. Missing project guidance, missing test layout, or missing
test targets should be treated as hard failures instead of soft fallbacks.
When uncertainty would otherwise push the command toward the wrong context or
the wrong target file, failing closed is the intended behavior.

2. Keep repository context inside the repository boundary.

Repo walking must not escape `repoRoot`. Symlinked directories are skipped,
and symlinked files are only eligible when their realpath still stays under
the repository root. This prevents unrelated files or host secrets from being
pulled into the prompt.

3. Bound the prompt budget globally, not only per file.

Project guidance is useful, but unbounded guidance collection makes `/codex:test`
fragile in monorepos. Guidance files are prioritized and then capped by both a
small file-count limit and a total byte budget, with shallow high-priority files
winning over deep package-local READMEs.

4. Treat self-collected diff context as a first-class mode.

When the diff is too large to inline, the prompt must still tell Codex how to
collect the missing patch context with read-only git commands. Large changes
should degrade to a lighter summary, not to silent loss of guidance.

5. Only infer tests from live source files.

Changed-path lists can include deleted files. Deletion-only changes should not
cause `/codex:test` to propose creating brand-new tests for removed code, so
planning must ignore source paths that no longer exist in the working tree.
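
Principles 2 and 5 can be illustrated with a small sketch. The helper names (`isInsideRepo`, `isEligibleFile`, `liveSourcePaths`) are hypothetical and are not taken from the plugin runtime. The sketch assumes only Node's built-in `fs` and `path` modules, and it treats any path that cannot be resolved as ineligible.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: true when `candidate` resolves to a location strictly
// under `repoRoot` after all symlinks are followed.
function isInsideRepo(repoRoot, candidate) {
  let root;
  let real;
  try {
    root = fs.realpathSync(repoRoot);
    real = fs.realpathSync(candidate);
  } catch {
    return false; // missing path or broken symlink: fail closed
  }
  const rel = path.relative(root, real);
  if (rel === "" || path.isAbsolute(rel)) return false;
  return rel !== ".." && !rel.startsWith(`..${path.sep}`);
}

// Hypothetical helper: decide whether a directory entry may feed the prompt.
// Symlinked directories are skipped; symlinked files must resolve inside the repo.
function isEligibleFile(repoRoot, entryPath) {
  let stat;
  try {
    stat = fs.lstatSync(entryPath);
  } catch {
    return false; // entry no longer exists
  }
  if (stat.isSymbolicLink()) {
    let target;
    try {
      target = fs.statSync(entryPath); // follows the link
    } catch {
      return false; // dangling link
    }
    if (!target.isFile()) return false; // covers symlinked directories
  } else if (!stat.isFile()) {
    return false;
  }
  return isInsideRepo(repoRoot, entryPath);
}

// Principle 5 (sketch): keep only changed paths that still exist as eligible
// files in the working tree, so deletions never produce new test targets.
function liveSourcePaths(repoRoot, changedPaths) {
  return changedPaths.filter((rel) =>
    isEligibleFile(repoRoot, path.join(repoRoot, rel))
  );
}
```

Resolving both the root and the candidate with `realpathSync` before comparing them avoids false negatives on hosts where the temporary or home directory is itself a symlink.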

## Test Target Selection

1. Prefer the nearest package-local test root.

In monorepos, test planning should stay inside the package or module that owns
the changed source file. When a direct match is missing, new test targets should
be created under the nearest compatible test root instead of the first `tests/`
directory discovered anywhere in the repository. If no detected test root
actually belongs to the changed source's package, `/codex:test` should fail
closed instead of selecting the closest-looking package by shared path prefix.

2. Scope direct matches by locality, not basename alone.

Two packages can legitimately contain the same test basename such as
`id.test.js`. Basename matches are only safe after they have been narrowed to
the nearest package-local test root. Otherwise `/codex:test` may edit tests in
an unrelated package.

3. Preserve source subdirectories in created test paths.

When a new test file is created, the path should preserve the source structure
after the language-specific source root. For example:

- `src/pkg/foo.py -> tests/pkg/test_foo.py`
- `packages/b/src/new.js -> packages/b/tests/new.test.js`

Flattening nested paths causes collisions across modules with the same stem and
makes the planned test target drift away from the changed code.

4. Match existing tests conservatively.

Substring-based matching is too loose. `id` should not match `userid.test.js`,
and similarly named files in sibling packages should not be pulled into the same
plan. Matching should optimize for "smallest safe target set", even if that
means falling back to creating a new test file more often.
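
The path-preservation and conservative-matching rules above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: the function names are hypothetical, the source root is assumed to be a directory literally named `src`, and the test naming conventions (`test_<stem>.py`, `<stem>.test.<ext>`) are inferred from the two examples in this note.

```javascript
const path = require("node:path");

// Assumed naming conventions, keyed by extension (hypothetical).
function testBasename(stem, ext) {
  return ext === ".py" ? `test_${stem}${ext}` : `${stem}.test${ext}`;
}

// Map a source path to a new test path, keeping the subdirectories that
// follow the `src` root and staying inside the owning package.
function plannedTestPath(sourcePath) {
  const parts = sourcePath.split("/");
  const srcIndex = parts.lastIndexOf("src");
  if (srcIndex === -1 || srcIndex === parts.length - 1) {
    return null; // no recognizable source root: fail closed
  }
  const packagePrefix = parts.slice(0, srcIndex);
  const subdirs = parts.slice(srcIndex + 1, -1);
  const ext = path.posix.extname(sourcePath);
  const stem = path.posix.basename(sourcePath, ext);
  return [...packagePrefix, "tests", ...subdirs, testBasename(stem, ext)].join("/");
}

// Reduce a test file name to the stem it covers, e.g. `id.test.js` -> `id`
// and `test_foo.py` -> `foo`.
function testStem(testPath) {
  const base = path.posix.basename(testPath, path.posix.extname(testPath));
  return base.replace(/^test_/, "").replace(/(\.test|\.spec|_test)$/, "");
}

// Match existing tests by exact stem, and only inside the given
// package-local test root. Substring matches are never accepted.
function matchingTests(sourcePath, testRoot, testPaths) {
  const ext = path.posix.extname(sourcePath);
  const stem = path.posix.basename(sourcePath, ext);
  return testPaths.filter(
    (candidate) =>
      candidate.startsWith(`${testRoot}/`) && testStem(candidate) === stem
  );
}
```

With these definitions, `plannedTestPath("src/pkg/foo.py")` yields `tests/pkg/test_foo.py`, and a change to `packages/a/src/id.js` matches `packages/a/tests/id.test.js` but neither `packages/a/tests/userid.test.js` nor `packages/b/tests/id.test.js`. An empty match list is the signal to fall back to creating a new test file.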

## Maintenance Notes

- If you loosen repo-walk or symlink behavior, add tests that prove prompt
inputs still stay under `repoRoot`.
- If you change guidance selection, keep both a file-count cap and a total-byte
cap unless there is a stronger replacement.
- If you change path inference, add monorepo fixtures that cover both direct
matches and create-path planning.
- If you change diff collection, verify both inline-diff and self-collect modes.
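
As a reference point for the guidance-selection note above, here is a minimal sketch of a selector that enforces both caps. The cap values, the name priority order, and the function names are assumptions made for illustration. This note does not state the real limits, and it does not say whether an oversized file is skipped or truncated; the sketch skips it.

```javascript
// Hypothetical caps; the real limits are not stated in this note.
const MAX_GUIDANCE_FILES = 3;
const MAX_GUIDANCE_BYTES = 8 * 1024;

// Assumed priority among well-known guidance file names.
const NAME_PRIORITY = ["CLAUDE.md", "AGENTS.md", "README.md"];

function depthOf(relPath) {
  return relPath.split("/").length - 1;
}

function priorityOf(relPath) {
  const base = relPath.split("/").pop();
  const index = NAME_PRIORITY.indexOf(base);
  return index === -1 ? NAME_PRIORITY.length : index;
}

// `candidates` is a list of { path, bytes } records with repo-relative paths.
// Shallow files win over deep ones; ties fall back to name priority.
function selectGuidance(
  candidates,
  maxFiles = MAX_GUIDANCE_FILES,
  maxBytes = MAX_GUIDANCE_BYTES
) {
  const ordered = [...candidates].sort(
    (a, b) =>
      depthOf(a.path) - depthOf(b.path) ||
      priorityOf(a.path) - priorityOf(b.path) ||
      a.path.localeCompare(b.path)
  );
  const selected = [];
  let usedBytes = 0;
  for (const file of ordered) {
    if (selected.length >= maxFiles) break; // file-count cap
    if (usedBytes + file.bytes > maxBytes) continue; // total-byte cap
    selected.push(file.path);
    usedBytes += file.bytes;
  }
  return selected;
}
```

Keeping the two caps as separate checks makes it easy to write one fixture per cap: a monorepo with many small READMEs exercises the file-count limit, and a single very large root README exercises the byte budget.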
60 changes: 60 additions & 0 deletions plugins/codex/commands/test.md
@@ -0,0 +1,60 @@
---
description: Delegate test writing for the current code changes to Codex with a strict test-only workflow
argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>]'
disable-model-invocation: true
allowed-tools: Read, Glob, Grep, Bash(node:*), Bash(git:*), AskUserQuestion
---

Run Codex test writing through the shared plugin runtime.

Raw slash-command arguments:
`$ARGUMENTS`

Core constraints:
- This command is test-only.
- Do not modify production code by default.
- Fail closed if the runtime cannot collect the required repository context.
- Your only job is to run the command and return Codex's output verbatim to the user.

Execution mode rules:
- If the raw arguments include `--wait`, do not ask. Run in the foreground.
- If the raw arguments include `--background`, do not ask. Run in a Claude background task.
- Otherwise, estimate the change size before asking:
  - For working-tree mode, start with `git status --short --untracked-files=all`.
  - For working-tree mode, also inspect both `git diff --shortstat --cached` and `git diff --shortstat`.
  - For base-branch mode, use `git diff --shortstat <base>...HEAD`.
  - Treat untracked files or directories as real work even when `git diff --shortstat` is empty.
  - Recommend waiting only when the change is clearly tiny, roughly 1-2 files total and no sign of broader test work.
  - In every other case, including unclear size, recommend background.
  - When in doubt, run the command instead of claiming there is no test work to do.
- Then use `AskUserQuestion` exactly once with two options, putting the recommended option first and suffixing its label with `(Recommended)`:
  - `Wait for results`
  - `Run in background`

Argument handling:
- Preserve the user's arguments exactly.
- Do not strip `--wait` or `--background` yourself.
- The companion script parses `--wait` and `--background`, but Claude Code's `Bash(..., run_in_background: true)` is what actually detaches the run.
- This command accepts `--base <ref>` and `--scope auto|working-tree|branch`.
- This command accepts `--model` and `--effort` and forwards them to the companion runtime.
- Do not add extra instructions or rewrite the user's intent.

Foreground flow:
- Run:
```bash
node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" test "$ARGUMENTS"
```
- Return the command stdout verbatim, exactly as-is.
- Do not paraphrase, summarize, or add commentary before or after it.

Background flow:
- Launch the command with `Bash` in the background:
```typescript
Bash({
command: `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" test "$ARGUMENTS"`,
description: "Codex test writing",
run_in_background: true
})
```
- Do not call `BashOutput` or wait for completion in this turn.
- After launching the command, tell the user: "Codex test writing started in the background. Check `/codex:status` for progress."
64 changes: 64 additions & 0 deletions plugins/codex/prompts/write-tests.md
@@ -0,0 +1,64 @@
<role>
You are Codex writing tests for an existing code change.
Your job is to understand the author's intent, map the impacted code to the repository's testing layout, and then make the smallest sufficient test-only edits.
</role>

<task>
Write or update the tests for {{TARGET_LABEL}}.
</task>

<grounding_rules>
Use the provided project guidance and diff context as the starting point for your understanding of the change.
Before you edit anything, infer and state the author's purpose for this change in one short section titled exactly `Author purpose:`.
</grounding_rules>

<constraints>
- Default to test-only changes.
- Do not modify production code by default.
- Do not delete existing tests unless the tested behavior is explicitly removed by this diff or the test is being replaced by an updated equivalent covering the same intent.
- If you believe a production code change is required, stop and explain why instead of editing it.
- Follow the repository's existing test conventions, naming patterns, and directory layout.
- Prefer the smallest sufficient regression coverage for the changed behavior.
- Reuse existing fixtures, helpers, and snapshots when they already fit.
</constraints>

<required_pre_edit_summary>
Before editing any files, print a concise plan that includes these headings exactly:
- `Author purpose:`
- `Touched production files:`
- `Detected test locations:`
- `Planned test file changes:`

Under `Planned test file changes:`, list which files you expect to create, update, or remove.
Mention the relevant test functions or scenarios you expect to add or update when you can infer them from the context.
</required_pre_edit_summary>

<verification>
Prefer the repository-specific test commands listed below when they fit the changed tests.
After editing, run the most relevant repository test command you can identify from the available context.
If the repository does not expose a clear test command, run the narrowest command that verifies the changed tests.
</verification>

<suggested_test_commands>
{{SUGGESTED_TEST_COMMANDS}}
</suggested_test_commands>

<project_guidance>
{{PROJECT_GUIDANCE}}
</project_guidance>

<diff_collection_guidance>
{{DIFF_COLLECTION_GUIDANCE}}
</diff_collection_guidance>

<diff_context>
{{DIFF_CONTEXT}}
</diff_context>

<test_layout>
{{TEST_LAYOUT}}
</test_layout>

<proposed_test_plan>
{{TEST_PLAN}}
</proposed_test_plan>