openai · qiankunli · Apr 11, 2026
diff --git a/README.md b/README.md
@@ -11,6 +11,7 @@ they already have.
 
 - `/codex:review` for a normal read-only Codex review
 - `/codex:adversarial-review` for a steerable challenge review
+- `/codex:test` to delegate matching test updates for the current diff
 - `/codex:rescue`, `/codex:status`, `/codex:result`, and `/codex:cancel` to delegate work and manage background jobs
 
 ## Requirements
@@ -123,6 +124,38 @@ Examples:
 
 This command is read-only. It does not fix code.
 
+### `/codex:test`
+
+Runs a strict test-writing workflow for the current code changes.
+
+Use it when you want:
+
+- Codex to write the matching tests for the current diff
+- Claude to keep the implementation work while Codex writes the matching tests
+- Codex to inspect the current diff, infer the repository test layout, and add the smallest sufficient test coverage
+- Codex to avoid modifying production code by default
+
+This command fails closed if it cannot collect the required repository context first. It looks for project guidance such as `CLAUDE.md`, `AGENTS.md`, or `README.md`, inspects the current diff, infers likely test targets, and then asks Codex to write the tests.
+
+Examples:
+
+```bash
+/codex:test
+/codex:test --base main
+/codex:test --background
+/codex:test --model gpt-5.4-mini --effort high
+```
+
+By default, Codex should:
+
+- explain the author's apparent purpose for the change before editing
+- summarize the touched production files and detected test locations
+- update or create the matching tests
+- avoid modifying production code
+
+Implementation and maintenance notes for `/codex:test` live in
+[`docs/codex-test-design.md`](./docs/codex-test-design.md).
+
 ### `/codex:rescue`
 
 Hands a task to Codex through the `codex:codex-rescue` subagent.
@@ -229,6 +262,12 @@ When the review gate is enabled, the plugin uses a `Stop` hook to run a targeted
 /codex:review
 ```
 
+### Let Codex Add Tests
+
+```bash
+/codex:test
+```
+
 ### Hand A Problem To Codex
 
 ```bash

diff --git a/docs/codex-test-design.md b/docs/codex-test-design.md
@@ -0,0 +1,91 @@
+# `/codex:test` Design Notes
+
+This note summarizes the stable design constraints behind `/codex:test`.
+It is intended for maintainers who need to evolve the test-planning pipeline
+without re-learning the failure modes from past review cycles.
+
+A useful review question for this command is: "Could uncertainty here cause
+`/codex:test` to write tests in the wrong place, or collect the wrong context?"
+Most of the constraints below exist to keep the answer to that question "no."
+
+## Core Principles
+
+1. Fail closed when required context is missing.
+
+`/codex:test` should stop rather than guess when it cannot gather enough
+repository context. Missing project guidance, missing test layout, or missing
+test targets should be treated as hard failures instead of soft fallbacks.
+When uncertainty would otherwise push the command toward the wrong context or
+the wrong target file, failing closed is the intended behavior.
+
+2. Keep repository context inside the repository boundary.
+
+Repo walking must not escape `repoRoot`. Symlinked directories are skipped,
+and symlinked files are only eligible when their realpath still stays under
+the repository root. This prevents unrelated files or host secrets from being
+pulled into the prompt.
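The realpath check for symlinked files can be sketched as follows. This is a minimal illustration, not the plugin's implementation: `isInsideRepo` is a hypothetical name, the `realpath` argument is assumed to be supplied by the caller (for example Node's `fs.realpathSync`), and paths are assumed to be POSIX-style absolute paths. Skipping symlinked directories during the walk is a separate step that is not shown.

```javascript
// Hypothetical helper; not taken from the plugin source.
// `realpath` is caller-supplied (for example `fs.realpathSync`).
function isInsideRepo(repoRoot, candidatePath, realpath) {
  const root = realpath(repoRoot);
  let resolved;
  try {
    resolved = realpath(candidatePath);
  } catch {
    // Broken symlinks and unreadable paths are treated as ineligible.
    return false;
  }
  // Compare against "root/" so that "/repo-other" is not accepted for "/repo".
  const prefix = root.endsWith("/") ? root : root + "/";
  return resolved === root || resolved.startsWith(prefix);
}
```

Comparing resolved paths rather than the paths as written is what keeps a link such as `repo/secrets -> /etc/passwd` out of the prompt.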
+
+3. Bound the prompt budget globally, not only per file.
+
+Project guidance is useful, but unbounded guidance collection makes `/codex:test`
+fragile in monorepos. Guidance files are prioritized and then capped by both a
+small file-count limit and a total byte budget, with shallow high-priority files
+winning over deep package-local READMEs.
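A rough sketch of the two caps working together is shown here. The function name, the default limits, and the use of path depth as the priority are all assumptions made for illustration; the actual prioritization in the plugin may differ.

```javascript
// Hypothetical sketch; limits and depth-based ordering are assumptions.
// Each entry in `files` looks like { path: "docs/README.md", bytes: 1234 }.
function selectGuidance(files, { maxFiles = 3, maxBytes = 8192 } = {}) {
  const depth = (p) => p.split("/").length - 1;
  // Shallow files first; ties are broken by path for a stable order.
  const ordered = [...files].sort(
    (a, b) => depth(a.path) - depth(b.path) || a.path.localeCompare(b.path)
  );
  const selected = [];
  let usedBytes = 0;
  for (const file of ordered) {
    if (selected.length >= maxFiles) break; // file-count cap
    if (usedBytes + file.bytes > maxBytes) continue; // total byte budget
    selected.push(file.path);
    usedBytes += file.bytes;
  }
  return selected;
}
```

Because the byte budget is tracked across the whole selection, one large README cannot crowd out every other guidance file in a monorepo.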
+
+4. Treat self-collected diff context as a first-class mode.
+
+When the diff is too large to inline, the prompt must still tell Codex how to
+collect the missing patch context with read-only git commands. Large changes
+should degrade to a lighter summary, not to silent loss of guidance.
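One way to picture the two modes is the sketch below. The function name, the size threshold, and the returned shape are hypothetical; only the `git diff` commands are real, and they are read-only.

```javascript
// Hypothetical sketch; the threshold and result shape are assumptions.
function planDiffContext(diffText, changedPaths, maxInlineBytes = 20000) {
  const bytes = new TextEncoder().encode(diffText).length;
  if (bytes <= maxInlineBytes) {
    return { mode: "inline", context: diffText, guidance: "" };
  }
  // Too large to inline: keep a lighter summary and explain how to
  // collect the rest with read-only git commands.
  return {
    mode: "self-collect",
    context: changedPaths.map((p) => `- ${p}`).join("\n"),
    guidance: [
      "The diff is too large to inline. Collect it with read-only commands:",
      "git diff --stat",
      "git diff -- <path>",
    ].join("\n"),
  };
}
```

The self-collect branch still returns non-empty guidance, which is the property the principle above asks for.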
+
+5. Only infer tests from live source files.
+
+Changed-path lists can include deleted files. Deletion-only changes should not
+cause `/codex:test` to propose creating brand-new tests for removed code, so
+planning must ignore source paths that no longer exist in the working tree.
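The filter itself can be very small. In this sketch, `liveSourcePaths` is a hypothetical name and `exists` is assumed to be a caller-supplied check against the working tree (for example a wrapper around `fs.existsSync` rooted at the repository).

```javascript
// Hypothetical sketch; `exists` is supplied by the caller.
function liveSourcePaths(changedPaths, exists) {
  // Deleted files still appear in changed-path lists; drop them before planning.
  return changedPaths.filter((p) => exists(p));
}
```

A deletion-only change then yields an empty list, which the planner can treat as "no tests to create" rather than proposing tests for removed code.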
+
+## Test Target Selection
+
+1. Prefer the nearest package-local test root.
+
+In monorepos, test planning should stay inside the package or module that owns
+the changed source file. When a direct match is missing, new test targets should
+be created under the nearest compatible test root instead of the first `tests/`
+directory discovered anywhere in the repository. If no detected test root
+actually belongs to the changed source's package, `/codex:test` should fail
+closed instead of selecting the closest-looking package by shared path prefix.
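The ownership rule can be sketched as follows. All names are hypothetical, and the sketch assumes the caller has already detected package roots (for example, directories that contain a manifest) and test roots as repo-relative paths with forward slashes.

```javascript
// Hypothetical sketch; package and test-root detection are assumed inputs.
const isUnder = (path, dir) => path === dir || path.startsWith(dir + "/");

// Deepest package root that contains `path`, or null when none does.
function owningPackage(path, packageRoots) {
  const owners = packageRoots.filter((root) => isUnder(path, root));
  if (owners.length === 0) return null;
  return owners.reduce((best, root) => (root.length > best.length ? root : best));
}

function sharedSegments(a, b) {
  const as = a.split("/");
  const bs = b.split("/");
  let n = 0;
  while (n < as.length && n < bs.length && as[n] === bs[n]) n++;
  return n;
}

function selectTestRoot(sourcePath, packageRoots, testRoots) {
  const owner = owningPackage(sourcePath, packageRoots);
  if (owner === null) throw new Error(`No package owns ${sourcePath}`);
  // Only test roots owned by the same package are candidates.
  const local = testRoots.filter((root) => owningPackage(root, packageRoots) === owner);
  if (local.length === 0) throw new Error(`No test root inside package ${owner}`);
  // Among local roots, prefer the one sharing the longest path prefix.
  return local.sort(
    (a, b) => sharedSegments(b, sourcePath) - sharedSegments(a, sourcePath)
  )[0];
}
```

Throwing when no local test root exists is the fail-closed path: a test root in a sibling package is never chosen just because its path looks similar.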
+
+2. Scope direct matches by locality, not basename alone.
+
+Two packages can legitimately contain the same test basename such as
+`id.test.js`. Basename matches are only safe after they have been narrowed to
+the nearest package-local test root. Otherwise `/codex:test` may edit tests in
+an unrelated package.
+
+3. Preserve source subdirectories in created test paths.
+
+When a new test file is created, the path should preserve the source structure
+after the language-specific source root. For example:
+
+- `src/pkg/foo.py -> tests/pkg/test_foo.py`
+- `packages/b/src/new.js -> packages/b/tests/new.test.js`
+
+Flattening nested paths causes collisions across modules with the same stem and
+makes the planned test target drift away from the changed code.
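The two mappings above can be reproduced with a short sketch. It assumes the source root is a directory named `src`, the test root is a sibling `tests` directory, Python tests use a `test_` prefix, and other languages use a `.test` infix; `plannedTestPath` is a hypothetical name, and real repositories may follow other conventions.

```javascript
// Hypothetical sketch; the `src` -> `tests` layout and naming rules are assumptions.
function plannedTestPath(sourcePath) {
  const parts = sourcePath.split("/");
  const srcIndex = parts.lastIndexOf("src");
  // Unknown layout: return null so the caller can fail closed.
  if (srcIndex === -1 || srcIndex === parts.length - 1) return null;
  const packagePrefix = parts.slice(0, srcIndex);
  const subdirs = parts.slice(srcIndex + 1, -1); // preserved, not flattened
  const fileName = parts[parts.length - 1];
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return null;
  const stem = fileName.slice(0, dot);
  const ext = fileName.slice(dot);
  const testName = ext === ".py" ? `test_${stem}.py` : `${stem}.test${ext}`;
  return [...packagePrefix, "tests", ...subdirs, testName].join("/");
}
```

Keeping `subdirs` in the output is what prevents `src/a/util.py` and `src/b/util.py` from colliding on a single `tests/test_util.py`.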
+
+4. Match existing tests conservatively.
+
+Substring-based matching is too loose. `id` should not match `userid.test.js`,
+and similarly named files in sibling packages should not be pulled into the same
+plan. Matching should optimize for "smallest safe target set", even if that
+means falling back to creating a new test file more often.
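A conservative basename match might look like the sketch below. The recognized naming patterns are assumptions chosen for illustration, and the function names are hypothetical; locality narrowing to the package-local test root is a separate step that is not shown.

```javascript
// Hypothetical sketch; the set of test-name patterns is an assumption.
function testedStem(testFileName) {
  const patterns = [
    /^(.+)\.(?:test|spec)\.[^.]+$/, // id.test.js, id.spec.ts
    /^test_(.+)\.py$/, // test_id.py
    /^(.+)_test\.[^.]+$/, // id_test.go
  ];
  for (const pattern of patterns) {
    const match = pattern.exec(testFileName);
    if (match) return match[1];
  }
  return null; // not recognized as a test file
}

function isDirectMatch(sourceFileName, testFileName) {
  const dot = sourceFileName.lastIndexOf(".");
  const sourceStem = dot > 0 ? sourceFileName.slice(0, dot) : sourceFileName;
  // Exact stem equality, never substring containment.
  return testedStem(testFileName) === sourceStem;
}
```

With exact stem equality, `id.js` matches `id.test.js` but not `userid.test.js`; when nothing matches, the planner falls back to creating a new test file.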
+
+## Maintenance Notes
+
+- If you loosen repo-walk or symlink behavior, add tests that prove prompt
+  inputs still stay under `repoRoot`.
+- If you change guidance selection, keep both a file-count cap and a total-byte
+  cap unless there is a stronger replacement.
+- If you change path inference, add monorepo fixtures that cover both direct
+  matches and create-path planning.
+- If you change diff collection, verify both inline-diff and self-collect modes.
diff --git a/plugins/codex/commands/test.md b/plugins/codex/commands/test.md
@@ -0,0 +1,60 @@
+---
+description: Delegate test writing for the current code changes to Codex with a strict test-only workflow
+argument-hint: '[--wait|--background] [--base <ref>] [--scope auto|working-tree|branch] [--model <model|spark>] [--effort <none|minimal|low|medium|high|xhigh>]'
+disable-model-invocation: true
+allowed-tools: Read, Glob, Grep, Bash(node:*), Bash(git:*), AskUserQuestion
+---
+
+Run Codex test writing through the shared plugin runtime.
+
+Raw slash-command arguments:
+`$ARGUMENTS`
+
+Core constraints:
+- This command is test-only.
+- Do not modify production code by default.
+- Fail closed if the runtime cannot collect the required repository context.
+- Your only job is to run the command and return Codex's output verbatim to the user.
+
+Execution mode rules:
+- If the raw arguments include `--wait`, do not ask. Run in the foreground.
+- If the raw arguments include `--background`, do not ask. Run in a Claude background task.
+- Otherwise, estimate the change size before asking:
+  - For working-tree mode, start with `git status --short --untracked-files=all`.
+  - For working-tree mode, also inspect both `git diff --shortstat --cached` and `git diff --shortstat`.
+  - For base-branch mode, use `git diff --shortstat <base>...HEAD`.
+  - Treat untracked files or directories as real work even when `git diff --shortstat` is empty.
+  - Recommend waiting only when the change is clearly tiny: roughly 1-2 files total, with no sign of broader test work.
+  - In every other case, including unclear size, recommend background.
+  - When in doubt, run the command instead of claiming there is no test work to do.
+- Then use `AskUserQuestion` exactly once with two options, putting the recommended option first and suffixing its label with `(Recommended)`:
+  - `Wait for results`
+  - `Run in background`
+
+Argument handling:
+- Preserve the user's arguments exactly.
+- Do not strip `--wait` or `--background` yourself.
+- The companion script parses `--wait` and `--background`, but Claude Code's `Bash(..., run_in_background: true)` is what actually detaches the run.
+- This command accepts `--base <ref>` and `--scope auto|working-tree|branch`.
+- This command accepts `--model` and `--effort` and forwards them to the companion runtime.
+- Do not add extra instructions or rewrite the user's intent.
+
+Foreground flow:
+- Run:
+```bash
+node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" test "$ARGUMENTS"
+```
+- Return the command stdout verbatim, exactly as-is.
+- Do not paraphrase, summarize, or add commentary before or after it.
+
+Background flow:
+- Launch the command with `Bash` in the background:
+```typescript
+Bash({
+  command: `node "${CLAUDE_PLUGIN_ROOT}/scripts/codex-companion.mjs" test "$ARGUMENTS"`,
+  description: "Codex test writing",
+  run_in_background: true
+})
+```
+- Do not call `BashOutput` or wait for completion in this turn.
+- After launching the command, tell the user: "Codex test writing started in the background. Check `/codex:status` for progress."
diff --git a/plugins/codex/prompts/write-tests.md b/plugins/codex/prompts/write-tests.md
@@ -0,0 +1,64 @@
+<role>
+You are Codex writing tests for an existing code change.
+Your job is to understand the author's intent, map the impacted code to the repository's testing layout, and then make the smallest sufficient test-only edits.
+</role>
+
+<task>
+Write or update the tests for {{TARGET_LABEL}}.
+</task>
+
+<grounding_rules>
+Use the provided project guidance and diff context as the starting point for your understanding of the change.
+Before you edit anything, infer and state the author's purpose for this change in one short section titled exactly `Author purpose:`.
+</grounding_rules>
+
+<constraints>
+- Default to test-only changes.
+- Do not modify production code by default.
+- Do not delete existing tests unless the tested behavior is explicitly removed by this diff or the test is being replaced by an updated equivalent covering the same intent.
+- If you believe a production code change is required, stop and explain why instead of editing it.
+- Follow the repository's existing test conventions, naming patterns, and directory layout.
+- Prefer the smallest sufficient regression coverage for the changed behavior.
+- Reuse existing fixtures, helpers, and snapshots when they already fit.
+</constraints>
+
+<required_pre_edit_summary>
+Before editing any files, print a concise plan that includes these headings exactly:
+- `Author purpose:`
+- `Touched production files:`
+- `Detected test locations:`
+- `Planned test file changes:`
+
+Under `Planned test file changes:`, list which files you expect to create, update, or remove.
+Mention the relevant test functions or scenarios you expect to add or update when you can infer them from the context.
+</required_pre_edit_summary>
+
+<verification>
+Prefer the repository-specific test commands listed below when they fit the changed tests.
+After editing, run the most relevant repository test command you can identify from the available context.
+If the repository does not expose a clear test command, run the narrowest command that verifies the changed tests.
+</verification>
+
+<suggested_test_commands>
+{{SUGGESTED_TEST_COMMANDS}}
+</suggested_test_commands>
+
+<project_guidance>
+{{PROJECT_GUIDANCE}}
+</project_guidance>
+
+<diff_collection_guidance>
+{{DIFF_COLLECTION_GUIDANCE}}
+</diff_collection_guidance>
+
+<diff_context>
+{{DIFF_CONTEXT}}
+</diff_context>
+
+<test_layout>
+{{TEST_LAYOUT}}
+</test_layout>
+
+<proposed_test_plan>
+{{TEST_PLAN}}
+</proposed_test_plan>
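For orientation, filling the `{{NAME}}` placeholders in a template like this one could be sketched as below. `renderTemplate` is a hypothetical helper, not the plugin's runtime; throwing on a missing value is an assumption chosen to mirror the fail-closed behavior described in the design notes.

```javascript
// Hypothetical sketch; not the plugin's actual template renderer.
function renderTemplate(template, values) {
  return template.replace(/\{\{([A-Z_]+)\}\}/g, (placeholder, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      // Fail closed instead of leaving a raw placeholder in the prompt.
      throw new Error(`Missing template value: ${name}`);
    }
    return String(values[name]);
  });
}
```

Called with `{ TARGET_LABEL: "the working tree" }`, the task line would read "Write or update the tests for the working tree."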