feat: add codex executor config support by davidhonig · Pull Request #233 · microsoft/waza

davidhonig · 2026-05-12T09:23:42Z

Summary

Adds a codex executor for Waza eval runs, backed by the local Codex CLI and the user’s existing ~/.codex/config.toml / auth setup.

Changes

Add internal/execution/CodexEngine using codex exec.
Support defaults.engine: codex in .waza.yaml.
Allow Codex model defaults from ~/.codex/config.toml when no Waza model is configured.
Support .waza.yaml / eval YAML model_reasoning_effort, passed to Codex as -c model_reasoning_effort=....
Show Model: default (Codex config) when Codex model is intentionally defaulted.
Add Codex option to waza init.
Update eval/project schemas and docs for Codex configuration.
Preserve follow-up prompt context using codex exec resume.
Parse Codex JSON events into Waza transcript, tool-call telemetry, and token usage.
Explicitly reject Codex skill_invocation grader / trigger telemetry paths that the Codex CLI does not expose.

Validation

go test ./...
cd site && npm run build
Manual MCL eval smoke test with Codex config confirmed:
- configured model is displayed
- model_reasoning_effort reaches Codex
- Codex starts successfully through Waza

davidhonig · 2026-05-12T09:25:14Z

@microsoft-github-policy-service agree

codecov-commenter · 2026-05-20T19:51:00Z

Codecov Report

❌ Patch coverage is 76.78100% with 88 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@bf77c75).

Files with missing lines	Patch %	Lines
internal/execution/codex.go	73.79%	54 Missing and 22 partials ⚠️
cmd/waza/cmd_run.go	86.84%	3 Missing and 2 partials ⚠️
internal/projectconfig/config.go	87.09%	2 Missing and 2 partials ⚠️
cmd/waza/cmd_init.go	0.00%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #233   +/-   ##
=======================================
  Coverage        ?   75.71%           
=======================================
  Files           ?      153           
  Lines           ?    17851           
  Branches        ?        0           
=======================================
  Hits            ?    13515           
  Misses          ?     3394           
  Partials        ?      942

Flag	Coverage Δ
go-implementation	`75.71% <76.78%> (?)`


Copilot

Pull request overview

Adds a new codex execution engine option for running Waza evals via the local Codex CLI, including project/eval configuration support and documentation updates.

Changes:

Introduces internal/execution/CodexEngine that runs prompts through codex exec (with resume support) and parses Codex JSON events into Waza telemetry/usage.
Extends eval/project config to support defaults.engine: codex, optional model resolution via ~/.codex/config.toml, and model_reasoning_effort.
Updates CLI (waza init, waza run), schemas, docs, and tests to cover Codex configuration and behavior.

Summary per file

File	Description
site/src/content/docs/reference/waza-yaml.mdx	Documents `.waza.yaml` defaults including `codex` engine and `model_reasoning_effort`.
site/src/content/docs/reference/schema.mdx	Updates reference docs for schema fields/options (adds `codex`, `model_reasoning_effort`).
site/src/content/docs/guides/eval-yaml.mdx	Updates eval.yaml guide for `codex` executor and `model_reasoning_effort`.
schemas/eval.schema.json	Adds `codex` executor + `model_reasoning_effort`; relaxes model requirement.
schemas/config.schema.json	Adds `codex` engine option and `model_reasoning_effort` to project config defaults.
README.md	Mentions codex engine in repo layout and configuration examples.
internal/validation/schema_test.go	Adds schema validation test ensuring codex eval can omit `model`.
internal/scaffold/scaffold.go	Omits `model:` line in scaffolded eval.yaml when model is intentionally empty (codex defaults).
internal/scaffold/scaffold_test.go	Tests scaffold omission of empty model.
internal/projectconfig/schema_parity_test.go	Ensures schema defaults align with Go defaults for `model_reasoning_effort`.
internal/projectconfig/config.go	Adds `model_reasoning_effort` and supports intentional empty model for codex defaults.
internal/projectconfig/config_test.go	Adds tests for codex default model behavior and reasoning effort parsing.
internal/orchestration/runner.go	Passes `ModelID` + `ModelReasoningEffort` into execution requests.
internal/models/spec.go	Adds `model_reasoning_effort` to eval spec config struct.
internal/models/spec_test.go	Tests parsing of `model_reasoning_effort`.
internal/execution/engine.go	Extends `ExecutionRequest` with `ModelReasoningEffort`.
internal/execution/codex.go	Implements Codex-backed execution via CLI, including resume + JSON event parsing.
internal/execution/codex_test.go	Adds tests using a fake codex shell script.
cmd/waza/cmd_run.go	Wires `codex` engine into `waza run`, applies `.waza.yaml` defaults, displays codex “default model” message, rejects unsupported grader.
cmd/waza/cmd_run_test.go	Adds end-to-end CLI tests for codex runs + `.waza.yaml` default overrides.
cmd/waza/cmd_init.go	Adds `codex` as an init option and allows empty model when codex is selected.

Copilot's findings

Files reviewed: 21/21 changed files
Comments generated: 3

spboyer · 2026-06-05T00:51:00Z

      "required": [
        "trials_per_task",
        "timeout_seconds",
-        "executor",
-        "model"
+        "executor"
      ],
      "additionalProperties": false,


Left as advisory. Adding conditional JSON Schema validation for this case (e.g. `if executor != codex then require model`) is non-trivial, and the runtime already rejects a missing model for Copilot sessions. Tracking as a follow-up.

spboyer · 2026-06-05T00:51:01Z

+
+	engineWasDefault := spec.Config.EngineType == "" ||
+		(spec.Config.EngineType == projectconfig.DefaultEngine && defaultEngine != projectconfig.DefaultEngine)
+	if engineWasDefault {
+		spec.Config.EngineType = defaultEngine
+	}
+
+	defaultModel := cfg.Defaults.Model
+	modelWasDefault := spec.Config.ModelID == "" ||
+		(spec.Config.ModelID == projectconfig.DefaultModel &&
+			(defaultModel != projectconfig.DefaultModel || engineWasDefault))
+	if modelWasDefault {


Left as advisory for the PR author to address — the override behavior is intentional per the PR description. Tracking as a follow-up improvement.
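
To make the override behavior in this thread easier to follow, here is a self-contained sketch of the two conditions from the excerpt. The constants `builtinEngine` and `builtinModel` are placeholders for `projectconfig.DefaultEngine` and `projectconfig.DefaultModel`, whose real values are not shown, and the body of the final `if` is cut off in the excerpt, so assigning `defaultModel` there is an assumption.

```go
package main

import "fmt"

// Placeholder values; the real projectconfig defaults are not shown in the excerpt.
const (
	builtinEngine = "builtin-engine"
	builtinModel  = "builtin-model"
)

// specConfig is a hypothetical stand-in for spec.Config.
type specConfig struct {
	EngineType string
	ModelID    string
}

// applyDefaults mirrors the conditions in the diff excerpt. A spec value equal
// to the built-in default is treated as "not set" when the project default differs.
func applyDefaults(spec *specConfig, defaultEngine, defaultModel string) {
	engineWasDefault := spec.EngineType == "" ||
		(spec.EngineType == builtinEngine && defaultEngine != builtinEngine)
	if engineWasDefault {
		spec.EngineType = defaultEngine
	}

	modelWasDefault := spec.ModelID == "" ||
		(spec.ModelID == builtinModel &&
			(defaultModel != builtinModel || engineWasDefault))
	if modelWasDefault {
		spec.ModelID = defaultModel // assumed body; truncated in the excerpt
	}
}

func main() {
	// The eval spec carries the built-in defaults; .waza.yaml selects codex with no model.
	spec := specConfig{EngineType: builtinEngine, ModelID: builtinModel}
	applyDefaults(&spec, "codex", "")
	fmt.Printf("engine=%q model=%q\n", spec.EngineType, spec.ModelID)
}
```

Under these assumptions, a spec that explicitly names the built-in engine or model cannot be told apart from one that left the field empty, which is the override the thread describes as intentional.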

spboyer

Adds Codex executor support; one remaining gap is that discovered trigger suites still flow into unsupported Codex telemetry paths.

Issues to address:

cmd/waza/cmd_run.go:960 - Codex validation does not reject trigger suites before they call unsupported skill-invocation telemetry

… skip guard

- validateEngineFeatureSupport now accepts specPath and calls trigger.Discover to reject codex runs when trigger tests are present alongside the eval spec; this surfaces a clear configuration error instead of per-prompt failures at runtime
- Add Windows skip guard to TestCodexEngineExecuteRejectsSkillTriggerTelemetry to match the other codex tests that depend on the POSIX fake-codex shell script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

davidhonig requested a review from spboyer as a code owner May 12, 2026 09:23

github-actions Bot enabled auto-merge (squash) May 12, 2026 09:24

spboyer requested a review from Copilot May 20, 2026 19:44

Copilot started reviewing on behalf of spboyer May 20, 2026 19:46

Copilot AI reviewed May 20, 2026


This was referenced May 20, 2026

[E1] Decouple ExecutionResponse from Copilot SDK + Multi-Agent Engine Support #10

Open

Bring Your Own Model — Ollama, OpenAI, Anthropic, OpenCode engines #11

Open

spboyer requested changes May 22, 2026


Comment thread cmd/waza/cmd_run.go

David Honig and others added 2 commits June 4, 2026 20:48

feat: add codex executor config support

040db84

spboyer force-pushed the dh/add-codex-cli-engine branch from 12df2d6 to 7e44867 Compare June 5, 2026 00:50

spboyer self-requested a review June 5, 2026 00:51

davidhonig wants to merge 2 commits into microsoft:main from davidhonig:dh/add-codex-cli-engine

4 participants
