feat: add codex executor config support #233

Open
davidhonig wants to merge 2 commits into microsoft:main from davidhonig:dh/add-codex-cli-engine

Conversation

@davidhonig

Summary

Adds a codex executor for Waza eval runs, backed by the local Codex CLI and the user’s existing ~/.codex/config.toml / auth setup.

Changes

  • Add internal/execution/CodexEngine using codex exec.
  • Support defaults.engine: codex in .waza.yaml.
  • Allow Codex model defaults from ~/.codex/config.toml when no Waza model is configured.
  • Support .waza.yaml / eval YAML model_reasoning_effort, passed to Codex as -c model_reasoning_effort=....
  • Show Model: default (Codex config) when Codex model is intentionally defaulted.
  • Add Codex option to waza init.
  • Update eval/project schemas and docs for Codex configuration.
  • Preserve follow-up prompt context using codex exec resume.
  • Parse Codex JSON events into Waza transcript, tool-call telemetry, and token usage.
  • Explicitly reject Codex skill_invocation grader / trigger telemetry paths that the Codex CLI does not expose.
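
A minimal sketch of how the argument list described above might be assembled. Only `codex exec`, `codex exec resume`, and `-c model_reasoning_effort=...` come from this description; the `buildCodexArgs` helper, the `--json` and `--model` flags, the argument order, and the sample values are assumptions for illustration, not the PR's actual code.

```go
package main

import "fmt"

// buildCodexArgs is a hypothetical helper, not the PR's implementation.
// "exec", "resume", and "-c model_reasoning_effort=..." are taken from the
// PR description; "--json", "--model", and the ordering are assumptions.
func buildCodexArgs(model, effort, sessionID, prompt string) []string {
	args := []string{"exec"}
	if sessionID != "" {
		// Follow-up prompts resume the earlier session to keep its context.
		args = append(args, "resume", sessionID)
	}
	args = append(args, "--json")
	if model != "" {
		// An empty model leaves the choice to ~/.codex/config.toml.
		args = append(args, "--model", model)
	}
	if effort != "" {
		args = append(args, "-c", "model_reasoning_effort="+effort)
	}
	return append(args, prompt)
}

func main() {
	// First prompt: model defaulted, reasoning effort set.
	fmt.Println(buildCodexArgs("", "high", "", "Summarize the repo"))
	// Follow-up prompt: explicit model, resumed session.
	fmt.Println(buildCodexArgs("some-model", "", "session-123", "Now add tests"))
}
```

Leaving the model flag out when the model is empty is what lets the Codex config supply the default, which matches the "default (Codex config)" display described above.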

Validation

  • go test ./...
  • cd site && npm run build
  • Manual MCL eval smoke test with Codex config confirmed:
    • configured model is displayed
    • model_reasoning_effort reaches Codex
    • Codex starts successfully through Waza

@davidhonig davidhonig requested a review from spboyer as a code owner May 12, 2026 09:23
@github-actions github-actions Bot enabled auto-merge (squash) May 12, 2026 09:24
@davidhonig
Author

@microsoft-github-policy-service agree

@codecov-commenter

Codecov Report

❌ Patch coverage is 76.78100% with 88 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@bf77c75). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| internal/execution/codex.go | 73.79% | 54 Missing and 22 partials ⚠️ |
| cmd/waza/cmd_run.go | 86.84% | 3 Missing and 2 partials ⚠️ |
| internal/projectconfig/config.go | 87.09% | 2 Missing and 2 partials ⚠️ |
| cmd/waza/cmd_init.go | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #233   +/-   ##
=======================================
  Coverage        ?   75.71%           
=======================================
  Files           ?      153           
  Lines           ?    17851           
  Branches        ?        0           
=======================================
  Hits            ?    13515           
  Misses          ?     3394           
  Partials        ?      942           
Flag Coverage Δ
go-implementation 75.71% <76.78%> (?)

Contributor

Copilot AI left a comment

Pull request overview

Adds a new codex execution engine option for running Waza evals via the local Codex CLI, including project/eval configuration support and documentation updates.

Changes:

  • Introduces internal/execution/CodexEngine that runs prompts through codex exec (with resume support) and parses Codex JSON events into Waza telemetry/usage.
  • Extends eval/project config to support defaults.engine: codex, optional model resolution via ~/.codex/config.toml, and model_reasoning_effort.
  • Updates CLI (waza init, waza run), schemas, docs, and tests to cover Codex configuration and behavior.
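
The event parsing mentioned in the first bullet can be pictured roughly as below. This is a sketch under stated assumptions: the event shape (`type`, `text`, `usage` with `input_tokens` and `output_tokens`), the type names `message` and `tool_call`, and the `parseEvents` helper are all hypothetical. The real Codex CLI output format is not shown in this conversation.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// usage and event model a HYPOTHETICAL line-delimited JSON event shape.
// They are illustrative only and not the Codex CLI's documented format.
type usage struct {
	InputTokens  int `json:"input_tokens"`
	OutputTokens int `json:"output_tokens"`
}

type event struct {
	Type  string `json:"type"`
	Text  string `json:"text"`
	Usage *usage `json:"usage"`
}

// summary collects the three things the overview names: transcript
// messages, tool-call counts, and token usage.
type summary struct {
	Messages     []string
	ToolCalls    int
	InputTokens  int
	OutputTokens int
}

// parseEvents decodes one JSON object per line and folds it into a summary.
func parseEvents(stream string) (summary, error) {
	var s summary
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines between events
		}
		var ev event
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return s, fmt.Errorf("decode event %q: %w", line, err)
		}
		switch ev.Type {
		case "message":
			s.Messages = append(s.Messages, ev.Text)
		case "tool_call":
			s.ToolCalls++
		}
		if ev.Usage != nil {
			s.InputTokens += ev.Usage.InputTokens
			s.OutputTokens += ev.Usage.OutputTokens
		}
	}
	return s, sc.Err()
}

func main() {
	stream := `{"type":"message","text":"hello"}
{"type":"tool_call"}
{"type":"turn_done","usage":{"input_tokens":12,"output_tokens":3}}`
	s, err := parseEvents(stream)
	fmt.Println(s, err)
}
```

Note that `bufio.Scanner` caps line length at its default buffer size, so a real parser handling large tool outputs would likely need `Scanner.Buffer` or a `json.Decoder` instead.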
Summary per file

| File | Description |
|---|---|
| site/src/content/docs/reference/waza-yaml.mdx | Documents .waza.yaml defaults including codex engine and model_reasoning_effort. |
| site/src/content/docs/reference/schema.mdx | Updates reference docs for schema fields/options (adds codex, model_reasoning_effort). |
| site/src/content/docs/guides/eval-yaml.mdx | Updates eval.yaml guide for codex executor and model_reasoning_effort. |
| schemas/eval.schema.json | Adds codex executor + model_reasoning_effort; relaxes model requirement. |
| schemas/config.schema.json | Adds codex engine option and model_reasoning_effort to project config defaults. |
| README.md | Mentions codex engine in repo layout and configuration examples. |
| internal/validation/schema_test.go | Adds schema validation test ensuring codex eval can omit model. |
| internal/scaffold/scaffold.go | Omits model: line in scaffolded eval.yaml when model is intentionally empty (codex defaults). |
| internal/scaffold/scaffold_test.go | Tests scaffold omission of empty model. |
| internal/projectconfig/schema_parity_test.go | Ensures schema defaults align with Go defaults for model_reasoning_effort. |
| internal/projectconfig/config.go | Adds model_reasoning_effort and supports intentional empty model for codex defaults. |
| internal/projectconfig/config_test.go | Adds tests for codex default model behavior and reasoning effort parsing. |
| internal/orchestration/runner.go | Passes ModelID + ModelReasoningEffort into execution requests. |
| internal/models/spec.go | Adds model_reasoning_effort to eval spec config struct. |
| internal/models/spec_test.go | Tests parsing of model_reasoning_effort. |
| internal/execution/engine.go | Extends ExecutionRequest with ModelReasoningEffort. |
| internal/execution/codex.go | Implements Codex-backed execution via CLI, including resume + JSON event parsing. |
| internal/execution/codex_test.go | Adds tests using a fake codex shell script. |
| cmd/waza/cmd_run.go | Wires codex engine into waza run, applies .waza.yaml defaults, displays codex “default model” message, rejects unsupported grader. |
| cmd/waza/cmd_run_test.go | Adds end-to-end CLI tests for codex runs + .waza.yaml default overrides. |
| cmd/waza/cmd_init.go | Adds codex as an init option and allows empty model when codex is selected. |

Copilot's findings

  • Files reviewed: 21/21 changed files
  • Comments generated: 3

Comment thread schemas/eval.schema.json
Comment on lines 95 to 100
  "required": [
    "trials_per_task",
    "timeout_seconds",
-   "executor",
-   "model"
+   "executor"
  ],
  "additionalProperties": false,
Member

Left as advisory: adding conditional JSON Schema validation for this case (e.g. if executor != codex, then require model) is non-trivial, and the runtime already rejects a missing model for Copilot sessions. Tracking as a follow-up.
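
For illustration, the rule the schema no longer expresses ("require model unless the executor is codex") could be enforced at runtime along these lines. This is a sketch only: `validateModel`, `errModelRequired`, and the `other-executor` name are hypothetical and are not taken from the Waza codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// errModelRequired is a hypothetical sentinel error for this sketch.
var errModelRequired = errors.New("model is required for this executor")

// validateModel applies "if executor != codex then require model".
// It is an illustrative stand-in, not Waza's actual runtime check.
func validateModel(executor, model string) error {
	if executor == "codex" {
		// Codex may fall back to the model in ~/.codex/config.toml.
		return nil
	}
	if model == "" {
		return fmt.Errorf("executor %q: %w", executor, errModelRequired)
	}
	return nil
}

func main() {
	fmt.Println(validateModel("codex", ""))               // allowed: model may be omitted
	fmt.Println(validateModel("other-executor", ""))      // rejected: model missing
	fmt.Println(validateModel("other-executor", "model")) // allowed: model present
}
```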

Comment thread cmd/waza/cmd_run.go
Comment on lines +587 to +598

engineWasDefault := spec.Config.EngineType == "" ||
	(spec.Config.EngineType == projectconfig.DefaultEngine && defaultEngine != projectconfig.DefaultEngine)
if engineWasDefault {
	spec.Config.EngineType = defaultEngine
}

defaultModel := cfg.Defaults.Model
modelWasDefault := spec.Config.ModelID == "" ||
	(spec.Config.ModelID == projectconfig.DefaultModel &&
		(defaultModel != projectconfig.DefaultModel || engineWasDefault))
if modelWasDefault {
Member

Left as advisory for the PR author to address — the override behavior is intentional per the PR description. Tracking as a follow-up improvement.
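
To make the override conditions in the excerpt easier to follow, here is a self-contained sketch of the same boolean logic. The `builtinEngine` and `builtinModel` constants are placeholder stand-ins for `projectconfig.DefaultEngine` and `projectconfig.DefaultModel`, whose real values are not shown here, and `resolveDefaults` is a hypothetical name. The sketch only reports whether the model counts as defaulted, since the body of that branch is not visible in the excerpt.

```go
package main

import "fmt"

// Placeholder values: the real projectconfig defaults are not shown in
// this conversation, so these strings are assumptions.
const (
	builtinEngine = "builtin-engine"
	builtinModel  = "builtin-model"
)

// resolveDefaults mirrors the two conditions from the excerpt. A spec value
// equal to the built-in default is treated the same as an unset value
// whenever the project config asks for something different.
func resolveDefaults(specEngine, specModel, cfgEngine, cfgModel string) (engine string, engineWasDefault, modelWasDefault bool) {
	engineWasDefault = specEngine == "" ||
		(specEngine == builtinEngine && cfgEngine != builtinEngine)
	engine = specEngine
	if engineWasDefault {
		engine = cfgEngine
	}
	modelWasDefault = specModel == "" ||
		(specModel == builtinModel &&
			(cfgModel != builtinModel || engineWasDefault))
	return engine, engineWasDefault, modelWasDefault
}

func main() {
	// Spec leaves both unset; project config selects codex.
	fmt.Println(resolveDefaults("", "", "codex", ""))
	// Spec names the built-in defaults explicitly; they are still overridden.
	fmt.Println(resolveDefaults(builtinEngine, builtinModel, "codex", ""))
	// Spec names non-default values; they are kept.
	fmt.Println(resolveDefaults("codex", "some-model", "codex", ""))
}
```

The second call shows the consequence of this design: an eval that explicitly sets the built-in default engine cannot be told apart from one that left the field empty.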

Comment thread internal/execution/codex_test.go
Member

@spboyer spboyer left a comment

Adds Codex executor support; one remaining gap is that discovered trigger suites still flow into unsupported Codex telemetry paths.

Issues to address:

  • cmd/waza/cmd_run.go:960 - Codex validation does not reject trigger suites before they call unsupported skill-invocation telemetry

Comment thread cmd/waza/cmd_run.go
David Honig and others added 2 commits June 4, 2026 20:48
… skip guard

- validateEngineFeatureSupport now accepts specPath and calls trigger.Discover to
  reject codex runs when trigger tests are present alongside the eval spec; this
  surfaces a clear configuration error instead of per-prompt failures at runtime
- Add Windows skip guard to TestCodexEngineExecuteRejectsSkillTriggerTelemetry
  to match the other codex tests that depend on the POSIX fake-codex shell script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
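
The up-front rejection described in this commit message might look roughly like the sketch below. The `checkCodexTriggerSupport` name, the `discoverFunc` type, and its signature are hypothetical: the real `validateEngineFeatureSupport` and `trigger.Discover` signatures are not shown in this conversation, so discovery is injected as a function here.

```go
package main

import (
	"errors"
	"fmt"
)

// discoverFunc is a hypothetical stand-in for trigger.Discover. It reports
// whether trigger tests exist alongside the eval spec at specPath.
type discoverFunc func(specPath string) (found bool, err error)

// checkCodexTriggerSupport fails fast with a configuration error when a
// codex run would hit trigger tests, instead of failing per prompt later.
func checkCodexTriggerSupport(engine, specPath string, discover discoverFunc) error {
	if engine != "codex" {
		return nil // other engines are not restricted by this check
	}
	found, err := discover(specPath)
	if err != nil {
		return fmt.Errorf("discover trigger tests for %s: %w", specPath, err)
	}
	if found {
		return errors.New("codex engine does not support trigger tests: skill-invocation telemetry is not exposed by the Codex CLI")
	}
	return nil
}

func main() {
	hasTriggers := func(string) (bool, error) { return true, nil }
	noTriggers := func(string) (bool, error) { return false, nil }

	fmt.Println(checkCodexTriggerSupport("codex", "eval.yaml", hasTriggers))
	fmt.Println(checkCodexTriggerSupport("codex", "eval.yaml", noTriggers))
	fmt.Println(checkCodexTriggerSupport("other-engine", "eval.yaml", hasTriggers))
}
```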
@spboyer spboyer force-pushed the dh/add-codex-cli-engine branch from 12df2d6 to 7e44867 Compare June 5, 2026 00:50
@spboyer spboyer self-requested a review June 5, 2026 00:51

4 participants