feat: add codex executor config support #233
Conversation
@microsoft-github-policy-service agree
Codecov Report

Coverage Diff

| | main | #233 |
|---|---|---|
| Coverage | ? | 75.71% |
| Files | ? | 153 |
| Lines | ? | 17851 |
| Branches | ? | 0 |
| Hits | ? | 13515 |
| Misses | ? | 3394 |
| Partials | ? | 942 |
Pull request overview
Adds a new codex execution engine option for running Waza evals via the local Codex CLI, including project/eval configuration support and documentation updates.
Changes:
- Introduces `internal/execution/CodexEngine`, which runs prompts through `codex exec` (with resume support) and parses Codex JSON events into Waza telemetry/usage.
- Extends eval/project config to support `defaults.engine: codex`, optional model resolution via `~/.codex/config.toml`, and `model_reasoning_effort`.
- Updates CLI (`waza init`, `waza run`), schemas, docs, and tests to cover Codex configuration and behavior.
Show a summary per file
| File | Description |
|---|---|
| site/src/content/docs/reference/waza-yaml.mdx | Documents .waza.yaml defaults including codex engine and model_reasoning_effort. |
| site/src/content/docs/reference/schema.mdx | Updates reference docs for schema fields/options (adds codex, model_reasoning_effort). |
| site/src/content/docs/guides/eval-yaml.mdx | Updates eval.yaml guide for codex executor and model_reasoning_effort. |
| schemas/eval.schema.json | Adds codex executor + model_reasoning_effort; relaxes model requirement. |
| schemas/config.schema.json | Adds codex engine option and model_reasoning_effort to project config defaults. |
| README.md | Mentions codex engine in repo layout and configuration examples. |
| internal/validation/schema_test.go | Adds schema validation test ensuring codex eval can omit model. |
| internal/scaffold/scaffold.go | Omits model: line in scaffolded eval.yaml when model is intentionally empty (codex defaults). |
| internal/scaffold/scaffold_test.go | Tests scaffold omission of empty model. |
| internal/projectconfig/schema_parity_test.go | Ensures schema defaults align with Go defaults for model_reasoning_effort. |
| internal/projectconfig/config.go | Adds model_reasoning_effort and supports intentional empty model for codex defaults. |
| internal/projectconfig/config_test.go | Adds tests for codex default model behavior and reasoning effort parsing. |
| internal/orchestration/runner.go | Passes ModelID + ModelReasoningEffort into execution requests. |
| internal/models/spec.go | Adds model_reasoning_effort to eval spec config struct. |
| internal/models/spec_test.go | Tests parsing of model_reasoning_effort. |
| internal/execution/engine.go | Extends ExecutionRequest with ModelReasoningEffort. |
| internal/execution/codex.go | Implements Codex-backed execution via CLI, including resume + JSON event parsing. |
| internal/execution/codex_test.go | Adds tests using a fake codex shell script. |
| cmd/waza/cmd_run.go | Wires codex engine into waza run, applies .waza.yaml defaults, displays codex “default model” message, rejects unsupported grader. |
| cmd/waza/cmd_run_test.go | Adds end-to-end CLI tests for codex runs + .waza.yaml default overrides. |
| cmd/waza/cmd_init.go | Adds codex as an init option and allows empty model when codex is selected. |
Copilot's findings
- Files reviewed: 21/21 changed files
- Comments generated: 3
      "required": [
        "trials_per_task",
        "timeout_seconds",
    -   "executor",
    -   "model"
    +   "executor"
      ],
      "additionalProperties": false,
Left as advisory: adding conditional JSON Schema validation for this case (e.g. `if executor != codex then require model`) is non-trivial, and the runtime already rejects a missing model for Copilot sessions. Tracking as a follow-up.
    engineWasDefault := spec.Config.EngineType == "" ||
        (spec.Config.EngineType == projectconfig.DefaultEngine && defaultEngine != projectconfig.DefaultEngine)
    if engineWasDefault {
        spec.Config.EngineType = defaultEngine
    }

    defaultModel := cfg.Defaults.Model
    modelWasDefault := spec.Config.ModelID == "" ||
        (spec.Config.ModelID == projectconfig.DefaultModel &&
            (defaultModel != projectconfig.DefaultModel || engineWasDefault))
    if modelWasDefault {
Left as advisory for the PR author to address — the override behavior is intentional per the PR description. Tracking as a follow-up improvement.
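To make the override behavior under discussion concrete, here is a self-contained sketch of the same two conditions. It rests on assumptions: `builtinEngine` and `builtinModel` are placeholder values standing in for `projectconfig.DefaultEngine` and `projectconfig.DefaultModel`, `specConfig` and `applyDefaults` are hypothetical names, and the body of the truncated `if modelWasDefault` branch is assumed to assign the project default.

```go
package main

import "fmt"

// Placeholder built-in defaults; the real values live in projectconfig
// and are NOT taken from the PR.
const (
	builtinEngine = "copilot-sdk"
	builtinModel  = "builtin-model"
)

// specConfig is a hypothetical stand-in for spec.Config.
type specConfig struct {
	EngineType string
	ModelID    string
}

// applyDefaults mirrors the two conditions in the reviewed snippet.
func applyDefaults(spec specConfig, defaultEngine, defaultModel string) specConfig {
	engineWasDefault := spec.EngineType == "" ||
		(spec.EngineType == builtinEngine && defaultEngine != builtinEngine)
	if engineWasDefault {
		spec.EngineType = defaultEngine
	}

	modelWasDefault := spec.ModelID == "" ||
		(spec.ModelID == builtinModel &&
			(defaultModel != builtinModel || engineWasDefault))
	if modelWasDefault {
		// Assumption: the truncated branch assigns the project default,
		// which may be empty so that Codex picks its own model.
		spec.ModelID = defaultModel
	}
	return spec
}

func main() {
	// Project config selects codex and intentionally leaves the model empty.
	fmt.Printf("%+v\n", applyDefaults(specConfig{EngineType: builtinEngine, ModelID: builtinModel}, "codex", ""))
	// Values that differ from the built-in defaults are left untouched.
	fmt.Printf("%+v\n", applyDefaults(specConfig{EngineType: "mock", ModelID: "custom"}, "codex", ""))
}
```

Under these assumptions, a spec value equal to the built-in default is treated the same as an unset value, so it is replaced by the project default; values that differ are kept.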
spboyer left a comment
Adds Codex executor support; one remaining gap is that discovered trigger suites still flow into unsupported Codex telemetry paths.
Issues to address:
- cmd/waza/cmd_run.go:960 - Codex validation does not reject trigger suites before they call unsupported skill-invocation telemetry
… skip guard

- `validateEngineFeatureSupport` now accepts `specPath` and calls `trigger.Discover` to reject codex runs when trigger tests are present alongside the eval spec; this surfaces a clear configuration error instead of per-prompt failures at runtime
- Add Windows skip guard to `TestCodexEngineExecuteRejectsSkillTriggerTelemetry` to match the other codex tests that depend on the POSIX fake-codex shell script

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
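The up-front rejection described in this commit could look roughly like the sketch below. It is an illustration only: the real `validateEngineFeatureSupport` signature and the `trigger.Discover` API are not shown in this conversation, so `discoverFunc` is a hypothetical stand-in that returns the trigger test files found next to the spec.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// discoverFunc is a hypothetical stand-in for trigger.Discover: it returns
// the trigger test files found in the given directory.
type discoverFunc func(dir string) ([]string, error)

// validateEngineFeatureSupport rejects codex runs when trigger tests sit
// alongside the eval spec, so the failure is a single configuration error
// rather than one failure per prompt.
func validateEngineFeatureSupport(engine, specPath string, discover discoverFunc) error {
	if engine != "codex" {
		return nil
	}
	found, err := discover(filepath.Dir(specPath))
	if err != nil {
		return fmt.Errorf("discovering trigger tests: %w", err)
	}
	if len(found) > 0 {
		return fmt.Errorf("codex engine does not support trigger tests (found %d next to %s)", len(found), specPath)
	}
	return nil
}

func main() {
	// Fake discovery that always reports one trigger suite.
	withTriggers := func(string) ([]string, error) { return []string{"trigger_tests.yaml"}, nil }
	fmt.Println(validateEngineFeatureSupport("codex", "evals/eval.yaml", withTriggers))
}
```

Passing the discovery step in as a function keeps the sketch testable without touching the filesystem; the actual change may call the package function directly.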
12df2d6 to 7e44867
Summary

Adds a `codex` executor for Waza eval runs, backed by the local Codex CLI and the user’s existing `~/.codex/config.toml` / auth setup.

Changes

- Adds `internal/execution/CodexEngine` using `codex exec`.
- Supports `defaults.engine: codex` in `.waza.yaml`.
- Resolves the model from `~/.codex/config.toml` when no Waza model is configured.
- Supports `.waza.yaml` / eval YAML `model_reasoning_effort`, passed to Codex as `-c model_reasoning_effort=...`.
- Displays `Model: default (Codex config)` when Codex model is intentionally defaulted.
- Adds Codex as an option in `waza init`.
- Supports resuming sessions through `codex exec resume`.
- Rejects `skill_invocation` grader / trigger telemetry paths that the Codex CLI does not expose.

Validation

- `go test ./...`
- `cd site && npm run build`
- `model_reasoning_effort` reaches Codex