Upstream CLI drift report: gemini-cli v0.38/v0.39 & codex v0.122 — no hard breakage, several improvement opportunities #24

@Lykhoyda

Summary

Audit of the two upstream CLIs we wrap — google-gemini/gemini-cli and openai/codex — against our current executor code (packages/gemini-mcp/src/utils/geminiExecutor.ts, packages/codex-mcp/src/utils/codexExecutor.ts, plus their constants.ts).

Bottom line: no immediate breaking changes. Our flags and JSON schemas still match the upstream contracts. That said, several improvements are within easy reach, and two items will bite us the next time Google or OpenAI renames or reroutes models: the -preview alias graduation (1.2) and the missing resolved-model metadata in exec --json (2.3).

Versions checked:

  • gemini-cli latest stable v0.38.2 (2026-04-17); preview v0.39.0-preview.0 (2026-04-14); nightly v0.40.0-nightly (2026-04-15)
  • codex latest stable rust-v0.122.0 (2026-04-20); alphas through v0.123.0-alpha.6 (2026-04-21)

1. Gemini CLI

1.1 No breaking changes to flags we use

We currently pass -m, -s, -p, --output-format stream-json, --resume, --include-directories. All still supported in v0.38.2 / v0.39.0-preview. The headless mode contract (JSON object with response / stats / error) is unchanged.

1.2 Breaking-change watchlist (not yet triggered)

  • Model alias graduation. Our default model is hard-coded to gemini-3.1-pro-preview, with gemini-3-flash-preview as the fallback (packages/gemini-mcp/src/constants.ts:25). Upstream discussion #19724 shows that gemini-3.1-pro / gemini-3-flash (no -preview suffix) are already referenced in Auto routing. When Google drops the preview suffix, our hard-coded defaults will silently keep hitting a deprecated endpoint until it is retired, and then break. Mitigation: document the env vars ASK_GEMINI_MODEL / ASK_GEMINI_FALLBACK_MODEL more prominently and bump the defaults once the non-preview aliases become the stable names.
  • Tool-call events in stream-json. Per the headless mode docs, the stream-json event stream includes tool_use and tool_result events in addition to init / message / result / error. Our parser (parseGeminiStreamJsonl, geminiExecutor.ts:265) silently drops those event types. Not broken today — we only care about final assistant text — but if Google promotes tool events to required for progress reporting, we'll lose visibility.
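
A minimal sketch of a parser that keeps tool events instead of dropping them. This is not the existing parseGeminiStreamJsonl: the name parseStreamEvents and the ParsedStream shape are hypothetical, and only the event type values listed above are taken from the headless mode docs. Payload fields other than type are deliberately left untyped because their exact names are not confirmed here.

```typescript
// Event types named in the headless-mode docs cited above.
const KNOWN_EVENT_TYPES = ["init", "message", "tool_use", "tool_result", "result", "error"];

// Only `type` is relied on; every other payload field is left untyped on purpose.
interface StreamEvent {
  type: string;
  [field: string]: unknown;
}

// Hypothetical result shape for this sketch.
interface ParsedStream {
  events: StreamEvent[];     // every recognised event, in arrival order
  toolEvents: StreamEvent[]; // the tool_use / tool_result subset
  unknownTypes: string[];    // event types this sketch does not recognise
  malformedLines: number;    // non-empty lines that were not usable JSON events
}

function parseStreamEvents(raw: string): ParsedStream {
  const out: ParsedStream = { events: [], toolEvents: [], unknownTypes: [], malformedLines: 0 };
  for (const line of raw.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let value: unknown;
    try {
      value = JSON.parse(trimmed);
    } catch {
      out.malformedLines += 1;
      continue;
    }
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
      out.malformedLines += 1;
      continue;
    }
    const type = (value as { type?: unknown }).type;
    if (typeof type !== "string") {
      out.malformedLines += 1;
      continue;
    }
    if (KNOWN_EVENT_TYPES.indexOf(type) === -1) {
      // Record it rather than failing, so a new upstream event type is noticed.
      out.unknownTypes.push(type);
      continue;
    }
    const event = value as StreamEvent;
    out.events.push(event);
    if (type === "tool_use" || type === "tool_result") out.toolEvents.push(event);
  }
  return out;
}
```

Tracking unknownTypes separately means a new upstream event type shows up in logs or tests instead of vanishing silently.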

1.3 Improvement opportunities

  • Surface structured exit codes. Gemini CLI headless mode defines: 0 success, 1 general error, 42 input error, 53 turn-limit exceeded. commandExecutor.ts:136 flattens the code into a generic "Failed with exit code N" string. We could map these codes to typed errors so ask-gemini callers get a meaningful TurnLimitExceeded vs InputError distinction instead of having to string-match the message.
  • Auto routing. gemini -m auto is now the default interactive behaviour and picks between Pro and Flash per-prompt. Exposing auto as a model choice would let users opt into upstream routing without having to pick the preview alias. This would also cleanly sidestep the preview-suffix graduation issue above.
  • Tool event forwarding. Extending makeStreamingProgressForwarder (geminiExecutor.ts:334) to emit tool_use / tool_result events as progress lines would give a much better live UX in /multi-review and /brainstorm when Gemini uses its built-in tools.
  • /memory inbox + background skill extraction (v0.38/v0.39). Gemini now extracts reusable "skills" in the background and surfaces them via /memory inbox. For our review/brainstorm flows this is a free quality boost with no code change — but we should document it so users enable it, and potentially plumb the session feature through includeDirs so skills persist across review invocations.
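
The exit-code mapping from the first bullet could look roughly like the sketch below. The codes (0, 1, 42, 53) come from the list above; the names GeminiExitKind, GeminiExitInfo and classifyGeminiExit are hypothetical and do not exist in commandExecutor.ts today.

```typescript
// Hypothetical names; only the numeric codes come from the headless-mode docs.
type GeminiExitKind = "Success" | "GeneralError" | "InputError" | "TurnLimitExceeded" | "Unknown";

interface GeminiExitInfo {
  kind: GeminiExitKind;
  exitCode: number | null; // null when the process ended without an exit code
  message: string;
}

function classifyGeminiExit(exitCode: number | null): GeminiExitInfo {
  switch (exitCode) {
    case 0:
      return { kind: "Success", exitCode, message: "Gemini CLI completed successfully" };
    case 1:
      return { kind: "GeneralError", exitCode, message: "Gemini CLI reported a general error" };
    case 42:
      return { kind: "InputError", exitCode, message: "Gemini CLI rejected the input" };
    case 53:
      return { kind: "TurnLimitExceeded", exitCode, message: "Gemini CLI exceeded its turn limit" };
    default:
      // Keep the current generic wording for anything undocumented.
      return {
        kind: "Unknown",
        exitCode,
        message: exitCode === null ? "Gemini CLI exited without an exit code" : `Failed with exit code ${exitCode}`,
      };
  }
}
```

Callers can then branch on kind (for example, retry with a higher turn limit on TurnLimitExceeded) without parsing the message text.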

1.4 Known upstream bug we should track

  • Historical flakiness of --output-format json (issue #11184 — closed): response sometimes contained escaped-string JSON rather than a JSON object. We already dodge this by using stream-json by default, but ask-gemini-edit and ask-gemini fall back to parseGeminiJsonOutput in the !parsedAnyEvent branch (geminiExecutor.ts:312). Worth a regression test with the latest CLI to confirm the fallback still works.
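
For the regression test, a tolerant parser along these lines could serve as the reference behaviour. It is a sketch, not the current parseGeminiJsonOutput: the name parseJsonOutputLenient is hypothetical, and it assumes the failure mode was the whole payload arriving as a JSON-encoded string that itself contains the JSON object. If the upstream bug instead affected only the response field, the unwrapping step would need to move there.

```typescript
// Top-level keys of the headless JSON contract described in 1.1.
interface GeminiJsonOutput {
  response?: unknown;
  stats?: unknown;
  error?: unknown;
}

// Hypothetical helper: returns null when the output cannot be read as a JSON object.
function parseJsonOutputLenient(raw: string): GeminiJsonOutput | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  // Assumed failure mode: a JSON string wrapping the real JSON object.
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      return null;
    }
  }
  if (typeof value !== "object" || value === null || Array.isArray(value)) return null;
  return value as GeminiJsonOutput;
}
```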

2. Codex CLI

2.1 No breaking changes to flags or JSONL schema

We pass exec, resume, --skip-git-repo-check, --ephemeral, --full-auto, --json, -m. All are present in v0.122.0. The JSONL event shape we parse is documented as stable in the current exec --json cheatsheet: thread.started with thread_id; item.completed with item.type === "agent_message" and item.text; and turn.completed.usage with input_tokens / cached_input_tokens / output_tokens.
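
As a reference for that contract, here is a small sketch that reduces an exec --json stream to the three pieces we consume. The event and field names are the ones listed above; the function name summarizeCodexJsonl and the CodexRunSummary shape are hypothetical and are not the code in codexExecutor.ts.

```typescript
interface CodexUsage {
  input_tokens: number;
  cached_input_tokens: number;
  output_tokens: number;
}

// Hypothetical summary shape for this sketch.
interface CodexRunSummary {
  threadId: string | null;
  messages: string[];
  usage: CodexUsage | null;
}

type JsonObject = { [key: string]: unknown };

function asObject(value: unknown): JsonObject | null {
  return typeof value === "object" && value !== null && !Array.isArray(value) ? (value as JsonObject) : null;
}

// Missing or non-numeric token counts are treated as 0.
function asCount(value: unknown): number {
  return typeof value === "number" ? value : 0;
}

function summarizeCodexJsonl(raw: string): CodexRunSummary {
  const summary: CodexRunSummary = { threadId: null, messages: [], usage: null };
  for (const line of raw.split("\n")) {
    if (line.trim() === "") continue;
    let event: JsonObject | null = null;
    try {
      event = asObject(JSON.parse(line));
    } catch {
      continue; // skip non-JSON lines
    }
    if (event === null) continue;

    if (event.type === "thread.started") {
      const threadId = event.thread_id;
      if (typeof threadId === "string") summary.threadId = threadId;
    } else if (event.type === "item.completed") {
      const item = asObject(event.item);
      if (item !== null && item.type === "agent_message") {
        const text = item.text;
        if (typeof text === "string") summary.messages.push(text);
      }
    } else if (event.type === "turn.completed") {
      const usage = asObject(event.usage);
      if (usage !== null) {
        summary.usage = {
          input_tokens: asCount(usage.input_tokens),
          cached_input_tokens: asCount(usage.cached_input_tokens),
          output_tokens: asCount(usage.output_tokens),
        };
      }
    }
  }
  return summary;
}
```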

2.2 Useful new flags in v0.122.0 we are not using

  • --ignore-user-config and --ignore-rules on codex exec. These make an exec run deterministic and independent of the user's ~/.codex/config.toml / project AGENTS.md — exactly what an MCP wrapper wants for reproducible review output. I'd recommend adding these to buildArgs in codexExecutor.ts:129 for the ephemeral (non-sessionId) path, behind an opt-out env var so users who deliberately customize config keep their overrides.
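
A sketch of the flag selection, kept separate from buildArgs so it makes no claim about argument order. The helper name isolationFlags is hypothetical; the env var name ASK_CODEX_RESPECT_USER_CONFIG is the one suggested in the action items of this report, and treating only the value "1" as opt-out is an assumption.

```typescript
type Env = { [name: string]: string | undefined };

// Hypothetical helper: returns the extra flags for an exec run.
function isolationFlags(sessionId: string | undefined, env: Env): string[] {
  // Resumed sessions (sessionId set) are left untouched.
  if (sessionId !== undefined) return [];
  // Opt-out for users who deliberately customise their Codex config.
  if (env["ASK_CODEX_RESPECT_USER_CONFIG"] === "1") return [];
  return ["--ignore-user-config", "--ignore-rules"];
}

// Usage with a literal env object; real code would pass process.env.
const ephemeralFlags = isolationFlags(undefined, {});
const optedOutFlags = isolationFlags(undefined, { ASK_CODEX_RESPECT_USER_CONFIG: "1" });
```

Passing the environment in as a parameter keeps the helper pure, so both branches can be unit-tested without mutating process.env.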

2.3 Missing metadata in exec --json (upstream issue)

  • openai/codex#14736 — still open — points out that exec --json does not emit the resolved model name anywhere in the event stream. Our buildUsageStats (codexExecutor.ts:41) compensates by using the requested model (or our hard-coded fallback) as model. This is correct only because we control fallback client-side; if OpenAI ever adds server-side routing (they already have it for ChatGPT accounts), our usage.model will be wrong. Worth following this issue; when it lands, switch buildUsageStats to prefer the event-reported model.
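
When the issue is resolved, the preference order could be sketched as below. The helper name resolveUsageModel and the source tag are hypothetical, and the event-reported model is passed in as a plain value because upstream has not yet defined which event or field will carry it.

```typescript
// Hypothetical result shape: the model to report plus where it came from.
interface ModelResolution {
  model: string;
  source: "event" | "requested" | "fallback";
}

function resolveUsageModel(
  eventModel: unknown,                // whatever the JSONL stream reported, if anything
  requestedModel: string | undefined, // the model passed via -m, if any
  fallbackModel: string               // our hard-coded default
): ModelResolution {
  // Prefer what the CLI says it actually used.
  if (typeof eventModel === "string" && eventModel !== "") {
    return { model: eventModel, source: "event" };
  }
  // Otherwise keep today's behaviour: requested model, then our fallback.
  if (requestedModel !== undefined && requestedModel !== "") {
    return { model: requestedModel, source: "requested" };
  }
  return { model: fallbackModel, source: "fallback" };
}
```

Recording the source alongside the model makes it visible in usage stats whether the value is authoritative or only our best guess.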

2.4 Model catalog is still in sync

Our defaults (gpt-5.4 as the default, gpt-5.4-mini as the fallback; codex-mcp/src/constants.ts:16) match the current Codex stable catalog. Note: the gpt-5.1-codex-* family was deprecated across GitHub Copilot on 2026-04-01, but those models were never our defaults, so no action is needed. The ChatGPT-account-only restriction on gpt-5.4 (issue #14181) is a pre-existing constraint that our users hit when authenticating with an API key; it is not a CLI change.

2.5 Chat Completions API deprecation (medium-term)

Codex docs note Chat Completions API support will be removed "in future releases." Since we only shell out to codex exec and let the CLI handle API transport, this should not affect us — but worth a note in CHANGELOG.md so contributors know not to reach past the CLI.

2.6 Neutral / TUI-only (no action)

v0.121.0 / v0.122.0 additions that do NOT affect our wrapper: codex marketplace add, codex app desktop integration, /side TUI command, tabbed plugin browsing, memory mode TUI controls, devcontainer bubblewrap profile, reverse-search prompt history. These are all interactive-mode features and codex exec --json ignores them.


3. Proposed action items

Ordered by effort/payoff ratio:

  1. [Easy, high value] Add --ignore-user-config + --ignore-rules to the ephemeral codex exec path in codex-mcp/src/utils/codexExecutor.ts behind an opt-out (e.g. ASK_CODEX_RESPECT_USER_CONFIG=1). Makes review output reproducible across dev machines.
  2. [Easy, medium value] Surface Gemini's documented exit codes (42 = input error, 53 = turn limit) as typed errors in packages/shared/src/commandExecutor.ts.
  3. [Medium, medium value] Teach parseGeminiStreamJsonl and makeStreamingProgressForwarder about tool_use / tool_result events so progress output reflects what Gemini is actually doing during long runs.
  4. [Easy, low value today, avoids future breakage] Document the ASK_GEMINI_MODEL / ASK_CODEX_MODEL overrides prominently in the README. Add a CI check that pulls Gemini's supported-model list and warns if our default is no longer in it.
  5. [Tracking only] Watch openai/codex#14736; when model is added to JSONL events, have buildUsageStats prefer the event-reported model over the requested one.
  6. [Tracking only] Watch for Gemini 3.1 Pro / Gemini 3 Flash graduating out of -preview; bump the defaults in packages/gemini-mcp/src/constants.ts when that happens.

No hotfix required — none of these are user-visible regressions today.


Report generated from an upstream release audit. Sources: gemini-cli releases, codex releases, codex changelog, upstream issues linked inline.
