Skip to content

feat(engine,llm): add codex app-server provider wiring and routing autonomy#53

Open
vadimcomanescu wants to merge 4 commits intodanshapiro:mainfrom
vadimcomanescu:upstream/feature/codex-app-server-routing-autonomy
Open

feat(engine,llm): add codex app-server provider wiring and routing autonomy#53
vadimcomanescu wants to merge 4 commits intodanshapiro:mainfrom
vadimcomanescu:upstream/feature/codex-app-server-routing-autonomy

Conversation

@vadimcomanescu
Copy link

Summary

This PR introduces first-class codex app-server provider support in runtime and preflight, and hardens routing behavior for API-mode agent execution.

Problem

  • codex app-server support was not fully wired as a first-class provider across preflight, runtime client construction, and provider spec surfaces.
  • Agent-loop autonomy controls were not consistently propagated when running via API backends.
  • Spark routing behavior depended on brittle CLI-only assumptions.

Why This Change

The engine should treat codex app-server as an explicit provider contract, not an implicit special case. This keeps provider behavior deterministic and reviewable.

What Changed

  • Added codex app-server provider adapter, protocol translation, transport, and env configuration support.
  • Wired provider spec and runtime client creation so codex app-server can be selected and validated predictably.
  • Hardened preflight coverage for codex app-server capability/env routing checks.
  • Reworked CLI-only model routing assumptions so Spark handling follows explicit configuration and tests.
  • Added explicit provider options mapping for codex app-server autonomy controls in agent-loop API mode.

User-Visible Behavior

  • codex app-server can be configured and validated as a first-class provider path.
  • Routing decisions for codex and Spark are explicit and test-covered.
  • API agent loop receives the expected provider policy/options payload for autonomous execution.

Risk

  • Routing behavior changed for some codex/spark configurations.
  • Mitigation: targeted tests cover preflight, routing, provider option mapping, and translator behavior.

Validation

  • gofmt and go vet ./...
  • go test ./internal/llm/providers/codexappserver -count=1
  • go test ./internal/providerspec -run TestBuiltinCodexAppServerDefaults -count=1
  • go test ./internal/attractor/engine -run 'TestRunWithConfig_PreflightCodexAppServer|TestCLIOnlyModelOverride|TestAgentLoopProviderOptions' -count=1

Out of Scope

  • Stream lifecycle event semantics and SSE propagation are handled in the follow-up streaming PR.

(cherry picked from commit c0110c630859508ce3390cf976cd11e8a06fbb6b)
(cherry picked from commit 1a57a2ca640d65457fa663e9548d41c9be22ab53)
(cherry picked from commit 23f3ce1e2329597b204b530bc23b1f817b5266c2)
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fda33cada

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +647 to +648
func (t *stdioTransport) handleUnexpectedProcessTermination(err error) {
t.rejectAllPending(err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Fail active turns when app-server exits unexpectedly

handleUnexpectedProcessTermination only rejects pending RPC requests, but after turn/start returns both runTurn and Stream are waiting on completion notifications, not a pending request. If the codex child process dies in that window (and RequestTimeout is left at the default 0), nothing wakes those waiters, so Complete/Stream can block until an external context cancellation instead of failing immediately. Propagate unexpected process exit to active turn listeners (or cancel in-flight turn contexts) so calls fail fast.

Useful? React with 👍 / 👎.

Comment on lines +448 to +452
info.Status = asInt(statusVal, 0)
info.HasStatus = true
} else if statusVal, ok := root["status"]; ok {
info.Status = asInt(statusVal, 0)
info.HasStatus = true

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Ignore non-numeric status when classifying codex errors

This marks HasStatus=true even when status is a symbolic string (for example turn lifecycle values like "failed"), because asInt falls back to 0. mapCodexError then short-circuits to ErrorFromHTTPStatus(..., 0, ...) and skips the code/message classification path, which can misclassify real request/auth/rate-limit failures as unknown status-0 errors (often retryable). Only set HasStatus when parsing yields a valid HTTP-like numeric status.

Useful? React with 👍 / 👎.

@vadimcomanescu
Copy link
Author

Follow-up (clean formatting): addressed Codex review findings in commit 6be757a.

  • Fail in-flight Complete/Stream waits when the codex app-server process exits, including when request timeout is disabled.
  • Only treat status as HTTP status when it is numeric and in the valid HTTP range.
  • Added regression tests for process-lifecycle wait unblocking and status normalization.

Validation run: go test ./internal/llm/providers/codexappserver -count=1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant