feat(adapter): add coven adapter login codex with OAuth port preflight #271

Closed. BunsDev wants to merge 1 commit into main from feat/codex-login-shim

BunsDev (Member) commented Jun 28, 2026

Fixes the recurring "Failed to bind port 1455" error that appears when a codex login flow is killed mid-OAuth and leaks an orphan listener on the callback port.

The problem

codex login opens a local OAuth callback server on port 1455. If the flow gets killed mid-auth (esc, SIGINT, browser closed, network blip) the listener sometimes leaks as an orphan process. The next codex login attempt then crashes with:

Codex OAuth failed: Failed to bind port 1455: Address already in use (os error 98)

(os error 48 on macOS, os error 98 on Linux — same condition.)
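Both numbers are the platform's raw "address in use" code, and Rust's standard library reports either one as `std::io::ErrorKind::AddrInUse`, so a preflight can branch on the kind instead of the number. A minimal sketch, assuming a hypothetical `probe_port` helper and `Probe` type (illustrative names, not necessarily how the PR's `port_is_free` is written):

```rust
use std::io::ErrorKind;
use std::net::{SocketAddr, TcpListener};

/// Outcome of trying to bind an address ourselves (hypothetical type).
#[derive(Debug, PartialEq)]
enum Probe {
    Free,
    InUse,
    OtherError(ErrorKind),
}

/// Try to bind `addr`. On success the listener is dropped at the end of the
/// match arm, which releases the port again.
fn probe_port(addr: SocketAddr) -> Probe {
    match TcpListener::bind(addr) {
        Ok(_listener) => Probe::Free,
        // Same kind for os error 48 (macOS) and os error 98 (Linux).
        Err(e) if e.kind() == ErrorKind::AddrInUse => Probe::InUse,
        // Permission denied and similar: reported separately, not as "free".
        Err(e) => Probe::OtherError(e.kind()),
    }
}
```

Calling `probe_port("127.0.0.1:1455".parse().unwrap())` would give the check in step 1 below; binding port 0 first and probing the address the OS hands back keeps a demo independent of port 1455.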

Upstream Codex has no auto-recover or fallback-port logic, so until that lands in @openai/codex itself, this shim handles it from our side.

The fix

New subcommand: coven adapter login <adapter>. Today the only adapter that needs preflight is codex. Behavior:

  1. Try to bind 127.0.0.1:1455 ourselves
  2. If AddrInUse, identify the holding pid via lsof -ti tcp:1455
  3. Use sysinfo to fetch the holder's process descriptor (name + argv)
  4. Only kill it if its argv/path clearly contains the token codex — split on whitespace and path separators, so vscodex, not-codex-tool, etc. don't match
  5. SIGTERM (1s grace), then SIGKILL if still alive
  6. Wait up to 1.5s for the port to free up
  7. exec codex login
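Step 6 can be sketched as a bounded polling loop. This is an illustration under assumptions, not the PR's code: the function name `wait_for_port_free` and the 50 ms poll interval are hypothetical, and only the 1.5 s budget comes from the description above.

```rust
use std::net::{SocketAddr, TcpListener};
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll until `addr` can be bound or `timeout` elapses.
/// Returns true if the port became free in time.
fn wait_for_port_free(addr: SocketAddr, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        // A successful bind proves the port is free; the temporary listener
        // is dropped right away, which releases the port again.
        if TcpListener::bind(addr).is_ok() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        // Assumed poll interval; the PR does not state one.
        sleep(Duration::from_millis(50));
    }
}
```

For the flow above this would be called with `127.0.0.1:1455` and `Duration::from_millis(1500)`. A probe like this cannot rule out another process grabbing the port between the last check and the moment `codex login` binds it, so it narrows the failure window rather than closing it.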

If we can't ID the holder (no lsof, ambiguous results), or if it's clearly NOT codex, we refuse to kill and print a clear instruction:

$ coven adapter login codex
Coven codex login
  Preflight: checking port 1455... held
Error: port 1455 is held by pid 83156 (`Python`), which does NOT look like a codex OAuth helper. Refusing to kill it. Run `lsof -i tcp:1455` to investigate, then either stop that process or rerun `coven adapter login codex` once it is gone.
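The "can't ID the holder" case can be made concrete with a small parser for `lsof -t` output, which prints bare PIDs, one per line. A minimal sketch, assuming that "ambiguous" means anything other than exactly one distinct PID; the name `single_pid` is hypothetical and the PR's `pid_holding_port` may behave differently:

```rust
/// Extract the single PID from `lsof -t` output.
/// Returns None for empty output, an unparsable line, or more than one
/// distinct PID, so the caller fails closed instead of guessing.
fn single_pid(lsof_stdout: &str) -> Option<u32> {
    let mut pids: Vec<u32> = Vec::new();
    for line in lsof_stdout.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Any non-numeric line makes the whole result untrustworthy.
        let pid = line.parse::<u32>().ok()?;
        if !pids.contains(&pid) {
            pids.push(pid);
        }
    }
    match pids.as_slice() {
        [only] => Some(*only),
        _ => None,
    }
}
```

Returning `None` for every doubtful input means a missing `lsof`, an empty result, and multiple holders all lead to the same refusal path.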

Why this shape

  • Not coven doctor — users hit this error mid-frustration trying to log in. They don't want "go run a separate doctor command first". One command that just works.
  • Not changing Codex's port — it's hard-coded upstream; we can't.
  • coven adapter login codex not coven codex login — reuses the existing coven adapter ... namespace (list/doctor/install/login).
  • Conservative kill heuristic: substring-matching codex would produce false positives (vscodex, @openai/codex-darwin-arm64). Token-level matching is stricter and explicit.
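The token-level check can be sketched as follows. This is an assumption-laden illustration, not the PR's `descriptor_looks_like_codex`: the name `looks_like_codex` is hypothetical, matching is case-sensitive, and a token counts when the part before its first `.` is exactly `codex` (which is one way to accept `codex.js`).

```rust
/// Does `descriptor` (process name plus argv, space-joined) contain `codex`
/// as a whole token? Tokens are split on whitespace and path separators.
fn looks_like_codex(descriptor: &str) -> bool {
    descriptor
        .split(|c: char| c.is_whitespace() || c == '/' || c == '\\')
        .filter(|tok| !tok.is_empty())
        // Compare the part before the first '.', so `codex.js` matches
        // while `codex-darwin-arm64` and `vscodex` do not.
        .any(|tok| tok.split('.').next() == Some("codex"))
}
```

In this sketch a directory component named `codex` also matches (for example `/opt/codex/other-tool`), so token matching is narrower than substring matching but still a heuristic.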

Tests

New module crates/coven-cli/src/codex_login.rs with 7 unit tests, all passing:

  • port_is_free correctly identifies free vs held ports (uses ephemeral port to avoid flakes)
  • descriptor_looks_like_codex recognizes codex, codex.js, @openai/codex
  • descriptor_looks_like_codex rejects vscodex, not-codex-tool, empty strings
  • descriptor_looks_like_codex handles path separators (/usr/local/bin/codex)
  • pid_holding_port returns None when port is free (also covers lsof-missing case)
  • preflight_codex_oauth_port returns PortFree when 1455 is unbound (auto-skips if 1455 is occupied — no flake)

Plus full crate test suite: 856 unit + 4 + 11 = 871 tests, all green. No regressions.

Manual verification

  • coven adapter login --help shows the new subcommand
  • ✅ Preflight with port FREE: reports "free" and exec's codex
  • ✅ Preflight with port HELD by non-codex (python): refuses to kill, prints lsof instructions
  • cargo clippy -- -D warnings clean
  • cargo fmt --check clean

What it looks like in practice

$ coven adapter login codex
Coven codex login
  Preflight: checking port 1455... freed (killed stale codex pid 12345)
  Launching `codex login`...

  > Sign in via browser or paste your existing token...

Not in scope (separate follow-ups)

  • A shim for claude login — Anthropic CLI doesn't have this failure mode (uses bearer-token paste, no local callback server)
  • Surfacing the preflight in cave's UI — this PR ships the CLI handler; cave can wrap it later if useful

Once this lands, you can rebuild + install with the usual cargo install --path crates/coven-cli or wait for the next coven release stamp.

When Codex's `codex login` flow gets killed mid-OAuth (esc, SIGINT, browser
close, network blip) the local callback server on port 1455 sometimes leaks
as an orphan process. The next `codex login` attempt then crashes with:

  Codex OAuth failed: Failed to bind port 1455: Address already in use (os error 98/48)

Upstream Codex has no auto-recover or fallback-port logic. Until that gets
fixed in @openai/codex itself, this shim handles it from our side.

## What

New subcommand: `coven adapter login <adapter>`. Today the only adapter
that needs special handling is `codex`. Pattern:

1. Try to bind 127.0.0.1:1455 ourselves
2. If `AddrInUse`, identify the holder via `lsof -ti tcp:1455`
3. Use sysinfo to fetch the holder's process descriptor
4. **Only kill it if its argv/path clearly contains the token `codex`**
   (split on whitespace + path separators — won't match substrings like
   "vscodex" or "not-codex-tool")
5. SIGTERM (1s grace), then SIGKILL if still alive
6. Wait up to 1.5s for the port to free up
7. Exec `codex login`

If we can't ID the holder (no lsof, ambiguous), or if it's clearly NOT
codex, we **refuse to kill** and print a clear instruction.
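
The refuse-by-default policy can be written as a pure decision function, which keeps it testable without touching real processes. A minimal sketch; the `Decision` type, the `decide` function, and the message wording are hypothetical and not taken from the PR:

```rust
/// What the preflight should do next (hypothetical type).
#[derive(Debug, PartialEq)]
enum Decision {
    Launch,
    KillThenLaunch(u32),
    Refuse(String),
}

/// `holder` is the (pid, descriptor) pair when the port holder was identified.
/// `is_codex` is the descriptor heuristic, passed in so it can be swapped in tests.
fn decide(
    port_free: bool,
    holder: Option<(u32, &str)>,
    is_codex: impl Fn(&str) -> bool,
) -> Decision {
    if port_free {
        return Decision::Launch;
    }
    match holder {
        // Only a holder that passes the heuristic may be terminated.
        Some((pid, descriptor)) if is_codex(descriptor) => Decision::KillThenLaunch(pid),
        Some((pid, descriptor)) => Decision::Refuse(format!(
            "port 1455 is held by pid {} (`{}`), which does not look like codex",
            pid, descriptor
        )),
        // Unknown holder: fail closed.
        None => Decision::Refuse(
            "port 1455 is held, but the holder could not be identified".to_string(),
        ),
    }
}
```

Every branch that lacks positive evidence ends in `Refuse`, so a bug in holder identification degrades to an error message rather than to killing an unrelated process.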

## Why this shape and not the alternatives

- **Why not auto-run in `coven doctor`?** Because users hit this error
  mid-frustration trying to log in. They don't want "go run a separate
  doctor command first" — they want one command that just works.
- **Why not change which port Codex uses?** We can't; it's hard-coded
  upstream.
- **Why `coven adapter login codex` and not `coven codex login`?** Reuses
  the existing `coven adapter ...` namespace (alongside `adapter list`,
  `adapter doctor`, `adapter install`).

## Tests

New module `crates/coven-cli/src/codex_login.rs` with 7 unit tests:

- `port_is_free` correctly identifies free vs held ports
- `descriptor_looks_like_codex` recognizes codex/codex.js/@openai/codex
- `descriptor_looks_like_codex` rejects vscodex, not-codex-tool, etc.
- `pid_holding_port` returns None when port is free (also covers
  lsof-missing case — same observable result)
- `preflight_codex_oauth_port` returns PortFree when 1455 is unbound
  (auto-skips if 1455 is occupied on the test machine — no flake)

## Manual verification

- `coven adapter login --help` shows the new subcommand
- Preflight with port FREE: correctly reports "free" and exec's codex
- Preflight with port HELD by python (non-codex): correctly refuses to
  kill, prints lsof instructions

## Not in scope

- A shim for `claude login` — Anthropic CLI doesn't have this failure
  mode (uses bearer-token paste, no local callback server)
- Surfacing the preflight in cave's UI — separate followup, this PR just
  ships the CLI handler

Closes the recurring "codex OAuth port held" pain we hit whenever a
`codex login` flow is killed mid-auth.
Copilot AI review requested due to automatic review settings June 28, 2026 19:43

Copilot AI (Contributor) left a comment

Pull request overview

Adds a new CLI entrypoint to make Codex authentication more resilient by preflighting the fixed OAuth callback port (1455) and optionally clearing stale Codex listeners before running the vendor login flow.

Changes:

  • Introduces coven adapter login <adapter> subcommand, currently implemented for codex.
  • Adds a Codex-specific preflight module that detects port 1455 conflicts, identifies the holding PID (via lsof), and conditionally terminates it using a conservative “looks like codex” descriptor heuristic.
  • Adds unit tests for the port check and descriptor heuristic behavior.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| crates/coven-cli/src/main.rs | Wires new adapter login subcommand and runs the Codex login wrapper flow. |
| crates/coven-cli/src/codex_login.rs | Implements port-1455 preflight + conservative PID identification/termination logic with unit tests. |

Comment on lines +1004 to +1009
let status = Command::new("codex")
    .arg("login")
    .status()
    .with_context(|| {
        "failed to exec `codex login`. Install Codex with `npm install -g @openai/codex`."
    })?;

Comment on lines +17 to +18
//! 3. Wait for the port to actually free up (poll up to ~1.5s), then exec
//!    `codex login`.

Comment on lines +60 to +62
// Other errors (permission denied, etc) — treat as not free so we
// surface them later when we re-bind for real.
Err(_) => false,

/// Heuristic: does this descriptor look like a `codex` OAuth helper we can
/// safely terminate? We require the literal `codex` token to appear as a
/// path component or argv element. We deliberately do NOT match substrings
/// like `vscodex` or `not-codex-tool` that happen to contain "codex".
BunsDev (Member, Author) commented Jun 28, 2026

Closing — moving the fix to coven-cave's UI instead of the CLI per Val's direction. Will reopen here later if the CLI shim is also wanted.

BunsDev closed this Jun 28, 2026
BunsDev deleted the feat/codex-login-shim branch June 28, 2026 20:38