Fix provider reconnect to open the OAuth flow instead of failing #472

Merged: Th0rgal merged 8 commits into master from fix-provider-oauth-reconnect on May 30, 2026

Th0rgal (Owner) commented May 30, 2026

Problem

On settings/providers, clicking Reconnect for an OAuth provider (xAI/Grok, Anthropic) with an expired/revoked token produced:

Re-authenticated, but provider check still fails: xAI OAuth token expired; reconnect Grok Build

instead of opening the OAuth link.

Root cause

The Reconnect button calls POST /api/ai/providers/:id/auth, whose only check is has_credentials() — which returns true whenever an oauth blob merely exists, even if expired/revoked. So the endpoint returned {success: true, auth_url: null}, the frontend skipped opening the link, ran a live usage probe, and surfaced the probe's failure.
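
The gap between "credentials exist" and "credentials are usable" can be sketched as below. This is a minimal illustration, not the repository's code: the `Provider` and `OAuthBlob` types, the `expires_at` field, and `has_usable_credentials` are hypothetical names introduced here. It is also not the fix this PR makes (the PR routes OAuth providers to the real authorize flow instead), and a local expiry check could never detect a revoked token anyway.

```rust
// Hypothetical stand-ins for the stored provider record; field names are assumptions.
#[allow(dead_code)]
#[derive(Clone, Debug)]
struct OAuthBlob {
    access_token: String,
    /// Unix seconds at which the access token expires.
    expires_at: u64,
}

#[derive(Clone, Debug, Default)]
struct Provider {
    api_key: Option<String>,
    oauth: Option<OAuthBlob>,
}

impl Provider {
    /// Presence-only check, mirroring the behaviour described above:
    /// true whenever any credential blob exists, expired or not.
    fn has_credentials(&self) -> bool {
        self.api_key.is_some() || self.oauth.is_some()
    }

    /// Hypothetical stricter check that also looks at token expiry.
    /// Revocation is invisible locally, so this can still report true
    /// for a token the provider has revoked.
    fn has_usable_credentials(&self, now: u64) -> bool {
        self.api_key.is_some()
            || self.oauth.as_ref().map_or(false, |o| o.expires_at > now)
    }
}
```

With an OAuth blob whose `expires_at` is in the past, `has_credentials()` still returns true while `has_usable_credentials(now)` returns false, which is the mismatch that let the endpoint answer `success: true` with no `auth_url`.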

Fix

Frontend

  • New ReconnectProviderModal that drives the real oauthAuthorize → confirm-code → oauthCallback flow already used by the add-provider modal.
  • OAuth-backed providers (uses_oauth && !has_api_key) route to it; API-key providers keep the legacy path.
  • Method indices are pinned to the backend ProviderType::auth_methods() ordering (Anthropic Pro/Max vs console mode resolves correctly); single-method providers (xAI) auto-start.
  • Post-auth health probe factored into a shared helper.

Backend

  • oauth_callback updates the existing provider in place when the path id is a known UUID (what Reconnect sends), instead of always inserting a new row — prevents duplicate provider entries on reconnect. The add-provider flow passes a type id (not a UUID) and still falls through to add().

Testing

Deployed to the dev backend and verified against the live xAI provider stuck in needs_reauth:

| Path | Endpoint | Result |
| --- | --- | --- |
| Old (bug) | POST /:id/auth | {"success":true,...,"auth_url":null} |
| New (fix) | POST /:id/oauth/authorize | xAI → accounts.x.ai/oauth2/device?user_code=…; Anthropic → claude.ai/oauth/authorize (Pro/Max) and console.anthropic.com/oauth/authorize (API key) |

  • cargo check + cargo fmt --all --check clean; backend builds on Linux, deploys to dev, service healthy.
  • tsc --noEmit + eslint clean; full Next.js production build passes with /settings/providers present.
  • No OAuth callback was completed during testing, so no credentials/provider state changed.

Note: the literal button click and the duplicate-row dedup were not tested end-to-end. The click sits behind the browser auth gate, and completing the flow would mint real tokens.


Note

Medium Risk
Touches OAuth credential persistence and in-place provider updates (auth-critical), plus Anthropic request rewriting and session reset behavior on the inference path.

Overview
Reconnect on the providers settings page now runs the real OAuth authorize → callback path for OAuth-only providers (xAI, Anthropic, etc.) instead of POST …/auth, which treated expired tokens as “authenticated” and never opened the auth link. A dedicated ReconnectProviderModal mirrors the add-provider OAuth UX (method indices aligned with the backend); post-reconnect health probing is shared via probeProviderHealth.

On the backend, OAuth callback accepts the provider UUID from reconnect and updates that row in place (including xAI Grok upsert by target_id), avoiding duplicate provider rows. API-key reconnect still uses the legacy auth endpoint.

Separately, the Anthropic proxy and mission runner gain handling for stale extended-thinking blocks when the model changes or blocks are replayed: strip thinking on model rewrite, preserve thinking in the OpenAI→Anthropic adapter, one-shot retry with thinking disabled after the specific 400, and Claude Code transport recovery that resets to a fresh session instead of resuming when that error appears in turn output.

Reviewed by Cursor Bugbot for commit 51e62f1.

The Reconnect button on settings/providers called POST /:id/auth, whose
only check is has_credentials() — which is true whenever an oauth blob
merely exists, even if the token is expired or revoked. For OAuth
providers (xAI/Grok, Anthropic) this returned success without an
auth_url, so the frontend skipped the OAuth link, ran a live usage
probe, and surfaced "Re-authenticated, but provider check still fails:
xAI OAuth token expired…".

Frontend: route OAuth-backed providers (uses_oauth && !has_api_key) to a
new ReconnectProviderModal that drives the real oauthAuthorize ->
confirm-code -> oauthCallback flow already used by the add-provider
modal. Method indices are pinned to ProviderType::auth_methods() so
Anthropic's Pro/Max vs console mode resolves correctly; single-method
providers (xAI) auto-start. API-key providers keep the legacy path. The
post-auth health probe is factored into a shared helper.

Backend: oauth_callback now updates the existing provider in place when
the path id is a known UUID (what Reconnect sends) instead of always
inserting a new row, preventing duplicate provider entries on reconnect.
The add-provider flow passes a type id (not a UUID) and still falls
through to add().

vercel Bot commented May 30, 2026

The latest updates on your projects.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| sandboxed-dashboard | Ready | Preview, Comment | May 30, 2026 2:03pm |
| sandboxed-sh | Ready | Preview, Comment | May 30, 2026 2:03pm |

Comment thread src/api/ai_providers.rs
Comment thread dashboard/src/components/ui/reconnect-provider-modal.tsx
Comment thread dashboard/src/components/ui/reconnect-provider-modal.tsx
Anthropic binds `thinking`/`redacted_thinking` signatures to the exact
model that produced them. The proxy rewrites `model` on every forwarded
request (fallback chains, default-model changes), so continuing a
conversation after the model changed replayed thinking blocks signed by
the old model — Anthropic rejected it with:

  "`thinking` or `redacted_thinking` blocks in the latest assistant
   message cannot be modified. These blocks must remain as they were in
   the original response."

(Surfaced after switching the default to claude-opus-4-8 while missions
started under opus-4-6/4-7 were resumed.)

Fixes:
- add strip_thinking_blocks(): drop thinking/redacted_thinking from
  assistant turns, never producing an empty content array
- rewrite_model_for_anthropic_cli_proxy: when the rewritten model differs
  from the original request model, strip stale thinking before forwarding
- build_anthropic_upstream_request (OpenAI->Anthropic adapter): same
  model-change strip
- anthropic_content_blocks_from_openai: preserve thinking/redacted_thinking
  (text + signature) instead of silently dropping them, so same-model
  replays keep working

Adds unit tests for strip-on-change, keep-on-same-model, and block preservation.
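
A minimal sketch of the strip-on-model-change idea follows. The real proxy works on JSON request bodies; the `Block` and `Message` types, the `PLACEHOLDER` text, and `rewrite_model_sketch` are hypothetical stand-ins introduced for illustration, and substituting a placeholder text block is just one assumed way to avoid an empty content array.

```rust
/// Hypothetical placeholder used when a turn contained only thinking blocks.
const PLACEHOLDER: &str = "(thinking omitted)";

// Simplified model of Anthropic content blocks; the real proxy handles JSON.
#[allow(dead_code)]
#[derive(Clone, Debug, PartialEq)]
enum Block {
    Text(String),
    Thinking { text: String, signature: String },
    RedactedThinking { data: String },
}

#[derive(Clone, Debug, PartialEq)]
struct Message {
    role: String,
    content: Vec<Block>,
}

/// Drops thinking/redacted_thinking from assistant turns and never leaves
/// an assistant turn with an empty content array.
fn strip_thinking_blocks(messages: &mut [Message]) {
    for msg in messages.iter_mut().filter(|m| m.role == "assistant") {
        msg.content.retain(|b| {
            !matches!(b, Block::Thinking { .. } | Block::RedactedThinking { .. })
        });
        if msg.content.is_empty() {
            msg.content.push(Block::Text(PLACEHOLDER.to_string()));
        }
    }
}

/// Strips stale thinking only when the forwarded model differs from the
/// model the client asked for; same-model replays are left untouched.
fn rewrite_model_sketch(original: &str, target: &str, messages: &mut [Message]) -> String {
    if original != target {
        strip_thinking_blocks(messages);
    }
    target.to_string()
}
```

Calling `rewrite_model_sketch("claude-opus-4-6", "claude-opus-4-8", &mut msgs)` removes signed thinking from every assistant turn, while passing the same model for both arguments leaves the history byte-for-byte intact.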
Comment thread src/api/proxy.rs

Th0rgal (Owner, Author) commented May 30, 2026

Added commit 05e58c34 (proxy: strip stale thinking/redacted_thinking blocks when the request model changes) onto this branch at Thomas's request, so it ships together with the provider-reconnect fix. It's an isolated change to src/api/proxy.rs only (no overlap with the OAuth-reconnect files). Fixes the Anthropic 400 "thinking blocks ... cannot be modified" that surfaced when resuming missions after the default model switched to claude-opus-4-8. Deployed to prod from this branch (commit 05e58c3).

- proxy: strip_thinking_blocks now drops thinking from a thinking-only
  assistant turn too, substituting a placeholder text block (the previous
  guard left stale cross-model thinking on such turns -> Anthropic 400)
- ai_providers: oauth_callback resolves the provider type via the store
  when reconnect passes a UUID, so the row's credentials are actually
  refreshed instead of keeping expired tokens
- reconnect modal: guard oauthAuthorize against stale/late responses via a
  monotonic request token (close/switch supersedes in-flight requests)
- reconnect modal: drop the premature success toast; handleReconnectSuccess
  now owns the success/failure message after the usage probe, so users no
  longer see "reconnected" + "check still fails" for one action

Adds a proxy unit test for the thinking-only model-switch case.
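
The monotonic request token mentioned above is a general pattern for ignoring late responses. The modal itself is a TypeScript component; the sketch below only illustrates the idea in Rust, and `RequestGuard` with its methods is a hypothetical name, not something taken from the PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Tracks the most recent request; older tokens are considered superseded.
struct RequestGuard {
    latest: AtomicU64,
}

impl RequestGuard {
    fn new() -> Self {
        Self { latest: AtomicU64::new(0) }
    }

    /// Starts a new request and returns its token. Any earlier token
    /// stops being current from this point on.
    fn begin(&self) -> u64 {
        self.latest.fetch_add(1, Ordering::SeqCst) + 1
    }

    /// Called on close or method switch: supersedes whatever is in flight
    /// without starting a new request.
    fn invalidate(&self) {
        self.latest.fetch_add(1, Ordering::SeqCst);
    }

    /// A response should be applied only if its token is still current.
    fn is_current(&self, token: u64) -> bool {
        self.latest.load(Ordering::SeqCst) == token
    }
}
```

A caller would take a token from `begin()` before issuing the authorize request and check `is_current(token)` when the response arrives, dropping the result if the modal was closed or the method switched in the meantime.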
Comment thread src/api/ai_providers.rs
Comment thread src/api/proxy.rs
The model-rewrite strip only covers in-request model changes, but missions
can carry thinking blocks in stored history that were produced under an
earlier model while the current request already matches the chain model.
Those replays still get rejected by Anthropic with "thinking ... blocks ...
cannot be modified".

Add a reactive recovery in the proxy chain loop: on a 400 from an Anthropic
adapter (OAuth CLI-proxy or direct), if the error body is the stale-thinking
rejection, strip all thinking/redacted_thinking from the request, disable
extended thinking for that turn, and retry once against the same upstream.
Non-thinking 400s and non-Anthropic providers are unaffected.

- anthropic_error_is_stale_thinking(): classify the 400 body
- anthropic_body_drop_thinking_and_disable(): strip + set thinking disabled
- guarded inline retry in the chain loop (mutable upstream_resp/status)
- unit tests for detection and strip/disable
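
The classify-then-retry-once shape described above might look roughly like this. It is a sketch under assumptions: the `Request` and `Response` types, `drop_thinking_and_disable`, and `send_with_stale_thinking_retry` are hypothetical, and the substring match is a guess at how the 400 body could be classified, not the PR's actual rule.

```rust
// Minimal stand-ins for the proxy's request/response types (hypothetical).
#[derive(Clone, Debug, PartialEq)]
struct Request {
    thinking_enabled: bool,
    has_thinking_blocks: bool,
}

#[derive(Clone, Debug, PartialEq)]
struct Response {
    status: u16,
    body: String,
}

/// Assumed classifier: a 400 whose body mentions thinking blocks that
/// "cannot be modified". The matching rule is illustrative only.
fn anthropic_error_is_stale_thinking(status: u16, body: &str) -> bool {
    if status != 400 {
        return false;
    }
    let body = body.to_ascii_lowercase();
    body.contains("thinking") && body.contains("cannot be modified")
}

/// Returns a copy of the request with thinking blocks removed and
/// extended thinking disabled for this turn.
fn drop_thinking_and_disable(req: &Request) -> Request {
    let mut out = req.clone();
    out.thinking_enabled = false;
    out.has_thinking_blocks = false;
    out
}

/// Sends once; on the stale-thinking 400, retries exactly once with
/// thinking stripped. The bool reports whether a retry happened.
fn send_with_stale_thinking_retry<F>(req: &Request, mut send: F) -> (Response, bool)
where
    F: FnMut(&Request) -> Response,
{
    let first = send(req);
    if !anthropic_error_is_stale_thinking(first.status, &first.body) {
        return (first, false);
    }
    let retry = drop_thinking_and_disable(req);
    (send(&retry), true)
}
```

Because the retry is unconditional on its own result, a second stale-thinking 400 is returned to the caller as-is rather than looping, which keeps the recovery bounded to one extra upstream call.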
Claude Code's LLM calls go through the external cli-proxy, so the proxy-side
thinking strip/retry never sees them. When a resumed claudecode mission
replays a session transcript whose thinking blocks were signed under a
different model, Anthropic returns 400 "thinking ... cannot be modified" and
the mission hard-fails.

Route that error into the existing ResetSessionFresh transport-recovery path:
- is_stale_thinking_error(): detect the rejection in the turn output
- claudecode_transport_recovery_strategy: on stale-thinking, escalate
  straight to a fresh session (skip same-session resume, which would replay
  the same rejected blocks); the existing reset path rebuilds context as text
  and drops the signed thinking, so the turn succeeds.

Adds a unit test.
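
The escalation rule can be sketched as a small decision function. Only the `ResetSessionFresh` name comes from the text above; the `RecoveryStrategy` enum, the `ResumeSameSession` variant, the `resume_already_tried` parameter, and the substring-based detection are assumptions made for illustration, and the real `claudecode_transport_recovery_strategy` signature is not shown in this PR page.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum RecoveryStrategy {
    /// Resume the existing session (replays the stored transcript).
    ResumeSameSession,
    /// Start a fresh session, rebuilding context as plain text.
    ResetSessionFresh,
}

/// Assumed detection rule: look for the rejection text in the turn output.
fn is_stale_thinking_error(turn_output: &str) -> bool {
    let out = turn_output.to_ascii_lowercase();
    out.contains("thinking") && out.contains("cannot be modified")
}

/// Hypothetical strategy selection. A stale-thinking rejection skips the
/// same-session resume, since resuming would replay the rejected blocks.
fn recovery_strategy_sketch(turn_output: &str, resume_already_tried: bool) -> RecoveryStrategy {
    if is_stale_thinking_error(turn_output) {
        return RecoveryStrategy::ResetSessionFresh;
    }
    if resume_already_tried {
        RecoveryStrategy::ResetSessionFresh
    } else {
        RecoveryStrategy::ResumeSameSession
    }
}
```

The point of the early return is ordering: for an ordinary transport error a resume is the cheaper first attempt, but for this specific 400 it is guaranteed to fail the same way.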
Comment thread src/api/ai_providers.rs
Comment thread src/api/ai_providers.rs
- oauth_callback (UUID reconnect): don't clobber stored api_key/oauth with
  None when the callback produced no fresh credentials (e.g. a failed
  auth.json sync that still reported success) — only replace when fresh
  creds were actually extracted.
- oauth_callback (UUID reconnect): never fall through to `add` when an
  existing UUID was targeted; a missing row or failed update now returns an
  explicit 404/500 instead of inserting a duplicate account for the same
  OAuth completion.
- upsert_grok_oauth_provider: accept a target_id and prefer that row, so an
  xAI reconnect updates the clicked row (which the health probe checks)
  instead of the first enabled OAuth xAI account.
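
The three rules above (no clobbering with `None`, no fall-through to `add`, and preferring the targeted row) can be sketched over an in-memory list. Everything here is a hypothetical stand-in: `ProviderRow`, `ReconnectError`, `reconnect_in_place`, and `grok_upsert_target` are illustrative names, the per-field merge is an assumption about what "only replace when fresh creds were actually extracted" means, and the fallback to the first enabled OAuth row when no target matches is likewise assumed.

```rust
#[derive(Clone, Debug, PartialEq)]
struct ProviderRow {
    id: String,
    enabled: bool,
    api_key: Option<String>,
    oauth: Option<String>,
}

#[derive(Debug, PartialEq)]
enum ReconnectError {
    /// The targeted UUID does not exist; a handler would map this to 404.
    NotFound,
}

/// Updates the targeted row in place. Stored credentials are replaced only
/// by fresh values; a `None` from the callback leaves the old value alone.
/// A missing row is an error, never an insert.
fn reconnect_in_place(
    rows: &mut [ProviderRow],
    id: &str,
    fresh_api_key: Option<String>,
    fresh_oauth: Option<String>,
) -> Result<(), ReconnectError> {
    let row = rows
        .iter_mut()
        .find(|r| r.id == id)
        .ok_or(ReconnectError::NotFound)?;
    if let Some(key) = fresh_api_key {
        row.api_key = Some(key);
    }
    if let Some(token) = fresh_oauth {
        row.oauth = Some(token);
    }
    Ok(())
}

/// Picks the row an xAI upsert should write to: the explicitly targeted
/// row when it exists, otherwise the first enabled OAuth row.
fn grok_upsert_target<'a>(
    rows: &'a [ProviderRow],
    target_id: Option<&str>,
) -> Option<&'a ProviderRow> {
    target_id
        .and_then(|id| rows.iter().find(|r| r.id == id))
        .or_else(|| rows.iter().find(|r| r.enabled && r.oauth.is_some()))
}
```

Returning an error for an unknown id is what keeps a single OAuth completion from ever producing a second account row, and passing the clicked row's id into the upsert keeps the write and the later health probe pointed at the same record.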
Th0rgal merged commit f41c069 into master on May 30, 2026
7 of 10 checks passed

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Comment thread src/api/proxy.rs
chain_length,
});
client_error_count += 1;
continue;

CLI proxy 400s trigger cooldown

Medium Severity

For Anthropic OAuth CLI-proxy routing, any HTTP 400 that is not classified as stale-thinking now calls record_entry_failure with ClientError and skips the rest of the chain entry. That path used to fall through to the generic 4xx handler, which intentionally avoids cooldowns. Unrelated validation 400s can temporarily sideline otherwise healthy OAuth accounts.

Comment thread src/api/proxy.rs
} else {
build_anthropic_upstream_request(&body, &entry.model_id, is_stream)
};
base.and_then(|b| anthropic_body_drop_thinking_and_disable(&b))

CLI proxy retry wrong body format

Medium Severity

On a stale-thinking HTTP 400, the OAuth CLI-proxy branch builds a retry from rewrite_model_for_anthropic_cli_proxy (OpenAI /v1/chat/completions JSON) but then passes it through anthropic_body_drop_thinking_and_disable, which injects Anthropic Messages API fields such as top-level thinking. That retry is posted back to the CLI proxy, so the recovery path for Anthropic OAuth CLI routing cannot reliably fix stale thinking blocks.
