Skip to content

feat: ship Slice K1 — auth bootstrap fix + invalid_grant + expiring status#87

Merged
joedanz merged 1 commit into
mainfrom
slice/K1-auth-bootstrap-fix
May 6, 2026
Merged

feat: ship Slice K1 — auth bootstrap fix + invalid_grant + expiring status#87
joedanz merged 1 commit into
mainfrom
slice/K1-auth-bootstrap-fix

Conversation

@joedanz
Copy link
Copy Markdown
Owner

@joedanz joedanz commented May 6, 2026

Summary

Three bugs were keeping claude login + mc auth bootstrap from being a reliable recovery path. K1 fixes all three and adds a proactive expiry signal.

  • mc auth bootstrap always overwrites by default and prints old → new expiry diff. Earlier silent-skip-when-file-exists was a footgun: users saw "Already bootstrapped" and assumed the rotation applied — it didn't, the stale token stayed in place. --dry-run still works and shows the same diff without writing.
  • HTTP 400 invalid_grant now throws RefreshTokenRevokedError immediately instead of falling through the generic non-200 path that silently returned the cached token. Anthropic uses 400 for permanent revocation; the previous code masked it as a transient "network error" until the access token expired 5–10 min later. JSON-parsed (not regex) for a narrow surface — other 400 bodies (e.g. invalid_request) still hit the existing transient branch.
  • Proactive expiry alert at <2h via new 'expiring' RefresherStatus value (claudeclaw-os pattern). Daemon maps it to statusRegistry.setUnverified('claudeCode', …) + a WARN log so it shows up in ~/.mc/logs/ before the chat lane hangs.
  • OAuthRefresher.invalidate() exposed for future credentials-watch wiring (deferred to its own slice — existing config-watcher → respawn path still recovers correctly today).

Why this shape

  • JSON-parse the OAuth error body instead of regex. Quality reviewer caught that the regex pattern would false-positive on any 400 body containing the substring error: invalid_grant anywhere (nested fields, error descriptions). The endpoint returns well-formed JSON; matching the success path's parse approach eliminates the false-positive surface entirely.
  • Map 'expiring' to setUnverified, not a new dashboard status. "Still works, but act now" is exactly the existing 'unverified' semantic. No type-system or render-path changes needed; switch with explicit case 'disabled': plus _exhaustive: never guards future status additions at compile time.
  • Single clock() snapshot in the fast path. Efficiency reviewer noted the original called clock() twice in the cached-fresh branch — a stale-snapshot risk under non-injected clocks for what should be the same value.
  • Drop the --force flag. Dead now that overwrite is the default. Quality reviewer flagged the dead force? field on RunAuthBootstrapOpts.flags.
  • Defer the credentials-watch → invalidate() wiring. The existing config-watcher fires a full daemon respawn when ~/.mc/credentials.json mtime changes; invalidate() would skip the respawn round-trip. Held back because (a) the respawn path works today, (b) the watcher's one-shot semantics need a small redesign to support both "DB change → respawn" and "credentials change → invalidate", and (c) the user's reported bug is fully addressed by bugs 1 + 2.

Prior art reviewed

  • hilash/cabinet — shells out to the claude CLI binary, lets the CLI own auth/refresh/keychain. Zero OAuth code.
  • earlyaidopters/claudeclaw-os — reads ~/.claude/.credentials.json for monitoring only (~120 lines), never refreshes; sends a Telegram DM at <2h to expiry. K1's 'expiring' status mirrors this UX.

K3 (delete packages/core/src/auth/ entirely and adopt cabinet's pattern) is on the followups — research spike planned next week.

Test status

  • 1452 tests across 125 files, all four gates green (typecheck / build / biome / vitest)
  • 5 new tests in refresher.test.ts (RefreshTokenRevokedError on 400 invalid_grant; fall-through on 400 other; invalidate(); 'expiring' emission; EXPIRING_THRESHOLD_MS exported value)
  • auth.test.ts rewritten: silent-skip / --force cases replaced with always-overwrite + diff assertions
  • /simplify ran with 3 parallel reviewers (Reuse / Quality / Efficiency); 5 real findings, all addressed

Test plan

  • Refresher unit tests (vitest run packages/core/src/auth/refresher.test.ts)
  • Bootstrap unit tests (vitest run packages/cli/src/commands/auth.test.ts)
  • Full offline suite (pnpm test) — 1452/1452
  • Manual smoke: with a deliberately-corrupted refresh token, daemon emits RefreshTokenRevokedError at boot probe with the actionable "run claude login then mc auth bootstrap" message (verified via existing test fixtures)
  • Manual smoke: mc auth bootstrap with existing creds prints Replaced … Old expiry: … New expiry: … lines

Deferred / additive

  • K1.5 — DiscordClient.sendDirectMessage(userId, content) — claudeclaw-os does this via Telegram; MC's Discord adapter currently only has sendInChannel(channelId, content). Adding the DM-by-userId surface is its own small slice (~30 lines + tests). Wires the proactive 'expiring' alert directly to Joe's DM.
  • K2 — fail-fast in chat lane when refresher throws auth errors. Currently the SDK 401-retry loop catches RefreshTokenRejectedError and now also RefreshTokenRevokedError, but the chat-bridge doesn't yet produce a Discord-friendly "re-auth needed" reply. Holding for K2.
  • K3 — cabinet-style migration — delete packages/core/src/auth/ entirely. Research spike planned: verify whether the Claude Agent SDK can fall back to the claude CLI's auth state when CLAUDE_CODE_OAUTH_TOKEN is unset. If yes, delete; if no, weigh subprocess approach.
  • Credentials-watch → invalidate() wiring — split out as its own slice for the reasons above.

…n + expiring status

Three concrete bugs were keeping `claude login` + `mc auth bootstrap` from being a
reliable recovery path:

1. Silent no-op when ~/.mc/credentials.json existed: bootstrap printed "Already
   bootstrapped" and never overwrote, so the stale token stayed in place. Now
   always overwrites and prints the old → new expiry diff. `--dry-run` still
   works and shows the same diff without writing.

2. HTTP 400 `invalid_grant` was treated as transient: the !response.ok branch
   silently returned the cached token if still valid, hiding permanent
   refresh-token revocation until the access token expired and surfaced as a
   generic "network error". New `RefreshTokenRevokedError` throws immediately
   on 400 with `error: invalid_grant` (parsed from JSON, not regex — narrower
   surface). HTTP 401 keeps throwing `RefreshTokenRejectedError` unchanged.

3. No proactive heads-up before expiry: operators only learned auth was about
   to break when the chat lane hung. New `'expiring'` `RefresherStatus`
   emitted when cached token has under 2h remaining (claudeclaw-os pattern).
   Daemon maps it to `statusRegistry.setUnverified('claudeCode', ...)` plus a
   WARN log.

Also exposes `OAuthRefresher.invalidate()` for future credentials-watch wiring
(deferred to its own slice — the existing config-watcher → respawn path still
recovers correctly today).

Notable code:
- `refresher.ts:33–45` — `RefreshTokenRevokedError` class
- `refresher.ts:181–193` — `isInvalidGrantBody()` JSON parse
- `refresher.ts:128–139` — single `clock()` snapshot for the fast-path
  expiring/ready emission
- `auth.ts:73–98` — `printBootstrapResult()` flattens dry-run/write print paths
- `mcd-main.ts:312–332` — switch with exhaustiveness guard for RefresherStatus

Test status: 1452/1452 across 125 files, all four gates green
(typecheck/build/biome/vitest). 5 new tests in refresher.test.ts cover the new
error class, invalidate(), and the 'expiring' status emission. auth.test.ts
rewritten to assert the new always-overwrite + diff output behavior.

Deferred:
- K1.5 — `DiscordClient.sendDirectMessage(userId, content)` so the proactive
  expiry alert reaches the owner directly (claudeclaw-os does this via
  Telegram; MC's Discord adapter currently only has channel send).
- K3 — research spike on cabinet-style migration (delete the entire
  packages/core/src/auth/ tree and shell out to `claude` CLI). Both
  hilash/cabinet and earlyaidopters/claudeclaw-os reviewed; both delete
  100% of MC's OAuth refresh code.
@joedanz joedanz merged commit 0cc82d2 into main May 6, 2026
3 checks passed
@joedanz joedanz deleted the slice/K1-auth-bootstrap-fix branch May 6, 2026 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant