Feature: Smart account-aware rotation in accounts.json (single canonical topic) #87

@DarKWinGTM

Single canonical header topic (only source of truth)

All runtime/scheduler state must live inside accounts.json under each account.


Summary

Implement smart account-aware rotation that selects the best-ready account (not blind round-robin), with per-account runtime intelligence embedded directly in accounts.json.


Evidence from current code

  1. Rotation is primarily round-robin

    • new_idx = (old_idx + 1) % count
    • src/account_rotator.rs:397-400
  2. Persisted account model currently lacks readiness metadata

    • src/account_rotator.rs:34-44
  3. Standby visibility is limited in status API

    • src/api/mod.rs:519-521, src/api/mod.rs:547-551
  4. Retry/reset signals exist but client delay is clamped

    • src/api/retry_delay.rs:129-143

Required design direction

1) Embed runtime state in accounts.json only

Add optional runtime object per account in accounts.json:

{
  "accounts": [
    {
      "email": "user@example.com",
      "refresh_token": "1//...",
      "project_id": "...",
      "runtime": {
        "last_active_at": "...",
        "banned": false,
        "restricted": false,
        "next_ready_at": "...",
        "models": {
          "gemini-3-pro": {
            "remaining_fraction": 0.37,
            "reset_time": "...",
            "cooldown_until": "...",
            "last_retry_delay_secs_raw": 18840,
            "last_retry_delay_secs_client": 300,
            "consecutive_429": 1,
            "last_success_at": "...",
            "confidence": 0.82,
            "updated_at": "..."
          }
        }
      }
    }
  ],
  "active": "user@example.com"
}
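As a rough illustration, the shape above could map onto Rust types along the following lines. This is only a sketch: the type names (Account, AccountRuntime, ModelRuntime) are hypothetical, timestamps are kept as plain strings because the proposal does not fix a format, and serialization derives are left out so the snippet stays dependency-free.

```rust
use std::collections::BTreeMap;

/// Per-model runtime state (hypothetical type name).
/// Fields are optional so older or partially populated files still load.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ModelRuntime {
    pub remaining_fraction: Option<f64>,
    pub reset_time: Option<String>, // timestamp format is not specified in the proposal
    pub cooldown_until: Option<String>,
    pub last_retry_delay_secs_raw: Option<u64>,
    pub last_retry_delay_secs_client: Option<u64>,
    pub consecutive_429: u32,
    pub last_success_at: Option<String>,
    pub confidence: Option<f64>,
    pub updated_at: Option<String>,
}

/// Per-account runtime block (hypothetical type name).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct AccountRuntime {
    pub last_active_at: Option<String>,
    pub banned: bool,
    pub restricted: bool,
    pub next_ready_at: Option<String>,
    /// Keyed by model name, e.g. "gemini-3-pro".
    pub models: BTreeMap<String, ModelRuntime>,
}

/// One entry of the "accounts" array; `runtime` is optional as proposed.
#[derive(Debug, Clone, PartialEq)]
pub struct Account {
    pub email: String,
    pub refresh_token: String,
    pub project_id: String,
    pub runtime: Option<AccountRuntime>,
}
```

Keeping runtime as an Option means an accounts.json without any runtime block remains valid input.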

2) Signal ingestion

  • Quota polling (src/quota.rs) updates remaining/reset/banned/restricted and recomputes next_ready_at.
  • Rate limiter (src/api/rate_limiter.rs) updates cooldown/retry/consecutive/success outcomes.
  • Rotation orchestrator (src/rotation.rs) persists transition metadata (from, to, at).
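A minimal sketch of the rate-limiter side of this ingestion, assuming epoch-second timestamps. ModelState, record_429 and record_success are hypothetical names, and the choices to never shorten an existing cooldown and to clear it on success are assumptions, not behavior taken from src/api/rate_limiter.rs.

```rust
/// Subset of the per-model runtime fields the rate limiter would touch.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ModelState {
    pub cooldown_until: Option<u64>,
    pub consecutive_429: u32,
    pub last_success_at: Option<u64>,
    pub updated_at: Option<u64>,
}

impl ModelState {
    /// Record a 429 response with its unclamped retry delay.
    pub fn record_429(&mut self, now: u64, raw_delay_secs: u64) {
        self.consecutive_429 = self.consecutive_429.saturating_add(1);
        let until = now.saturating_add(raw_delay_secs);
        // Assumption: a later, shorter delay never shortens an existing cooldown.
        self.cooldown_until = Some(self.cooldown_until.map_or(until, |c| c.max(until)));
        self.updated_at = Some(now);
    }

    /// Record a successful request.
    pub fn record_success(&mut self, now: u64) {
        // Assumption: a success resets the streak and clears the cooldown.
        self.consecutive_429 = 0;
        self.cooldown_until = None;
        self.last_success_at = Some(now);
        self.updated_at = Some(now);
    }
}
```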

3) Delay semantics

Store both:

  • last_retry_delay_secs_raw (unclamped) for scheduler intelligence, including long reset windows (e.g. 5h / weekly)
  • last_retry_delay_secs_client (clamped) for API-facing behavior
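The two values could be derived from a single upstream delay as sketched here. The 300-second cap is an assumption inferred from the example values in the schema above (raw 18840, client 300); the actual clamp in src/api/retry_delay.rs may differ, and the names are hypothetical.

```rust
/// Assumed client-facing cap, inferred from the example values in this issue.
pub const CLIENT_MAX_DELAY_SECS: u64 = 300;

/// Both views of one retry delay.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RetryDelay {
    /// Unclamped value, kept for scheduler decisions.
    pub raw_secs: u64,
    /// Clamped value, returned to API clients.
    pub client_secs: u64,
}

/// Keep the raw delay intact and clamp only the client-facing copy.
pub fn split_retry_delay(raw_secs: u64) -> RetryDelay {
    RetryDelay {
        raw_secs,
        client_secs: raw_secs.min(CLIENT_MAX_DELAY_SECS),
    }
}
```

With this split, a 5h14m reset window (18840 s) still reaches the scheduler even though clients are told to retry after 300 s.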

4) Smart selection scorer

For requested model:

  1. Build candidates
  2. Exclude hard-ineligible (banned, restricted, active hard block)
  3. Compute model_ready_at = max(cooldown_until, reset_time_if_known)
  4. Score:
score =
  + ready_now_bonus
  + remaining_fraction_weight
  - wait_time_penalty
  - consecutive_429_penalty
  - staleness_penalty
  + recent_success_bonus
  5. Pick highest score
  6. Apply hysteresis to avoid churn

Fallback: if runtime metadata is missing, corrupt, or stale, use deterministic round-robin.
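The scoring steps above could be sketched as follows. Every weight, threshold, and name here (Candidate, score, select, HYSTERESIS_MARGIN) is a placeholder chosen for illustration, timestamps are assumed to be epoch seconds, and only banned/restricted are treated as hard exclusions. The round-robin fallback is not shown.

```rust
/// Flattened view of one account for the requested model (hypothetical type).
#[derive(Debug, Clone)]
pub struct Candidate {
    pub email: String,
    pub banned: bool,
    pub restricted: bool,
    pub remaining_fraction: f64,
    pub cooldown_until: Option<u64>,
    pub reset_time: Option<u64>,
    pub consecutive_429: u32,
    pub last_success_at: Option<u64>,
    pub updated_at: u64,
}

/// Placeholder margin a challenger must win by before we switch accounts.
pub const HYSTERESIS_MARGIN: f64 = 15.0;

/// Step 3, taken literally: max(cooldown_until, reset_time_if_known).
fn model_ready_at(c: &Candidate) -> u64 {
    c.cooldown_until.unwrap_or(0).max(c.reset_time.unwrap_or(0))
}

/// Step 4 with illustrative weights; real weights would need tuning.
pub fn score(c: &Candidate, now: u64) -> f64 {
    let wait = model_ready_at(c).saturating_sub(now) as f64;
    let ready_now_bonus = if wait == 0.0 { 100.0 } else { 0.0 };
    let remaining = 50.0 * c.remaining_fraction.clamp(0.0, 1.0);
    let wait_penalty = (wait / 60.0).min(100.0);
    let penalty_429 = 10.0 * c.consecutive_429 as f64;
    let staleness = (now.saturating_sub(c.updated_at) as f64 / 600.0).min(20.0);
    let recent_success = match c.last_success_at {
        Some(t) if now.saturating_sub(t) <= 300 => 10.0,
        _ => 0.0,
    };
    ready_now_bonus + remaining - wait_penalty - penalty_429 - staleness + recent_success
}

/// Steps 1, 2, 5 and 6: filter, score, pick the best, then apply hysteresis.
pub fn select<'a>(cands: &'a [Candidate], current: Option<&str>, now: u64) -> Option<&'a str> {
    let eligible: Vec<(&'a Candidate, f64)> = cands
        .iter()
        .filter(|c| !c.banned && !c.restricted)
        .map(|c| (c, score(c, now)))
        .collect();
    let best = eligible.iter().copied().max_by(|a, b| a.1.total_cmp(&b.1))?;
    if let Some(cur) = current {
        if let Some(&(c, s)) = eligible.iter().find(|(c, _)| c.email == cur) {
            // Stay on the current account unless the best one is clearly better.
            if best.1 - s < HYSTERESIS_MARGIN {
                return Some(c.email.as_str());
            }
        }
    }
    Some(best.0.email.as_str())
}
```

Returning None when no candidate is eligible gives the caller a natural hook for the parked state described in the next section.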

5) Parked recovery

When all accounts are temporarily ineligible:

  • earliest_wake_at = min(next_ready_at across accounts)
  • wake and re-evaluate at that time (+ jitter)
  • auto-unpark when any account becomes eligible
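A small sketch of the wake-time computation, assuming next_ready_at is stored as epoch seconds and that the jitter value is produced by the caller. The function names are hypothetical.

```rust
use std::time::Duration;

/// earliest_wake_at = min(next_ready_at across accounts) + jitter.
/// Accounts with no known next_ready_at (None) are ignored.
/// Returns None when no account has a known ready time.
pub fn earliest_wake_at(next_ready_at: &[Option<u64>], jitter_secs: u64) -> Option<u64> {
    next_ready_at
        .iter()
        .flatten()
        .min()
        .map(|t| t.saturating_add(jitter_secs))
}

/// How long the parked scheduler should sleep before re-evaluating.
pub fn sleep_for(now: u64, wake_at: u64) -> Duration {
    Duration::from_secs(wake_at.saturating_sub(now))
}
```

If earliest_wake_at returns None, the scheduler has no timestamp to wait for and would need some other periodic re-check; the proposal does not specify that case.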

Persistence/write contract for accounts.json

Because runtime is embedded in the same file:

  1. Atomic write (tmp + rename)
  2. File locking/serialized writes
  3. 0600 permissions
  4. Debounced flush (e.g. every 5-10s, or sooner on significant deltas)
  5. Startup schema validation/migration
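Items 1 and 3 of this contract could look roughly like the sketch below, using only the standard library on Unix. write_atomic is a hypothetical helper, the temporary file name is an assumption, and file locking (item 2) and debouncing (item 4) are deliberately out of scope here.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Write `contents` to `path` via a sibling temp file and rename (Unix only).
/// Readers see either the old file or the new one, never a partial write.
pub fn write_atomic(path: &Path, contents: &[u8]) -> io::Result<()> {
    // Assumed temp name: accounts.json -> accounts.json.tmp in the same directory,
    // so the rename stays on one filesystem.
    let tmp = path.with_extension("json.tmp");
    {
        let mut f = OpenOptions::new()
            .write(true)
            .create(true)
            .truncate(true)
            .mode(0o600) // applies only when the file is newly created
            .open(&tmp)?;
        // Enforce 0600 even if a stale temp file with other permissions existed.
        fs::set_permissions(&tmp, Permissions::from_mode(0o600))?;
        f.write_all(contents)?;
        f.sync_all()?; // flush data to disk before the rename
    }
    fs::rename(&tmp, path)
}
```

An io::Error from this function (for example on a read-only mount) is the signal a caller could use to switch to memory-only state instead of crashing.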

Operational note:

  • Durable smart rotation requires writable accounts.json.
  • If accounts.json is mounted read-only, the system must fall back safely to memory-only (non-durable) runtime state without crashing.

API observability additions

Extend /v1/accounts/status with:

  • next_ready_at
  • selection_score
  • selection_reason
  • last_known_model_state_summary
  • confidence

Rollout plan

Phase 1

  • add runtime schema in accounts.json
  • populate from quota/rate-limiter/rotation
  • expose status fields

Phase 2

  • enable scorer behind feature flag

Phase 3

  • make scorer default after burn-in

Acceptance criteria

  1. Mixed cooldown accounts -> best-ready account selected.
  2. State persists across restart when accounts.json is writable.
  3. Long windows (e.g. 5h / weekly) influence scheduling decisions.
  4. Missing/corrupt runtime metadata falls back safely to round-robin.
  5. Status API clearly explains selection rationale.

Implementation slices

  1. accounts.json schema extension (runtime per account)
  2. atomic + locked persistence path
  3. signal ingestion (quota + limiter + rotation)
  4. candidate scorer + fallback + hysteresis
  5. parked wake scheduler
  6. status API + documentation updates
