
feat: smart model routing with priority selection and provider fallback #96

Open

hippoley wants to merge 1 commit into Yuan-lab-LLM:main from hippoley:feat/smart-model-routing

Conversation

@hippoley
Contributor

Summary

Addresses Issue #68 (problems 1, 2, and 3) — the AI Gateway's model routing was effectively a stub: users could only see "Auto", auto-selection picked the first model blindly, and provider failures returned errors with no retry.

This PR delivers a complete Smart Model Routing layer:

Problem 1: Users can now see and select specific models

ListAvailableModels() previously hardcoded a single "Auto" entry. It now returns Auto + all active models with their real provider types, so the frontend can offer a model picker.
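A minimal sketch of the new listing behavior (the ModelOption and LLMModel shapes, field names, and the repository interface here are illustrative assumptions, not the actual structs):

```go
package aigateway

import "context"

// Illustrative shapes only; the real structs live in models/ and may differ.
type LLMModel struct {
	ID           string
	DisplayName  string
	ProviderType string
	IsSecure     bool
	Priority     int
}

type ModelOption struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	Provider string `json:"provider"`
}

type modelRepo interface {
	ListActive(ctx context.Context) ([]*LLMModel, error)
}

type Service struct{ models modelRepo }

// ListAvailableModels returns the synthetic "Auto" entry plus every active
// model with its real provider type; with no active models it returns an
// empty list (matching the "empty when no active" test described below).
func (s *Service) ListAvailableModels(ctx context.Context) ([]ModelOption, error) {
	active, err := s.models.ListActive(ctx)
	if err != nil {
		return nil, err
	}
	if len(active) == 0 {
		return nil, nil
	}
	out := []ModelOption{{ID: "auto", Name: "Auto", Provider: "auto"}}
	for _, m := range active {
		out = append(out, ModelOption{ID: m.ID, Name: m.DisplayName, Provider: m.ProviderType})
	}
	return out, nil
}
```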

Problem 2: Priority-based auto selection with load balancing

  • Added priority INT NOT NULL DEFAULT 0 column to llm_models (idempotent ALTER TABLE with duplicate-column guard).
  • selectAutoModel() now restricts auto selection to non-secure models, relies on the repository's descending-priority ordering, and picks randomly within the highest-priority group for simple load balancing (see the sketch after this list).
  • Falls back to secure models when no non-secure candidates remain.
  • ListActive() and List() ordering updated to -priority, -is_secure, display_name.
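A sketch of that selection policy, assuming the hypothetical LLMModel shape from the listing sketch and input already sorted by descending priority (per the -priority ordering above):

```go
package aigateway

import (
	"errors"
	"math/rand"
)

// selectAutoModel picks a model for "Auto" requests from a list already
// sorted by priority descending. Non-secure models are preferred; ties in
// the highest-priority group are broken at random for load balancing.
func selectAutoModel(models []*LLMModel) (*LLMModel, error) {
	var candidates []*LLMModel
	for _, m := range models {
		if !m.IsSecure {
			candidates = append(candidates, m)
		}
	}
	if len(candidates) == 0 {
		candidates = models // no non-secure candidates: fall back to secure models
	}
	if len(candidates) == 0 {
		return nil, errors.New("no active models")
	}
	top := candidates[0].Priority // sorted descending, so [0] holds the max
	var group []*LLMModel
	for _, m := range candidates {
		if m.Priority != top {
			break // priority dropped: the top group is complete
		}
		group = append(group, m)
	}
	return group[rand.Intn(len(group))], nil
}
```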

Problem 3: Provider fallback on failure

Non-streaming path:

  • New dispatchCall() extracts provider routing from ChatCompletions().
  • New callWithFallback() retries with alternate models on connection error or 5xx response, up to 2 retries (sketched after this list).
  • Each fallback attempt records an audit event (gateway.request.fallback) for observability.
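A sketch of the non-streaming retry loop. dispatchCall, selectAutoModelExcluding, and the gateway.request.fallback event name come from this PR; the exact signatures, the ChatRequest/ChatResponse shapes, and the audit helper are assumptions:

```go
package aigateway

import (
	"context"
	"fmt"
)

const maxFallbackRetries = 2

// Illustrative placeholders; the real request/response types may differ.
type ChatRequest struct{}
type ChatResponse struct{ StatusCode int }

// Assumed helper signatures (implementations elided):
//   func (s *Service) dispatchCall(ctx context.Context, req *ChatRequest, m *LLMModel) (*ChatResponse, error)
//   func (s *Service) selectAutoModelExcluding(ctx context.Context, exclude map[string]bool) (*LLMModel, error)
//   func (s *Service) audit(ctx context.Context, event, detail string)

// callWithFallback wraps dispatchCall: on a connection error or an upstream
// 5xx it retries with an alternate model (excluding models already tried),
// up to maxFallbackRetries times. 4xx responses return immediately, since
// a client-side error would fail on any model.
func (s *Service) callWithFallback(ctx context.Context, req *ChatRequest, model *LLMModel) (*ChatResponse, error) {
	tried := map[string]bool{}
	for attempt := 0; ; attempt++ {
		tried[model.ID] = true
		resp, err := s.dispatchCall(ctx, req, model)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success or 4xx: no fallback
		}
		if attempt >= maxFallbackRetries {
			return resp, err // retries exhausted: surface the last failure
		}
		next, selErr := s.selectAutoModelExcluding(ctx, tried)
		if selErr != nil {
			return resp, err // no alternate candidates left
		}
		// Record the fallback hop for observability.
		s.audit(ctx, "gateway.request.fallback", fmt.Sprintf("%s -> %s", model.ID, next.ID))
		model = next
	}
}
```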

Streaming path:

  • New dispatchStream() extracts provider routing from StreamChatCompletions().
  • New streamWithFallback() retries only on connection-level failure (before any response headers are written to the client), up to 2 retries.
  • New errProviderConnection sentinel type distinguishes retriable connection failures from committed-response errors (sketched after this list).
  • streamOpenAICompatible() and streamAnthropic() now return errProviderConnection on httpClient.Do() failure.
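A sketch of the streaming variant and the sentinel error type, reusing maxFallbackRetries and the assumed helpers from the non-streaming sketch. dispatchStream, streamWithFallback, and errProviderConnection are the PR's names; the struct layout and signatures are assumptions:

```go
package aigateway

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// errProviderConnection marks a failure that happened before any response
// headers were written (e.g. httpClient.Do() returning an error), so the
// request is still safe to retry against another provider.
type errProviderConnection struct{ err error }

func (e *errProviderConnection) Error() string {
	return fmt.Sprintf("provider connection failed: %v", e.err)
}
func (e *errProviderConnection) Unwrap() error { return e.err }

// streamWithFallback retries only while the response is uncommitted: any
// error that is not a *errProviderConnection means bytes may already have
// reached the client, so it is returned as-is.
func (s *Service) streamWithFallback(ctx context.Context, w http.ResponseWriter, req *ChatRequest, model *LLMModel) error {
	tried := map[string]bool{}
	for attempt := 0; ; attempt++ {
		tried[model.ID] = true
		err := s.dispatchStream(ctx, w, req, model)
		var connErr *errProviderConnection
		if err == nil || !errors.As(err, &connErr) || attempt >= maxFallbackRetries {
			return err // success, committed-response error, or retries exhausted
		}
		next, selErr := s.selectAutoModelExcluding(ctx, tried)
		if selErr != nil {
			return err
		}
		s.audit(ctx, "gateway.request.fallback", fmt.Sprintf("%s -> %s", model.ID, next.ID))
		model = next
	}
}
```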

Design decisions

  • Streaming fallback is intentionally narrow: once response headers are written, the HTTP response is committed and cannot be retried. Only pre-header connection failures trigger streaming fallback.
  • 4xx errors never trigger fallback — they indicate client-side issues (bad request, auth failure) that would fail on any model.
  • Priority field defaults to 0, so existing deployments work unchanged — all models have equal priority and get random selection (load balancing).

Files changed

  • models/llm_model.go: added Priority int field
  • repository/llm_model_repository.go: schema migration + ordering update
  • aigateway/service.go: core routing (ListAvailableModels, selectAutoModelExcluding, dispatchCall, callWithFallback, dispatchStream, streamWithFallback, errProviderConnection)
  • aigateway/service_test.go: 9 new tests

Tests

9 new tests covering:

  • Model listing (Auto + all active; empty when no active)
  • Priority selection (highest priority wins)
  • Secure model filtering (skip when non-secure available; fallback when not)
  • Load distribution (random among same-priority group, verified over 300 iterations)
  • Exclusion logic (for fallback candidate selection)
  • Error type detection (errProviderConnection via errors.As; sketched below)
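For illustration, the error-type-detection test might look like this (assuming the errProviderConnection shape sketched above; the actual test name and body may differ):

```go
package aigateway

import (
	"errors"
	"fmt"
	"testing"
)

// Checks that a wrapped *errProviderConnection is still detected via
// errors.As, so fallback triggers even when the error is annotated.
func TestErrProviderConnectionDetection(t *testing.T) {
	inner := &errProviderConnection{err: errors.New("dial tcp: connection refused")}
	wrapped := fmt.Errorf("stream failed: %w", inner)

	var connErr *errProviderConnection
	if !errors.As(wrapped, &connErr) {
		t.Fatal("expected errors.As to find *errProviderConnection")
	}
}
```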

All existing tests continue to pass (zero regression).

Closes #68 (problems 1, 2, 3). Problems 4-8 (instance-level routing, cost-aware routing, request-feature routing) are deferred to future PRs.

Commit message:

Addresses Issue Yuan-lab-LLM#68 (problems 1, 2, and 3):

1. ListAvailableModels now returns Auto + all active models so users can
   see and select specific models instead of only 'Auto'.

2. selectAutoModel uses a priority field (INT, higher = preferred) with
   random tie-breaking among the highest-priority group for simple load
   balancing. Falls back to secure models when no non-secure candidates
   remain.

3. Provider fallback on failure:
   - Non-streaming: retries with alternate models on connection error or
     5xx response (max 2 retries via callWithFallback).
   - Streaming: retries only on connection-level failure before any
     response headers are written (max 2 retries via streamWithFallback).
   - Each fallback attempt records an audit event for observability.

Schema: adds 'priority' column to llm_models (INT NOT NULL DEFAULT 0)
with idempotent ALTER TABLE guarded by duplicate-column-name check.
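An idempotent-migration sketch of that guard (the duplicate-column error text is driver-specific, e.g. SQLite vs. MySQL, so the real check may differ):

```go
package repository

import (
	"database/sql"
	"strings"
)

// addPriorityColumn always attempts the ALTER TABLE and treats a
// duplicate-column error as "migration already applied", making it safe
// to run on every startup.
func addPriorityColumn(db *sql.DB) error {
	_, err := db.Exec(`ALTER TABLE llm_models ADD COLUMN priority INT NOT NULL DEFAULT 0`)
	if err != nil && strings.Contains(strings.ToLower(err.Error()), "duplicate column") {
		return nil // column already exists
	}
	return err
}
```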

New errProviderConnection sentinel type distinguishes retriable connection
failures from committed-response errors in the streaming path.

Tests: 9 new tests covering model listing, priority selection, load
distribution, exclusion logic, and error type detection.


Development

Successfully merging this pull request may close these issues:

  • 模型路由过于简短粗暴 (Model routing is too simplistic and crude)