
feat: smart model routing with priority selection and provider fallback #96

Open

hippoley wants to merge 1 commit into Yuan-lab-LLM:main from hippoley:feat/smart-model-routing

Conversation

@hippoley
Contributor

Summary

Addresses Issue #68 (problems 1, 2, and 3) — the AI Gateway's model routing was effectively a stub: users could only see "Auto", auto-selection picked the first model blindly, and provider failures returned errors with no retry.

This PR delivers a complete Smart Model Routing layer:

Problem 1: Users can now see and select specific models

ListAvailableModels() previously hardcoded a single "Auto" entry. It now returns Auto + all active models with their real provider types, so the frontend can offer a model picker.
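A minimal sketch of the new listing behavior (the ModelOption and LLMModel shapes, field names, and the repository interface here are illustrative assumptions, not the actual structs):

```go
package aigateway

import "context"

// Illustrative shapes only; the real structs live in models/ and may differ.
type LLMModel struct {
	ID           string
	DisplayName  string
	ProviderType string
	IsSecure     bool
	Priority     int
}

type ModelOption struct {
	ID       string `json:"id"`
	Name     string `json:"name"`
	Provider string `json:"provider"`
}

type modelRepo interface {
	ListActive(ctx context.Context) ([]*LLMModel, error)
}

type Service struct{ models modelRepo }

// ListAvailableModels returns the synthetic "Auto" entry plus every active
// model with its real provider type; with no active models it returns an
// empty list (matching the "empty when no active" test described below).
func (s *Service) ListAvailableModels(ctx context.Context) ([]ModelOption, error) {
	active, err := s.models.ListActive(ctx)
	if err != nil {
		return nil, err
	}
	if len(active) == 0 {
		return nil, nil
	}
	out := []ModelOption{{ID: "auto", Name: "Auto", Provider: "auto"}}
	for _, m := range active {
		out = append(out, ModelOption{ID: m.ID, Name: m.DisplayName, Provider: m.ProviderType})
	}
	return out, nil
}
```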

Problem 2: Priority-based auto selection with load balancing

  • Added priority INT NOT NULL DEFAULT 0 column to llm_models (idempotent ALTER TABLE with duplicate-column guard).
  • selectAutoModel() now restricts auto selection to non-secure models, relies on the repository's descending-priority ordering, and picks randomly within the highest-priority group for simple load balancing (see the sketch after this list).
  • Falls back to secure models when no non-secure candidates remain.
  • ListActive() and List() ordering updated to -priority, -is_secure, display_name.
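A sketch of that selection policy, assuming the hypothetical LLMModel shape from the listing sketch and input already sorted by descending priority (per the -priority ordering above):

```go
package aigateway

import (
	"errors"
	"math/rand"
)

// selectAutoModel picks a model for "Auto" requests from a list already
// sorted by priority descending. Non-secure models are preferred; ties in
// the highest-priority group are broken at random for load balancing.
func selectAutoModel(models []*LLMModel) (*LLMModel, error) {
	var candidates []*LLMModel
	for _, m := range models {
		if !m.IsSecure {
			candidates = append(candidates, m)
		}
	}
	if len(candidates) == 0 {
		candidates = models // no non-secure candidates: fall back to secure models
	}
	if len(candidates) == 0 {
		return nil, errors.New("no active models")
	}
	top := candidates[0].Priority // sorted descending, so [0] holds the max
	var group []*LLMModel
	for _, m := range candidates {
		if m.Priority != top {
			break // priority dropped: the top group is complete
		}
		group = append(group, m)
	}
	return group[rand.Intn(len(group))], nil
}
```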

Problem 3: Provider fallback on failure

Non-streaming path:

  • New dispatchCall() extracts provider routing from ChatCompletions().
  • New callWithFallback() retries with alternate models on connection error or 5xx response, up to 2 retries (sketched after this list).
  • Each fallback attempt records an audit event (gateway.request.fallback) for observability.
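A sketch of the non-streaming retry loop. dispatchCall, selectAutoModelExcluding, and the gateway.request.fallback event name come from this PR; the exact signatures, the ChatRequest/ChatResponse shapes, and the audit helper are assumptions:

```go
package aigateway

import (
	"context"
	"fmt"
)

const maxFallbackRetries = 2

// Illustrative placeholders; the real request/response types may differ.
type ChatRequest struct{}
type ChatResponse struct{ StatusCode int }

// Assumed helper signatures (implementations elided):
//   func (s *Service) dispatchCall(ctx context.Context, req *ChatRequest, m *LLMModel) (*ChatResponse, error)
//   func (s *Service) selectAutoModelExcluding(ctx context.Context, exclude map[string]bool) (*LLMModel, error)
//   func (s *Service) audit(ctx context.Context, event, detail string)

// callWithFallback wraps dispatchCall: on a connection error or an upstream
// 5xx it retries with an alternate model (excluding models already tried),
// up to maxFallbackRetries times. 4xx responses return immediately, since
// a client-side error would fail on any model.
func (s *Service) callWithFallback(ctx context.Context, req *ChatRequest, model *LLMModel) (*ChatResponse, error) {
	tried := map[string]bool{}
	for attempt := 0; ; attempt++ {
		tried[model.ID] = true
		resp, err := s.dispatchCall(ctx, req, model)
		if err == nil && resp.StatusCode < 500 {
			return resp, nil // success or 4xx: no fallback
		}
		if attempt >= maxFallbackRetries {
			return resp, err // retries exhausted: surface the last failure
		}
		next, selErr := s.selectAutoModelExcluding(ctx, tried)
		if selErr != nil {
			return resp, err // no alternate candidates left
		}
		// Record the fallback hop for observability.
		s.audit(ctx, "gateway.request.fallback", fmt.Sprintf("%s -> %s", model.ID, next.ID))
		model = next
	}
}
```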

Streaming path:

  • New dispatchStream() extracts provider routing from StreamChatCompletions().
  • New streamWithFallback() retries only on connection-level failure (before any response headers are written to the client), up to 2 retries.
  • New errProviderConnection sentinel type distinguishes retriable connection failures from committed-response errors (sketched after this list).
  • streamOpenAICompatible() and streamAnthropic() now return errProviderConnection on httpClient.Do() failure.
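A sketch of the streaming variant and the sentinel error type, reusing maxFallbackRetries and the assumed helpers from the non-streaming sketch. dispatchStream, streamWithFallback, and errProviderConnection are the PR's names; the struct layout and signatures are assumptions:

```go
package aigateway

import (
	"context"
	"errors"
	"fmt"
	"net/http"
)

// errProviderConnection marks a failure that happened before any response
// headers were written (e.g. httpClient.Do() returning an error), so the
// request is still safe to retry against another provider.
type errProviderConnection struct{ err error }

func (e *errProviderConnection) Error() string {
	return fmt.Sprintf("provider connection failed: %v", e.err)
}
func (e *errProviderConnection) Unwrap() error { return e.err }

// streamWithFallback retries only while the response is uncommitted: any
// error that is not a *errProviderConnection means bytes may already have
// reached the client, so it is returned as-is.
func (s *Service) streamWithFallback(ctx context.Context, w http.ResponseWriter, req *ChatRequest, model *LLMModel) error {
	tried := map[string]bool{}
	for attempt := 0; ; attempt++ {
		tried[model.ID] = true
		err := s.dispatchStream(ctx, w, req, model)
		var connErr *errProviderConnection
		if err == nil || !errors.As(err, &connErr) || attempt >= maxFallbackRetries {
			return err // success, committed-response error, or retries exhausted
		}
		next, selErr := s.selectAutoModelExcluding(ctx, tried)
		if selErr != nil {
			return err
		}
		s.audit(ctx, "gateway.request.fallback", fmt.Sprintf("%s -> %s", model.ID, next.ID))
		model = next
	}
}
```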

Design decisions

  • Streaming fallback is intentionally narrow: once response headers are written, the HTTP response is committed and cannot be retried. Only pre-header connection failures trigger streaming fallback.
  • 4xx errors never trigger fallback — they indicate client-side issues (bad request, auth failure) that would fail on any model.
  • Priority field defaults to 0, so existing deployments work unchanged — all models have equal priority and get random selection (load balancing).

Files changed

  • models/llm_model.go: added Priority int field
  • repository/llm_model_repository.go: schema migration + ordering update
  • aigateway/service.go: core routing (ListAvailableModels, selectAutoModelExcluding, dispatchCall, callWithFallback, dispatchStream, streamWithFallback, errProviderConnection)
  • aigateway/service_test.go: 9 new tests

Tests

9 new tests covering:

  • Model listing (Auto + all active; empty when no active)
  • Priority selection (highest priority wins)
  • Secure model filtering (skip when non-secure available; fallback when not)
  • Load distribution (random among same-priority group, verified over 300 iterations)
  • Exclusion logic (for fallback candidate selection)
  • Error type detection (errProviderConnection via errors.As; sketched below)
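For illustration, the error-type-detection test might look like this (assuming the errProviderConnection shape sketched above; the actual test name and body may differ):

```go
package aigateway

import (
	"errors"
	"fmt"
	"testing"
)

// Checks that a wrapped *errProviderConnection is still detected via
// errors.As, so fallback triggers even when the error is annotated.
func TestErrProviderConnectionDetection(t *testing.T) {
	inner := &errProviderConnection{err: errors.New("dial tcp: connection refused")}
	wrapped := fmt.Errorf("stream failed: %w", inner)

	var connErr *errProviderConnection
	if !errors.As(wrapped, &connErr) {
		t.Fatal("expected errors.As to find *errProviderConnection")
	}
}
```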

All existing tests continue to pass (zero regression).

Closes #68 (problems 1, 2, 3). Problems 4-8 (instance-level routing, cost-aware routing, request-feature routing) are deferred to future PRs.

Commit message:

Addresses Issue Yuan-lab-LLM#68 (problems 1, 2, and 3):

1. ListAvailableModels now returns Auto + all active models so users can
   see and select specific models instead of only 'Auto'.

2. selectAutoModel uses a priority field (INT, higher = preferred) with
   random tie-breaking among the highest-priority group for simple load
   balancing. Falls back to secure models when no non-secure candidates
   remain.

3. Provider fallback on failure:
   - Non-streaming: retries with alternate models on connection error or
     5xx response (max 2 retries via callWithFallback).
   - Streaming: retries only on connection-level failure before any
     response headers are written (max 2 retries via streamWithFallback).
   - Each fallback attempt records an audit event for observability.

Schema: adds 'priority' column to llm_models (INT NOT NULL DEFAULT 0)
with idempotent ALTER TABLE guarded by duplicate-column-name check.
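An idempotent-migration sketch of that guard (the duplicate-column error text is driver-specific, e.g. SQLite vs. MySQL, so the real check may differ):

```go
package repository

import (
	"database/sql"
	"strings"
)

// addPriorityColumn always attempts the ALTER TABLE and treats a
// duplicate-column error as "migration already applied", making it safe
// to run on every startup.
func addPriorityColumn(db *sql.DB) error {
	_, err := db.Exec(`ALTER TABLE llm_models ADD COLUMN priority INT NOT NULL DEFAULT 0`)
	if err != nil && strings.Contains(strings.ToLower(err.Error()), "duplicate column") {
		return nil // column already exists
	}
	return err
}
```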

New errProviderConnection sentinel type distinguishes retriable connection
failures from committed-response errors in the streaming path.

Tests: 9 new tests covering model listing, priority selection, load
distribution, exclusion logic, and error type detection.


Development

Successfully merging this pull request may close these issues:

  • 模型路由过于简短粗暴 (Model routing is too simplistic and crude)