Merged
25 changes: 22 additions & 3 deletions DOCUMENTATION.md
@@ -80,11 +80,11 @@ The logic applies in the following order:
2. **Blacklist Check**: For any model *not* on the whitelist, the client checks the blacklist (`IGNORE_MODELS_<PROVIDER>`). If the model matches a blacklist pattern (supports wildcards like `*-preview`), it is excluded.
3. **Default**: If a model is on neither list, it is included.
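
The precedence above can be sketched as a small predicate. This is an illustrative sketch, not the library's implementation: the function name `is_model_included` is hypothetical, whitelist entries are assumed to accept the same wildcard syntax as the blacklist, and `fnmatchcase` stands in for whatever matcher the client actually uses.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_model_included(model: str, whitelist: Iterable[str], blacklist: Iterable[str]) -> bool:
    """Hypothetical helper: whitelist wins, then blacklist excludes, else include."""
    # A whitelist match always includes the model.
    if any(fnmatchcase(model, pattern) for pattern in whitelist):
        return True
    # A blacklist match excludes the model (wildcards such as "*-preview").
    if any(fnmatchcase(model, pattern) for pattern in blacklist):
        return False
    # A model on neither list is included by default.
    return True


print(is_model_included("demo-preview", [], ["*-preview"]))
print(is_model_included("demo-preview", ["demo-preview"], ["*-preview"]))
```

The first call shows the blacklist excluding a model; the second shows a whitelist entry overriding the same blacklist pattern.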

#### Per-Model Routing Overrides (v1)
#### Per-Model Routing Overrides

`MODEL_ROUTING_OVERRIDES` lets operators rewrite `weighted-router/<model>` aliases into a concrete provider-prefixed model before provider lock-in. v1 supports only strict `single` routes so retry, cooldown, and credential rotation continue to run inside one provider lane.
`MODEL_ROUTING_OVERRIDES` lets operators rewrite `weighted-router/<model>` aliases into a concrete provider-prefixed model before provider lock-in. Supported strategies are `single` and `weighted`. Both resolve each request to exactly one provider, so retry, cooldown, and credential rotation continue to run inside the chosen provider lane.

In v1, `allowed_providers` must contain only the primary provider and `fallback_providers` must remain empty.
For `single`, `allowed_providers` must contain only the primary provider and `fallback_providers` must remain empty.

Example:

@@ -104,6 +104,25 @@ MODEL_ROUTING_OVERRIDES='{

This rewrites `weighted-router/nemotron-3-super` to `ollama/nemotron-3-super`. Invalid override config fails at startup, and unmatched `weighted-router/*` models fail closed instead of silently falling back to another provider.
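
A minimal sketch of the `single` rewrite and its fail-closed behavior follows. The function name `rewrite_single` and the config literal are assumptions made for illustration; only the `strategy` and `primary` field names come from the validation code in this change, and the real client performs more checks than shown here.

```python
import json

ALIAS_PREFIX = "weighted-router/"

# Hypothetical minimal config, standing in for the MODEL_ROUTING_OVERRIDES value.
OVERRIDES_JSON = '{"nemotron-3-super": {"strategy": "single", "primary": "ollama"}}'


def rewrite_single(model: str, overrides_json: str) -> str:
    """Rewrite a weighted-router alias to '<primary>/<model>', failing closed."""
    overrides = json.loads(overrides_json)
    if not model.startswith(ALIAS_PREFIX):
        # Models that are not weighted-router aliases pass through untouched.
        return model
    clean_model = model[len(ALIAS_PREFIX):]
    override = overrides.get(clean_model)
    if override is None:
        # Fail closed: never fall back silently to another provider.
        raise ValueError(
            f"No routing override configured for weighted-router model '{clean_model}'"
        )
    return f"{override['primary']}/{clean_model}"


print(rewrite_single("weighted-router/nemotron-3-super", OVERRIDES_JSON))
```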

Weighted overrides let a model stay on a strict allowlist while excluding a provider entirely:

```bash
MODEL_ROUTING_OVERRIDES='{
"qwen3.5": {
"strategy": "weighted",
"allowed_providers": ["ollama", "chutes"],
"weights": {"ollama": 80, "chutes": 20},
"excluded_providers": ["opencode_go"],
"fallback_providers": [],
"strict": true,
"allow_global_fallback": false,
"reason": "Keep qwen3.5 off opencode_go"
}
}'
```

This selects only `ollama/qwen3.5` or `chutes/qwen3.5`, chosen per request in proportion to the configured weights (here about 80% and 20%). Invalid weights, unknown providers, excluded/allowed overlaps, and any attempt to enable global fallback fail at startup.
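
The startup checks listed above could look roughly like the sketch below. It is not the `RoutingPolicy` code from this change: `validate_weighted` is a hypothetical helper that collects error strings instead of raising, and it covers only the checks named in this paragraph.

```python
from typing import Any, Dict, Iterable, List


def validate_weighted(override: Dict[str, Any], known_providers: Iterable[str]) -> List[str]:
    """Return a list of problems found in a weighted override (empty means valid)."""
    allowed = override.get("allowed_providers") or []
    excluded = override.get("excluded_providers") or []
    weights = override.get("weights") or {}
    errors: List[str] = []

    unknown = (set(allowed) | set(excluded)) - set(known_providers)
    if unknown:
        errors.append(f"unknown providers: {sorted(unknown)}")
    if set(allowed) & set(excluded):
        errors.append("provider listed in both allowed_providers and excluded_providers")
    if set(weights) != set(allowed):
        errors.append("weights and allowed_providers must name the same providers")

    def bad(w: Any) -> bool:
        # bool is a subclass of int, so reject it explicitly.
        return not isinstance(w, (int, float)) or isinstance(w, bool) or w < 0

    if any(bad(w) for w in weights.values()):
        errors.append("weights must be non-negative numbers")
    elif sum(weights.values()) <= 0:
        errors.append("weights must sum to more than zero")
    if override.get("allow_global_fallback", False):
        errors.append("allow_global_fallback cannot be enabled")
    return errors


KNOWN = {"ollama", "chutes", "opencode_go"}
override = {
    "strategy": "weighted",
    "allowed_providers": ["ollama", "chutes"],
    "weights": {"ollama": 80, "chutes": 20},
    "excluded_providers": ["opencode_go"],
    "allow_global_fallback": False,
}
print(validate_weighted(override, KNOWN))
```

Collecting errors in a list makes every problem visible at once. The change itself raises on the first failure, which is the simpler choice for a startup check.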

#### Request Lifecycle: A Deadline-Driven Approach

The request lifecycle has been designed around a single, authoritative time budget to ensure predictable performance:
25 changes: 22 additions & 3 deletions README.md
@@ -493,11 +493,11 @@ The proxy includes a powerful text-based UI for configuration and management.
</details>

<details>
<summary><b>Weighted Router Per-Model Overrides (v1)</b></summary>
<summary><b>Weighted Router Per-Model Overrides</b></summary>

Use `MODEL_ROUTING_OVERRIDES` to pin a `weighted-router/<model>` alias to a single provider before credential selection begins. v1 supports only the `single` strategy and fails closed if a matching override is missing or invalid.
Use `MODEL_ROUTING_OVERRIDES` to rewrite a `weighted-router/<model>` alias before credential selection begins. Supported strategies are `single` and `weighted`, and unmatched `weighted-router/*` models fail closed.

In v1, `allowed_providers` must contain only the primary provider and `fallback_providers` must stay empty.
For `single`, `allowed_providers` must contain only the primary provider and `fallback_providers` must stay empty.

```bash
export MODEL_ROUTING_OVERRIDES='{
@@ -515,6 +515,25 @@ export MODEL_ROUTING_OVERRIDES='{

With that configuration, a request for `weighted-router/nemotron-3-super` is rewritten to `ollama/nemotron-3-super` before the normal retry and credential rotation flow runs.

Weighted overrides can keep a model on an explicit allowlist while excluding a provider entirely:

```bash
export MODEL_ROUTING_OVERRIDES='{
"qwen3.5": {
"strategy": "weighted",
"allowed_providers": ["ollama", "chutes"],
"weights": {"ollama": 80, "chutes": 20},
"excluded_providers": ["opencode_go"],
"fallback_providers": [],
"strict": true,
"allow_global_fallback": false,
"reason": "Keep qwen3.5 off opencode_go"
}
}'
```

That configuration selects either `ollama/qwen3.5` or `chutes/qwen3.5` and never falls through to `opencode_go/qwen3.5`.
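
To get a feel for what an 80/20 weighting means across many requests, the split can be simulated. This sketch uses the standard library's `random.Random.choices` as a stand-in. It is not the selection routine added in this change, and the seed and sample size are arbitrary.

```python
import random
from collections import Counter

weights = {"ollama": 80, "chutes": 20}
SAMPLES = 10_000

# A seeded generator keeps the simulation repeatable.
rng = random.Random(0)
picks = rng.choices(list(weights), weights=list(weights.values()), k=SAMPLES)

counts = Counter(picks)
ollama_share = counts["ollama"] / SAMPLES
print(f"ollama share: {ollama_share:.3f}, chutes share: {1 - ollama_share:.3f}")
```

The observed shares should land near 0.8 and 0.2, and `opencode_go` never appears because it is not among the weighted candidates.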

</details>

<details>
4 changes: 3 additions & 1 deletion src/rotator_library/client.py
@@ -900,6 +900,7 @@ def _build_routing_policy(self) -> Optional[RoutingPolicy]:
model_overrides=self.model_routing_overrides,
available_providers=self.all_credentials.keys(),
provider_models=provider_models,
known_providers=self._provider_plugins.keys(),
)
lib_logger.info(
"Loaded %d model routing override(s)",
@@ -919,7 +920,7 @@ def _log_route_decision(self, decision: Optional[RouteDecision]) -> None:
return

lib_logger.info(
"Route decision: requested_model=%s rewritten_model=%s selected_provider=%s strategy=%s selection_source=%s strict=%s allow_global_fallback=%s candidate_providers=%s reason=%s",
"Route decision: requested_model=%s rewritten_model=%s selected_provider=%s strategy=%s selection_source=%s strict=%s allow_global_fallback=%s candidate_providers=%s excluded_providers=%s reason=%s",
decision.requested_model,
decision.rewritten_model,
decision.selected_provider,
@@ -928,6 +929,7 @@ def _log_route_decision(self, decision: Optional[RouteDecision]) -> None:
decision.strict,
decision.allow_global_fallback,
decision.candidate_providers,
decision.excluded_providers,
decision.reason,
)

166 changes: 149 additions & 17 deletions src/rotator_library/routing_policy.py
@@ -1,6 +1,7 @@
from __future__ import annotations

from dataclasses import dataclass
import random
from typing import Any, Dict, Iterable, Optional, Set


@@ -20,14 +21,14 @@ class RouteDecision:
candidate_providers: list[str]
strict: bool
allow_global_fallback: bool
excluded_providers: list[str]
reason: Optional[str] = None


class RoutingPolicy:
"""Resolve weighted-router models into concrete provider-prefixed models.

v1 intentionally supports only strict single-provider overrides. It rewrites
abstract `weighted-router/<model>` requests before provider lock-in so the
Weighted-router aliases are rewritten before provider lock-in so the
existing retry and credential machinery can continue unchanged.
"""

@@ -36,15 +37,113 @@ def __init__(
model_overrides: Dict[str, Any],
available_providers: Iterable[str],
provider_models: Optional[Dict[str, Set[str]]] = None,
known_providers: Optional[Iterable[str]] = None,
rng: Optional[random.Random] = None,
) -> None:
if not isinstance(model_overrides, dict):
raise RoutingPolicyError("MODEL_ROUTING_OVERRIDES must decode to an object")

self.model_overrides = model_overrides
self.available_providers = set(available_providers)
self.known_providers = set(known_providers or self.available_providers)
self.provider_models = provider_models or {}
self.rng = rng or random.Random()
self._validate()

    def _validate_provider_model(self, provider: str, clean_model: str) -> None:
        provider_models = self.provider_models.get(provider)
        if provider_models and clean_model not in provider_models:
            raise RoutingPolicyError(
                f"provider '{provider}' does not expose model '{clean_model}' in configured model definitions"
            )

    def _validate_provider_name(self, provider: str, clean_model: str, field_name: str) -> None:
        if provider not in self.known_providers:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' references unknown provider '{provider}' in '{field_name}'"
            )

    def _validate_weighted_override(self, clean_model: str, override: Dict[str, Any]) -> None:
        weights = override.get("weights")
        if not isinstance(weights, dict) or not weights:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must define a non-empty 'weights' object in v2"
            )

        allowed_providers = override.get("allowed_providers")
        if not isinstance(allowed_providers, list) or not allowed_providers:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must define a non-empty 'allowed_providers' list in v2"
            )
        if len(set(allowed_providers)) != len(allowed_providers):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' cannot repeat providers in 'allowed_providers'"
            )

        excluded_providers = override.get("excluded_providers", [])
        if not isinstance(excluded_providers, list):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must use a list for 'excluded_providers'"
            )

        if override.get("allow_global_fallback", False):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' cannot enable 'allow_global_fallback' in v2"
            )

        fallback_providers = override.get("fallback_providers", [])
        if fallback_providers not in (None, []):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' cannot define 'fallback_providers' in v2"
            )

        for provider in allowed_providers:
            if not isinstance(provider, str) or not provider:
                raise RoutingPolicyError(
                    f"routing override for '{clean_model}' must use string providers in 'allowed_providers'"
                )
            self._validate_provider_name(provider, clean_model, "allowed_providers")

        for provider in excluded_providers:
            if not isinstance(provider, str) or not provider:
                raise RoutingPolicyError(
                    f"routing override for '{clean_model}' must use string providers in 'excluded_providers'"
                )
            self._validate_provider_name(provider, clean_model, "excluded_providers")

        if set(allowed_providers) & set(excluded_providers):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' cannot include the same provider in both 'allowed_providers' and 'excluded_providers'"
            )

        if set(weights.keys()) != set(allowed_providers):
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must use matching providers in 'weights' and 'allowed_providers'"
            )

        total_weight = 0.0
        for provider, weight in weights.items():
            if not isinstance(provider, str) or not provider:
                raise RoutingPolicyError(
                    f"routing override for '{clean_model}' must use string providers in 'weights'"
                )
            self._validate_provider_name(provider, clean_model, "weights")
            if not isinstance(weight, (int, float)) or isinstance(weight, bool):
                raise RoutingPolicyError(
                    f"routing override for '{clean_model}' must use numeric weights"
                )
            if weight < 0:
                raise RoutingPolicyError(
                    f"routing override for '{clean_model}' cannot use negative weights"
                )
            total_weight += float(weight)
            self._validate_provider_model(provider, clean_model)

        if total_weight <= 0:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must define weights with a total greater than zero"
            )

def _validate(self) -> None:
for clean_model, override in self.model_overrides.items():
if not isinstance(clean_model, str) or not clean_model:
@@ -53,20 +152,21 @@ def _validate(self) -> None:
raise RoutingPolicyError(f"routing override for '{clean_model}' must be an object")

strategy = override.get("strategy")
if strategy != "single":
if strategy not in {"single", "weighted"}:
raise RoutingPolicyError(
f"routing override for '{clean_model}' must use strategy 'single' in v1"
f"routing override for '{clean_model}' must use strategy 'single' or 'weighted'"
)

if strategy == "weighted":
self._validate_weighted_override(clean_model, override)
continue

primary = override.get("primary")
if not isinstance(primary, str) or not primary:
raise RoutingPolicyError(
f"routing override for '{clean_model}' requires a non-empty 'primary' provider"
)
if primary not in self.available_providers:
raise RoutingPolicyError(
f"routing override for '{clean_model}' references unknown provider '{primary}'"
)
self._validate_provider_name(primary, clean_model, "primary")

allowed_providers = override.get("allowed_providers", [primary])
if not isinstance(allowed_providers, list) or not all(
@@ -86,11 +186,29 @@ def _validate(self) -> None:
f"routing override for '{clean_model}' cannot define 'fallback_providers' in v1"
)

provider_models = self.provider_models.get(primary)
if provider_models and clean_model not in provider_models:
raise RoutingPolicyError(
f"provider '{primary}' does not expose model '{clean_model}' in configured model definitions"
)
self._validate_provider_model(primary, clean_model)

    def _select_weighted_provider(self, clean_model: str, weights: Dict[str, Any]) -> str:
        total_weight = sum(float(weight) for weight in weights.values())
        if total_weight <= 0:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' must define weights with a total greater than zero"
            )

        target = self.rng.uniform(0, total_weight)
        running_total = 0.0
        last_provider = None
        for provider, weight in weights.items():
            running_total += float(weight)
            last_provider = provider
            if target <= running_total:
                return provider

        if last_provider is None:
            raise RoutingPolicyError(
                f"routing override for '{clean_model}' produced no selectable providers"
            )
        return last_provider

def resolve(self, model: str) -> RouteDecision:
if "/" not in model:
@@ -105,6 +223,7 @@ def resolve(self, model: str) -> RouteDecision:
candidate_providers=[],
strict=False,
allow_global_fallback=True,
excluded_providers=[],
)

provider, clean_model = model.split("/", 1)
@@ -120,6 +239,7 @@ def resolve(self, model: str) -> RouteDecision:
candidate_providers=[provider],
strict=False,
allow_global_fallback=True,
excluded_providers=[],
)

override = self.model_overrides.get(clean_model)
@@ -128,17 +248,29 @@ def resolve(self, model: str) -> RouteDecision:
f"No routing override configured for weighted-router model '{clean_model}'"
)

selected_provider = override["primary"]
        strategy = override["strategy"]
        if strategy == "weighted":
            selected_provider = self._select_weighted_provider(clean_model, override["weights"])
            candidate_providers = list(override["allowed_providers"])
            excluded_providers = list(override.get("excluded_providers", []))
            selection_source = "model_override_weighted"
        else:
            selected_provider = override["primary"]
            candidate_providers = [selected_provider]
            excluded_providers = []
            selection_source = "model_override"

return RouteDecision(
requested_model=model,
clean_model=clean_model,
selected_provider=selected_provider,
rewritten_model=f"{selected_provider}/{clean_model}",
strategy="single",
selection_source="model_override",
strategy=strategy,
selection_source=selection_source,
override_applied=True,
candidate_providers=[selected_provider],
candidate_providers=candidate_providers,
strict=bool(override.get("strict", True)),
allow_global_fallback=bool(override.get("allow_global_fallback", False)),
excluded_providers=excluded_providers,
reason=override.get("reason"),
)
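
The cumulative-weight walk used by `_select_weighted_provider` can be exercised in isolation. The sketch below is a standalone approximation under stated assumptions: `pick_weighted` is a hypothetical name, weights are assumed to be already validated as non-negative, and it deviates from the method above by skipping zero-weight entries so they can never be returned when the random target is exactly `0.0`.

```python
import random
from typing import Dict, Optional


def pick_weighted(weights: Dict[str, float], rng: Optional[random.Random] = None) -> str:
    """Pick one key with probability proportional to its weight."""
    rng = rng or random.Random()
    total = sum(float(w) for w in weights.values())
    if total <= 0:
        raise ValueError("weights must sum to more than zero")

    # Draw a point in [0, total] and walk the running sum until it is covered.
    target = rng.uniform(0, total)
    running = 0.0
    chosen: Optional[str] = None
    for provider, weight in weights.items():
        if weight <= 0:
            continue  # deviation from the diff: zero weights are never eligible
        running += float(weight)
        chosen = provider
        if target <= running:
            return provider

    # Floating-point rounding can leave target just above the final sum,
    # so fall back to the last eligible provider.
    assert chosen is not None
    return chosen


print(pick_weighted({"ollama": 80, "chutes": 20}, random.Random(1)))
```

Passing a seeded `random.Random` makes the choice reproducible, which is the same reason the constructor in this change accepts an optional `rng`.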
29 changes: 29 additions & 0 deletions tests/test_client_routing_policy.py
@@ -1,5 +1,6 @@
import sys
import asyncio
import random
from pathlib import Path

import pytest
@@ -96,3 +97,31 @@ async def fake_execute_with_retry(api_call, request=None, pre_request_callback=N

assert result == {"ok": True}
assert captured["model"] == "ollama/nemotron-3-super"


def test_client_helper_rewrites_weighted_qwen3_5_model():
    client = RotatingClient.__new__(RotatingClient)
    client.routing_policy = RoutingPolicy(
        model_overrides={
            "qwen3.5": {
                "strategy": "weighted",
                "allowed_providers": ["ollama", "chutes"],
                "weights": {"ollama": 80, "chutes": 20},
                "excluded_providers": ["opencode_go"],
                "fallback_providers": [],
                "strict": True,
                "allow_global_fallback": False,
            }
        },
        available_providers={"ollama", "chutes"},
        provider_models={"ollama": {"qwen3.5"}, "chutes": {"qwen3.5"}},
        known_providers={"ollama", "chutes", "opencode_go"},
        rng=random.Random(1),
    )

    model, decision = client._apply_routing_policy("weighted-router/qwen3.5")

    assert model in {"ollama/qwen3.5", "chutes/qwen3.5"}
    assert decision is not None
    assert decision.strategy == "weighted"
    assert decision.excluded_providers == ["opencode_go"]
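
The test above passes `known_providers` separately from `available_providers`, presumably because `opencode_go` has no credentials configured yet must still be a recognizable name in `excluded_providers`. A minimal sketch of that distinction follows. `unknown_provider_names` is a hypothetical helper, not part of the library, and it mirrors only the name check, including the default of falling back to the available set.

```python
from typing import Any, Dict, Iterable, List, Optional


def unknown_provider_names(
    override: Dict[str, Any],
    available_providers: Iterable[str],
    known_providers: Optional[Iterable[str]] = None,
) -> List[str]:
    """Return referenced provider names that are not in the known set."""
    # Fall back to the available set when no known set is supplied.
    known = set(known_providers or available_providers)
    referenced = set(override.get("allowed_providers", []))
    referenced |= set(override.get("excluded_providers", []))
    return sorted(referenced - known)


override = {
    "allowed_providers": ["ollama", "chutes"],
    "excluded_providers": ["opencode_go"],
}
available = {"ollama", "chutes"}

# With only the available set, the excluded provider looks unknown.
print(unknown_provider_names(override, available))
# With the wider known set, every referenced name is recognized.
print(unknown_provider_names(override, available, {"ollama", "chutes", "opencode_go"}))
```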