Merged
25 changes: 24 additions & 1 deletion DOCUMENTATION.md
@@ -80,6 +80,30 @@ The logic applies in the following order:
2. **Blacklist Check**: For any model *not* on the whitelist, the client checks the blacklist (`IGNORE_MODELS_<PROVIDER>`). If the model matches a blacklist pattern (supports wildcards like `*-preview`), it is excluded.
3. **Default**: If a model is on neither list, it is included.
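
The filtering order can be sketched in a few lines of Python. This is only an illustration, not the proxy's actual implementation: `is_model_included` is a hypothetical helper, and using `fnmatch` for wildcard matching (as well as allowing wildcards in the whitelist) is an assumption.

```python
from fnmatch import fnmatchcase


def is_model_included(model: str, whitelist: list[str], blacklist: list[str]) -> bool:
    """Hypothetical helper mirroring the documented order of checks."""
    # 1. Whitelist check: a whitelisted model is always included.
    if any(fnmatchcase(model, pattern) for pattern in whitelist):
        return True
    # 2. Blacklist check: only reached for models not on the whitelist.
    if any(fnmatchcase(model, pattern) for pattern in blacklist):
        return False
    # 3. Default: a model on neither list is included.
    return True
```

Note that `fnmatchcase` also gives special meaning to `?` and `[...]`, which the real matcher may not do.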

#### Per-Model Routing Overrides (v1)

`MODEL_ROUTING_OVERRIDES` lets operators rewrite `weighted-router/<model>` aliases into a concrete provider-prefixed model before provider lock-in. v1 supports only strict `single` routes so retry, cooldown, and credential rotation continue to run inside one provider lane.

In v1, `allowed_providers` must contain only the primary provider and `fallback_providers` must remain empty.

Example:

```bash
MODEL_ROUTING_OVERRIDES='{
  "nemotron-3-super": {
    "strategy": "single",
    "primary": "ollama",
    "allowed_providers": ["ollama"],
    "fallback_providers": [],
    "strict": true,
    "allow_global_fallback": false,
    "reason": "Only available on Ollama Cloud"
  }
}'
```
```

This rewrites `weighted-router/nemotron-3-super` to `ollama/nemotron-3-super`. Invalid override config fails at startup, and unmatched `weighted-router/*` models fail closed instead of silently falling back to another provider.
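
The v1 rules above can be sketched roughly as follows. The names `load_overrides`, `resolve_model`, and `RoutingOverrideError` are hypothetical and chosen for illustration; the real implementation may validate more fields or structure the checks differently. Treating a missing `fallback_providers` key as empty is also an assumption.

```python
import json

ALIAS_PREFIX = "weighted-router/"


class RoutingOverrideError(ValueError):
    """Hypothetical error type for invalid or missing overrides."""


def load_overrides(raw: str) -> dict:
    """Parse and validate the JSON value; intended to run once at startup."""
    try:
        overrides = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise RoutingOverrideError(f"not valid JSON: {exc}") from exc
    if not isinstance(overrides, dict):
        raise RoutingOverrideError("top level must be a JSON object")
    for model, rule in overrides.items():
        if not isinstance(rule, dict):
            raise RoutingOverrideError(f"{model}: rule must be an object")
        if rule.get("strategy") != "single":
            raise RoutingOverrideError(f"{model}: v1 supports only 'single'")
        primary = rule.get("primary")
        if not isinstance(primary, str) or not primary:
            raise RoutingOverrideError(f"{model}: 'primary' is required")
        if rule.get("allowed_providers") != [primary]:
            raise RoutingOverrideError(
                f"{model}: allowed_providers must contain only the primary provider"
            )
        # Assumption: an absent key is treated the same as an empty list.
        if rule.get("fallback_providers", []) != []:
            raise RoutingOverrideError(f"{model}: fallback_providers must be empty")
    return overrides


def resolve_model(requested: str, overrides: dict) -> str:
    """Rewrite a weighted-router alias; other model names pass through unchanged."""
    if not requested.startswith(ALIAS_PREFIX):
        return requested
    model = requested[len(ALIAS_PREFIX):]
    rule = overrides.get(model)
    if rule is None:
        # Fail closed: never fall back silently to another provider.
        raise RoutingOverrideError(f"no routing override for {requested!r}")
    return f"{rule['primary']}/{model}"
```

Validating everything in `load_overrides` keeps configuration errors at startup, so `resolve_model` only has to handle the unmatched-alias case at request time.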

#### Request Lifecycle: A Deadline-Driven Approach

The request lifecycle has been designed around a single, authoritative time budget to ensure predictable performance:
@@ -1925,4 +1949,3 @@ The GUI modifies the same environment variables that the `RotatingClient` reads:
3. **Proxy applies rules** → `get_available_models()` filters based on rules

**Note**: The proxy must be restarted to pick up rule changes made via the GUI (or use the Launcher TUI's reload functionality if available).

26 changes: 26 additions & 0 deletions README.md
@@ -477,6 +477,7 @@ The proxy includes a powerful text-based UI for configuration and management.
| `ROTATION_MODE_<PROVIDER>` | `balanced` or `sequential` | `ROTATION_MODE_GEMINI=sequential` |
| `IGNORE_MODELS_<PROVIDER>` | Blacklist (comma-separated, supports `*`) | `IGNORE_MODELS_OPENAI=*-preview*` |
| `WHITELIST_MODELS_<PROVIDER>` | Whitelist (overrides blacklist) | `WHITELIST_MODELS_GEMINI=gemini-2.5-pro` |
| `MODEL_ROUTING_OVERRIDES` | JSON per-model routing overrides for `weighted-router/*` aliases | `{"nemotron-3-super":{"strategy":"single","primary":"ollama","allowed_providers":["ollama"],"fallback_providers":[],"strict":true,"allow_global_fallback":false}}` |

### Advanced Features

@@ -491,6 +492,31 @@ The proxy includes a powerful text-based UI for configuration and management.

</details>

<details>
<summary><b>Weighted Router Per-Model Overrides (v1)</b></summary>

Use `MODEL_ROUTING_OVERRIDES` to pin a `weighted-router/<model>` alias to a single provider before credential selection begins. v1 supports only the `single` strategy. An invalid override configuration fails at startup, and a `weighted-router/*` request with no matching override fails closed rather than falling back to another provider.

In v1, `allowed_providers` must contain only the primary provider and `fallback_providers` must stay empty.

```bash
export MODEL_ROUTING_OVERRIDES='{
  "nemotron-3-super": {
    "strategy": "single",
    "primary": "ollama",
    "allowed_providers": ["ollama"],
    "fallback_providers": [],
    "strict": true,
    "allow_global_fallback": false,
    "reason": "Only available on Ollama Cloud"
  }
}'
```

With that configuration, a request for `weighted-router/nemotron-3-super` is rewritten to `ollama/nemotron-3-super` before the normal retry and credential rotation flow runs.
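
If a single-line value is more convenient (for example in a `.env` file), the compact form used in the environment-variable table can be generated instead of written by hand. This is a small sketch; whether your `.env` loader expects the surrounding single quotes is an assumption to verify.

```python
import json

# Same override as above, minus the optional "reason" field.
overrides = {
    "nemotron-3-super": {
        "strategy": "single",
        "primary": "ollama",
        "allowed_providers": ["ollama"],
        "fallback_providers": [],
        "strict": True,
        "allow_global_fallback": False,
    }
}

# Compact separators drop all optional whitespace, giving a one-line value.
compact = json.dumps(overrides, separators=(",", ":"))
print(f"MODEL_ROUTING_OVERRIDES='{compact}'")
```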

</details>

<details>
<summary><b>Model Filtering (Whitelists & Blacklists)</b></summary>
