# Routing
Nenya's routing system dynamically selects the optimal upstream provider for each request based on multiple factors including latency, cost, and model capabilities.
When a client sends a request with `model: "agent-name"`, Nenya resolves the agent and routes through its model list. Agents define:

- Strategy: `"fallback"` (try the first model, then the next on failure) or `"round-robin"` (distribute requests across targets)
- Model list: ordered list of models to try
- Circuit breaker: per target; protects against cascading failures
- Cooldown: seconds to skip a model after a retryable error
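A hypothetical agent definition along these lines might look like the following. The field names are illustrative only, not Nenya's actual schema; the `build` agent name and model identifiers are borrowed from the `/statsz` example later in this page.

```json
{
  "agents": {
    "build": {
      "strategy": "fallback",
      "models": ["gemini/gemini-3-flash", "deepseek/deepseek-reasoner"],
      "cooldown_seconds": 60
    }
  }
}
```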
Fallback:

```
Request → Model A → fails → Model B → fails → Model C → succeeds
                ↓
  circuit breaker trips A, B → tries next
```
Round-robin:

```
Request 1 → Model A
Request 2 → Model B
Request 3 → Model C
Request 4 → Model A (wraps)
```
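The two strategies can be sketched in a few lines. This is a minimal illustration, assuming a plain model list and a `send` callable per model; none of these names come from Nenya's actual API.

```python
from itertools import cycle

def fallback(models, send):
    """Try each model in order; return the first successful response."""
    last_err = None
    for model in models:
        try:
            return send(model)
        except RuntimeError as err:  # stand-in for any retryable error
            last_err = err
    raise last_err

def round_robin(models):
    """Rotate through the model list: A, B, C, A, ..."""
    return cycle(models)

def send(model):
    # Pretend the first two models are currently failing.
    if model in ("model-a", "model-b"):
        raise RuntimeError(f"{model} unavailable")
    return f"response from {model}"

assert fallback(["model-a", "model-b", "model-c"], send) == "response from model-c"

rr = round_robin(["model-a", "model-b", "model-c"])
assert [next(rr) for _ in range(4)] == ["model-a", "model-b", "model-c", "model-a"]
```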
When `auto_reorder_by_latency` is enabled and `routing_strategy` is `"balanced"`, targets are scored using a multi-dimensional formula:

```
score = (latency_normalized * latency_weight)
      - (cost_normalized * cost_weight)
      + capability_boost
      + score_bonus
```
- latency_normalized: `(maxLat - modelLatency) / (maxLat - minLat)` — higher = faster
- cost_normalized: `(modelCost - minCost) / (maxCost - minCost)` — lower = cheaper
- score_bonus: per-model override
- capability_boost: +0.1 per matching capability, -0.1 per mismatch
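Putting the formula and the normalizations together, the score computation can be sketched as follows. The weights match the defaults from the config example below; the latency and cost figures are made up for illustration.

```python
def score(lat, cost, lats, costs, latency_weight=1.0, cost_weight=0.0,
          capability_boost=0.0, score_bonus=0.0):
    """Balanced routing score: favor fast, cheap, capability-matched models."""
    lat_norm = (max(lats) - lat) / (max(lats) - min(lats))        # higher = faster
    cost_norm = (cost - min(costs)) / (max(costs) - min(costs))   # lower = cheaper
    return (lat_norm * latency_weight
            - cost_norm * cost_weight
            + capability_boost
            + score_bonus)

lats = [100.0, 300.0, 500.0]   # illustrative median latencies (ms)
costs = [0.5, 1.0, 2.0]        # illustrative per-token costs

# The fastest model normalizes to 1.0, the slowest to 0.0:
assert score(100.0, 0.5, lats, costs) == 1.0
assert score(500.0, 2.0, lats, costs) == 0.0
```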
When `governance.auto_reorder_by_latency` is enabled, targets are sorted by historical median latency, fastest first. A ±5% random jitter is applied to prevent a thundering herd, where every client would otherwise hit the fastest provider simultaneously.
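The jittered sort can be sketched like this. It is an illustration only, with made-up median latencies; the function name is not from Nenya's codebase.

```python
import random

def reorder_by_latency(medians, rng=random.Random(0)):
    """Sort models fastest-first after perturbing each median by up to ±5%."""
    jittered = {m: lat * rng.uniform(0.95, 1.05) for m, lat in medians.items()}
    return sorted(jittered, key=jittered.get)

medians = {"model-a": 120.0, "model-b": 450.0, "model-c": 125.0}
order = reorder_by_latency(medians)

assert set(order) == {"model-a", "model-b", "model-c"}
# a and c may swap under jitter, but 5% noise cannot promote b past either:
assert order[-1] == "model-b"
```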
```json
{
  "governance": {
    "auto_reorder_by_latency": true,
    "routing_strategy": "balanced",
    "routing_latency_weight": 1.0,
    "routing_cost_weight": 0.0
  }
}
```

The `LatencyTracker` maintains per-model sorted sample buffers (incremental binary-search insertion, O(n) per record) and computes the median latency. It keeps at most 100 samples per model, and stale entries (no updates for 1 hour) are evicted automatically.
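A minimal sketch of such a tracker is below, assuming the documented limits (100 samples, 1-hour staleness). The class and method names are illustrative, and the real eviction policy when the buffer is full may differ.

```python
import bisect

MAX_SAMPLES = 100
STALE_AFTER = 3600.0  # seconds without updates before a model is evicted

class LatencyTracker:
    """Per-model sorted sample buffers with median lookup."""

    def __init__(self):
        self.samples = {}  # model -> sorted list of latency samples
        self.updated = {}  # model -> timestamp of the last record

    def record(self, model, latency_ms, now):
        buf = self.samples.setdefault(model, [])
        bisect.insort(buf, latency_ms)  # binary search + O(n) shift
        if len(buf) > MAX_SAMPLES:
            buf.pop()  # cap the buffer (illustrative choice of victim)
        self.updated[model] = now

    def median(self, model):
        buf = self.samples.get(model)
        if not buf:
            return None
        mid = len(buf) // 2
        return buf[mid] if len(buf) % 2 else (buf[mid - 1] + buf[mid]) / 2

    def evict_stale(self, now):
        for model, ts in list(self.updated.items()):
            if now - ts > STALE_AFTER:
                del self.samples[model]
                del self.updated[model]

t = LatencyTracker()
for v in (300.0, 100.0, 200.0, 400.0):
    t.record("gemini-3-flash", v, now=0.0)
assert t.median("gemini-3-flash") == 250.0  # sorted: [100, 200, 300, 400]
t.evict_stale(now=7200.0)                   # two hours later, no updates
assert t.median("gemini-3-flash") is None
```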
Each agent+provider+model combination is tracked independently:
| State | Behavior |
|---|---|
| Closed | Normal operation. Tracks consecutive failures. Trips to Open after `failure_threshold` failures. |
| Open | All requests skipped. After `cooldown_seconds`, transitions to HalfOpen. |
| HalfOpen | Allows up to `half_open_max_requests` probe requests (configurable via `governance.half_open_max_requests`, default 3). All succeed → Closed. Any fail → Open. |
| ForceOpen | Immediately opened (used for HTTP 429). Extends the cooldown for quota exhaustion. |
The circuit breaker is checked twice per target: once during target list construction and again immediately before sending.
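The state machine in the table above can be sketched as follows. This is an illustration, not Nenya's implementation: the class and method names are invented, and the `failure_threshold` and `cooldown_seconds` defaults shown are assumptions (only the `half_open_max_requests` default of 3 is documented).

```python
class CircuitBreaker:
    """Per-target breaker: closed -> open -> half_open -> closed."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0,
                 half_open_max_requests=3):
        self.threshold = failure_threshold
        self.cooldown = cooldown_seconds
        self.half_open_max = half_open_max_requests
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0
        self.probe_successes = 0

    def allow(self, now):
        if self.state in ("open", "force_open"):
            if now - self.opened_at >= self.cooldown:
                self.state, self.probes, self.probe_successes = "half_open", 0, 0
            else:
                return False  # still cooling down: skip this target
        if self.state == "half_open":
            if self.probes >= self.half_open_max:
                return False  # probe budget exhausted
            self.probes += 1
        return True

    def on_success(self):
        if self.state == "half_open":
            self.probe_successes += 1
            if self.probe_successes >= self.half_open_max:
                self.state, self.failures = "closed", 0  # all probes succeeded
        else:
            self.failures = 0

    def on_failure(self, now):
        if self.state == "half_open":
            self.state, self.opened_at = "open", now  # any probe failure reopens
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.state, self.opened_at = "open", now

    def force_open(self, now, extra_cooldown=0.0):
        """HTTP 429: open immediately, optionally extending the cooldown."""
        self.state, self.opened_at = "force_open", now + extra_cooldown

cb = CircuitBreaker(failure_threshold=2, cooldown_seconds=10.0,
                    half_open_max_requests=1)
assert cb.allow(now=0.0)
cb.on_failure(now=0.0)
assert cb.allow(now=1.0)
cb.on_failure(now=1.0)                    # second failure trips the breaker
assert cb.state == "open" and not cb.allow(now=5.0)
assert cb.allow(now=12.0)                 # cooldown elapsed: half-open probe
cb.on_success()
assert cb.state == "closed"
```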
Check state via `/statsz`:

```json
{
  "circuit_breakers": {
    "build:gemini:gemini-3-flash": "closed",
    "build:deepseek:deepseek-reasoner": "open"
  }
}
```

See also:

- Configuration — Agent model selectors and regex patterns
- Providers — Provider capabilities
- Architecture — Request lifecycle and graceful degradation