
Routing

Rafael Gumieri edited this page May 8, 2026 · 2 revisions

Nenya's routing system selects an upstream provider for each request based on latency, cost, and model capabilities.

Agent-Based Routing

When a client sends a request with model: "agent-name", Nenya resolves the agent and routes through its model list. Agents define:

  • Strategy: "fallback" (try first, then next on failure) or "round-robin" (distribute across targets)
  • Model list: ordered list of models to try
  • Circuit breaker: per target, protects against cascading failures
  • Cooldown: seconds to skip a model after a retryable error
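
Putting those fields together, an agent definition might look like the following. The key names here are illustrative, not taken from Nenya's actual schema; the model names reuse the ones from the /statsz example below.

```json
{
  "agents": {
    "agent-name": {
      "strategy": "fallback",
      "models": ["gemini-3-flash", "deepseek-reasoner"],
      "cooldown_seconds": 30
    }
  }
}
```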

Fallback Chain

Request → Model A (fails) → Model B (fails) → Model C (succeeds)

Once the circuit breakers for A and B trip, subsequent requests skip them and go straight to the next healthy target.

Round-Robin

Request 1 → Model A
Request 2 → Model B
Request 3 → Model C
Request 4 → Model A (wraps)
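
The two strategies can be sketched as follows. This is a minimal illustration, not Nenya's implementation: it assumes a `try_model(model)` callable that returns a response or raises on a retryable failure, and all names are invented for the example.

```python
import itertools

class Agent:
    """Sketch of the "fallback" and "round-robin" strategies described above."""

    def __init__(self, models, strategy="fallback"):
        self.models = list(models)
        self.strategy = strategy
        self._rr = itertools.cycle(range(len(self.models)))  # round-robin cursor

    def route(self, try_model):
        if self.strategy == "round-robin":
            # Each request starts at the next target, wrapping around;
            # on failure it still falls through to the remaining targets.
            start = next(self._rr)
            order = self.models[start:] + self.models[:start]
        else:  # "fallback": always try targets in configured order
            order = self.models
        last_err = None
        for model in order:
            try:
                return try_model(model)
            except Exception as err:  # retryable error: advance to the next target
                last_err = err
        raise last_err  # every target failed
```

Round-robin here still degrades into a fallback chain when a target fails, which matches the diagram above: the strategy only changes which target each request tries first.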

Balanced Scoring Algorithm

When auto_reorder_by_latency is enabled and routing_strategy is "balanced", targets are scored using a multi-dimensional formula:

score = (latency_normalized * latency_weight)
      - (cost_normalized * cost_weight)
      + capability_boost
      + score_bonus

  • latency_normalized: (maxLat - modelLatency) / (maxLat - minLat) — 1.0 for the fastest model, 0.0 for the slowest
  • cost_normalized: (modelCost - minCost) / (maxCost - minCost) — 0.0 for the cheapest model, 1.0 for the most expensive (hence the subtraction)
  • capability_boost: +0.1 per matching capability, -0.1 per mismatch
  • score_bonus: per-model override
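
The formula translates directly into code. In this sketch, `stats` and its field names (`latency_ms`, `cost`, `cap_matches`, `cap_mismatches`, `score_bonus`) are invented for illustration; only the scoring formula itself comes from the documentation above.

```python
def balanced_score(model, stats, latency_weight=1.0, cost_weight=0.0):
    """Score one model using the balanced formula; higher is better."""
    lats = [s["latency_ms"] for s in stats.values()]
    costs = [s["cost"] for s in stats.values()]
    min_lat, max_lat = min(lats), max(lats)
    min_cost, max_cost = min(costs), max(costs)
    s = stats[model]

    # (maxLat - modelLatency) / (maxLat - minLat): 1.0 for the fastest model
    lat_norm = (max_lat - s["latency_ms"]) / (max_lat - min_lat) if max_lat > min_lat else 1.0
    # (modelCost - minCost) / (maxCost - minCost): 0.0 for the cheapest model
    cost_norm = (s["cost"] - min_cost) / (max_cost - min_cost) if max_cost > min_cost else 0.0
    # +0.1 per matching capability, -0.1 per mismatch
    cap_boost = 0.1 * s.get("cap_matches", 0) - 0.1 * s.get("cap_mismatches", 0)

    return (lat_norm * latency_weight
            - cost_norm * cost_weight
            + cap_boost
            + s.get("score_bonus", 0.0))
```

With the default weights (latency 1.0, cost 0.0) this reduces to pure latency ranking; raising cost_weight penalizes expensive models even when they are fast.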

Latency-Aware Reordering

When governance.auto_reorder_by_latency is enabled, targets are sorted by historical median latency (fastest first). A ±5% random jitter is applied to prevent a thundering herd, where every client hits the fastest provider simultaneously.
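
A minimal sketch of jittered reordering, assuming a `samples` map of per-target latency histories (the function and parameter names are illustrative):

```python
import random
import statistics

def reorder_by_latency(targets, samples, jitter=0.05):
    """Sort targets by median latency, fastest first, with +/-5%
    multiplicative jitter so clients don't all converge on the
    single fastest provider."""
    def key(target):
        median = statistics.median(samples[target])
        return median * random.uniform(1 - jitter, 1 + jitter)
    return sorted(targets, key=key)
```

Because the jitter is proportional, two providers whose medians are within roughly 10% of each other will occasionally swap places, spreading load between near-equals without ever promoting a clearly slower target.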

Configuration

{
  "governance": {
    "auto_reorder_by_latency": true,
    "routing_strategy": "balanced",
    "routing_latency_weight": 1.0,
    "routing_cost_weight": 0.0
  }
}

Latency Tracker

The LatencyTracker maintains per-model sorted sample buffers (binary search finds the insertion slot in O(log n); the list insertion itself is O(n) per record) and computes median latency. At most 100 samples are kept per model. Stale entries (no updates for 1 hour) are evicted automatically.
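
The described behavior can be sketched as follows. This is an illustration of the data structure, not Nenya's code; in particular, which sample is dropped when a buffer is full is an assumption here (the documentation only states the 100-sample cap).

```python
import bisect
import statistics
import time

MAX_SAMPLES = 100

class LatencyTracker:
    """Per-model sorted sample buffers with a size cap and stale eviction."""

    def __init__(self):
        self._samples = {}  # model -> sorted list of latencies (ms)
        self._touched = {}  # model -> timestamp of last record

    def record(self, model, latency_ms, now=None):
        buf = self._samples.setdefault(model, [])
        bisect.insort(buf, latency_ms)  # O(log n) search, O(n) insert
        if len(buf) > MAX_SAMPLES:
            buf.pop()  # assumption: drop the largest sample when full
        self._touched[model] = now if now is not None else time.monotonic()

    def median(self, model):
        buf = self._samples.get(model)
        return statistics.median(buf) if buf else None

    def evict_stale(self, max_age=3600.0, now=None):
        now = now if now is not None else time.monotonic()
        for model, ts in list(self._touched.items()):
            if now - ts > max_age:
                del self._samples[model], self._touched[model]
```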

Circuit Breaker

Each agent+provider+model combination is tracked independently:

  • Closed — Normal operation. Tracks consecutive failures; trips to Open after failure_threshold failures.
  • Open — All requests are skipped. After cooldown_seconds, transitions to HalfOpen.
  • HalfOpen — Allows up to half_open_max_requests probe requests (configurable via governance.half_open_max_requests, default 3). If all succeed → Closed; if any fails → Open.
  • ForceOpen — Opened immediately (used for HTTP 429). Extends the cooldown for quota exhaustion.

The circuit breaker is checked twice per target: once during target list construction and again immediately before sending.
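
The state machine above can be sketched like this. It is a simplified illustration, not Nenya's implementation: the failure_threshold default is invented, method names are illustrative, and ForceOpen is modeled as a `force` flag rather than a distinct state.

```python
import time

class CircuitBreaker:
    """One breaker per agent+provider+model combination."""

    def __init__(self, failure_threshold=5, cooldown_seconds=30.0,
                 half_open_max_requests=3):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.half_open_max = half_open_max_requests
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0
        self.probes = 0
        self.probe_successes = 0

    def allow(self, now=None):
        now = now if now is not None else time.monotonic()
        if self.state == "open":
            if now - self.opened_at >= self.cooldown_seconds:
                self.state = "half-open"  # cooldown elapsed: start probing
                self.probes = self.probe_successes = 0
            else:
                return False  # still cooling down: skip this target
        if self.state == "half-open":
            if self.probes >= self.half_open_max:
                return False  # probe budget exhausted
            self.probes += 1
        return True

    def on_success(self):
        if self.state == "half-open":
            self.probe_successes += 1
            if self.probe_successes >= self.half_open_max:
                self.state = "closed"  # all probes succeeded
                self.failures = 0
        else:
            self.failures = 0  # any success resets the streak

    def on_failure(self, now=None, force=False):
        now = now if now is not None else time.monotonic()
        if force or self.state == "half-open":
            # ForceOpen (e.g. HTTP 429), or any failed probe, opens immediately
            self.state = "open"
            self.opened_at = now
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = now
```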

Check state via /statsz:

{
  "circuit_breakers": {
    "build:gemini:gemini-3-flash": "closed",
    "build:deepseek:deepseek-reasoner": "open"
  }
}

See Also

Operations

  • Demo — Test all pipeline tiers
  • Troubleshooting — Common issues and solutions
  • FAQ — Frequently asked questions
  • Security — Security policy and vulnerability reporting
