Skip to content

feat(llm): complexity-based triage routing for multi-provider pool#2153

Merged
bug-ops merged 5 commits intomainfrom
feat-issue-2141-complexity-triage-routing
Mar 23, 2026
Merged

feat(llm): complexity-based triage routing for multi-provider pool#2153
bug-ops merged 5 commits intomainfrom
feat-issue-2141-complexity-triage-routing

Conversation

@bug-ops
Copy link
Owner

@bug-ops bug-ops commented Mar 23, 2026

Summary

Implements complexity-based pre-inference triage routing for the multi-provider LLM pool (closes #2141).

Before each LLM call, a configurable classifier provider evaluates input complexity and routes to a tier-specific provider:

  • Simple → fast/cheap model (e.g. Haiku, qwen3:1.7b)
  • Medium → default model
  • Complex → smart model (e.g. Sonnet, GPT-4o)
  • Expert → expert model (e.g. Opus, o1)

Changes

  • crates/zeph-llm/src/router/triage.rs (new) — ComplexityTier, TriageRouter, TriageMetrics
  • crates/zeph-llm/src/any.rsAnyProvider::Triage variant + all match sites
  • crates/zeph-config/src/providers.rsLlmRoutingStrategy::Triage, ComplexityRoutingConfig, TierMapping
  • crates/zeph-core/src/bootstrap/provider.rsbuild_triage_provider()
  • crates/zeph-core/src/agent/tool_execution/native.rs — status indicator
  • src/init.rs--init wizard step

Config

[llm.complexity_routing]
enabled = true
triage_provider = "fast"
bypass_single_provider = true
triage_timeout_secs = 5

[llm.complexity_routing.tiers]
simple = "fast"
medium = "default"
complex = "smart"
expert = "expert"

Key design decisions

  • TriageRouter implements LlmProvider directly via Box::pin to break the recursive AnyProvider type cycle
  • Classifier call wrapped in tokio::time::timeout; regex fallback on malformed JSON output
  • Context-window auto-escalation skips gracefully when provider returns None
  • last_provider_idx (AtomicUsize) tracks last-used tier for correct cost/token reporting
  • set_status_tx() propagates to all tier providers — streaming indicators work during inference
  • bypass_single_provider compares config entry names (not runtime type names) for heterogeneous pools
  • All metrics use AtomicU64 — no Mutex in async paths

Tests

18 new unit tests: tier routing, timeout fallback, context-window escalation (incl. None case), metrics counters, config TOML round-trip, LlmRoutingStrategy::Triage deserialization.

Total: 6443 passed (was 6432, +11 net after deduplication).

Deferred (follow-up issues)

  • TUI metrics panel wiring (metrics available via TriageRouter::metrics())
  • Interactive --init wizard step (config schema added, literal updated)
  • Prompt injection hardening (system message prefix in triage call)
  • chat_stream/chat_with_tools async delegation tests

…2141)

Add LlmRoutingStrategy::Triage and TriageRouter to zeph-llm. Before each
LLM call, a configurable classifier provider evaluates input complexity and
routes to a tier-specific provider (Simple/Medium/Complex/Expert).

Key design:
- TriageRouter implements LlmProvider directly via Box::pin to break the
  recursive AnyProvider type cycle; bootstrap wires it as the top-level
  provider when routing = "triage"
- Classifier call is wrapped in tokio::time::timeout (default 5s) with
  fallback to the default tier on timeout or JSON parse failure
- Regex extraction fallback when structured JSON output is malformed
- Context-window auto-escalation skips gracefully when provider returns None
- last_provider_idx (AtomicUsize) tracks the last-used tier; last_usage()
  and last_cache_usage() delegate to that provider for correct cost tracking
- set_status_tx() propagates to all tier providers so streaming indicators
  work during actual inference
- bypass_single_provider compares config entry names, not runtime provider
  type names, to handle heterogeneous pools correctly
- AtomicU64 metrics: per-tier call counts, latency_us_total, escalations,
  fallbacks, timeout_fallbacks
- AnyProvider::Triage variant added; all match sites updated

Config: [llm.complexity_routing] with triage_provider, tiers.{simple,medium,
complex,expert}, triage_timeout_secs, max_triage_tokens, bypass_single_provider,
fallback_strategy; --init wizard step; --migrate-config support

Agent loop emits "Evaluating complexity..." status indicator during triage.

18 new unit tests covering tier routing, timeout fallback, context-window
escalation (including None case), metrics counters, config TOML round-trip,
and LlmRoutingStrategy::Triage deserialization.

Closes #2141
@github-actions github-actions bot added documentation Improvements or additions to documentation llm zeph-llm crate (Ollama, Claude) rust Rust code changes core zeph-core crate enhancement New feature or request size/XL Extra large PR (500+ lines) labels Mar 23, 2026
@bug-ops bug-ops enabled auto-merge (squash) March 23, 2026 05:58
bug-ops added 4 commits March 23, 2026 07:08
- TriageRouter::set_status_tx takes &StatusTx (needless_pass_by_value)
- remove unused LlmProvider import in bootstrap/provider.rs
- collapse nested if blocks in bypass_single_provider check
- book/src/advanced/complexity-triage.md (new) — full feature page:
  config reference, bypass optimization, timeout/fallback, cascade hybrid
- book/src/advanced/adaptive-inference.md — add triage row to strategy table
- book/src/reference/configuration.md — document [llm.complexity_routing]
  fields and add cross-reference link
- book/src/SUMMARY.md — insert new page under Advanced
- README.md — mention LlmRoutingStrategy::Triage in routing description
- crates/zeph-llm/README.md — add TriageRouter section with config example
- crates/zeph-config/README.md — add ComplexityRoutingConfig to key types
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment