feat(llm): complexity-based triage routing for multi-provider pool #2153
Merged
…2141)

Add LlmRoutingStrategy::Triage and TriageRouter to zeph-llm. Before each LLM call, a configurable classifier provider evaluates input complexity and routes to a tier-specific provider (Simple/Medium/Complex/Expert).

Key design:
- TriageRouter implements LlmProvider directly via Box::pin to break the recursive AnyProvider type cycle; bootstrap wires it as the top-level provider when routing = "triage"
- Classifier call is wrapped in tokio::time::timeout (default 5s) with fallback to the default tier on timeout or JSON parse failure
- Regex extraction fallback when structured JSON output is malformed
- Context-window auto-escalation skips gracefully when provider returns None
- last_provider_idx (AtomicUsize) tracks the last-used tier; last_usage() and last_cache_usage() delegate to that provider for correct cost tracking
- set_status_tx() propagates to all tier providers so streaming indicators work during actual inference
- bypass_single_provider compares config entry names, not runtime provider type names, to handle heterogeneous pools correctly
- AtomicU64 metrics: per-tier call counts, latency_us_total, escalations, fallbacks, timeout_fallbacks
- AnyProvider::Triage variant added; all match sites updated

Config: [llm.complexity_routing] with triage_provider, tiers.{simple,medium,complex,expert}, triage_timeout_secs, max_triage_tokens, bypass_single_provider, fallback_strategy; --init wizard step; --migrate-config support

Agent loop emits "Evaluating complexity..." status indicator during triage.

18 new unit tests covering tier routing, timeout fallback, context-window escalation (including None case), metrics counters, config TOML round-trip, and LlmRoutingStrategy::Triage deserialization.

Closes #2141
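The fallback chain in this commit message (structured output first, then lenient extraction, then the default tier) can be sketched roughly as below. This is not the code from triage.rs: the {"tier": "..."} response shape, the helper names extract_tier_field, scan_for_keyword and classify_output, and the use of plain substring scanning in place of the regex fallback are all assumptions made to keep the sketch std-only. A None input stands in for a classifier call that timed out.

```rust
/// Complexity tiers named in the PR; the derive set is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ComplexityTier {
    Simple,
    Medium,
    Complex,
    Expert,
}

/// Keyword table shared by both parsing paths (hypothetical).
const KEYWORDS: [(&str, ComplexityTier); 4] = [
    ("simple", ComplexityTier::Simple),
    ("medium", ComplexityTier::Medium),
    ("complex", ComplexityTier::Complex),
    ("expert", ComplexityTier::Expert),
];

/// Map an exact label to a tier, ignoring case and surrounding whitespace.
fn tier_from_label(label: &str) -> Option<ComplexityTier> {
    let label = label.trim().to_ascii_lowercase();
    KEYWORDS
        .iter()
        .find(|(kw, _)| *kw == label)
        .map(|(_, tier)| *tier)
}

/// Hypothetical strict path: read the value of a `"tier": "..."` pair.
/// The response shape is an assumption, not taken from the PR.
fn extract_tier_field(raw: &str) -> Option<&str> {
    let key = "\"tier\"";
    let after_key = &raw[raw.find(key)? + key.len()..];
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Hypothetical lenient path (stand-in for the regex fallback):
/// pick whichever tier keyword appears earliest in the text.
fn scan_for_keyword(raw: &str) -> Option<ComplexityTier> {
    let lower = raw.to_ascii_lowercase();
    let mut best: Option<(usize, ComplexityTier)> = None;
    for (kw, tier) in KEYWORDS.iter() {
        if let Some(pos) = lower.find(*kw) {
            if best.map_or(true, |(p, _)| pos < p) {
                best = Some((pos, *tier));
            }
        }
    }
    best.map(|(_, tier)| tier)
}

/// Resolve classifier output to a tier. `None` models a timed-out call.
fn classify_output(raw: Option<&str>, default: ComplexityTier) -> ComplexityTier {
    let raw = match raw {
        Some(text) => text,
        None => return default, // timeout: fall back to the default tier
    };
    extract_tier_field(raw)
        .and_then(tier_from_label)
        .or_else(|| scan_for_keyword(raw))
        .unwrap_or(default) // unparseable output: fall back as well
}
```

Keeping the fallback as a pure function over Option<&str> means the timeout and parse-failure paths can be unit tested without an async runtime.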
- TriageRouter::set_status_tx takes &StatusTx (needless_pass_by_value)
- remove unused LlmProvider import in bootstrap/provider.rs
- collapse nested if blocks in bypass_single_provider check
- book/src/advanced/complexity-triage.md (new): full feature page with config reference, bypass optimization, timeout/fallback, cascade hybrid
- book/src/advanced/adaptive-inference.md: add triage row to strategy table
- book/src/reference/configuration.md: document [llm.complexity_routing] fields and add cross-reference link
- book/src/SUMMARY.md: insert new page under Advanced
- README.md: mention LlmRoutingStrategy::Triage in routing description
- crates/zeph-llm/README.md: add TriageRouter section with config example
- crates/zeph-config/README.md: add ComplexityRoutingConfig to key types
This was linked to issues on Mar 23, 2026 (status: Closed).
Summary
Implements complexity-based pre-inference triage routing for the multi-provider LLM pool (closes #2141).
Before each LLM call, a configurable classifier provider evaluates input complexity and routes to a tier-specific provider:
- Simple → fast/cheap model (e.g. Haiku, qwen3:1.7b)
- Medium → default model
- Complex → smart model (e.g. Sonnet, GPT-4o)
- Expert → expert model (e.g. Opus, o1)

Changes
- crates/zeph-llm/src/router/triage.rs (new): ComplexityTier, TriageRouter, TriageMetrics
- crates/zeph-llm/src/any.rs: AnyProvider::Triage variant + all match sites
- crates/zeph-config/src/providers.rs: LlmRoutingStrategy::Triage, ComplexityRoutingConfig, TierMapping
- crates/zeph-core/src/bootstrap/provider.rs: build_triage_provider()
- crates/zeph-core/src/agent/tool_execution/native.rs: status indicator
- src/init.rs: --init wizard step

Config
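As a rough Rust-side sketch, the [llm.complexity_routing] keys listed in the commit message could map onto a struct like the one below. Only the field names come from the PR. The field types, the optional tiers, and the should_bypass rule (bypass when every configured tier names the same config entry) are assumptions. fallback_strategy is left out because its accepted values are not listed.

```rust
use std::time::Duration;

/// Tier-to-provider mapping. Each value is a provider *config entry name*.
/// Making every tier optional is an assumption of this sketch.
#[derive(Debug, Clone, Default)]
struct TierMapping {
    simple: Option<String>,
    medium: Option<String>,
    complex: Option<String>,
    expert: Option<String>,
}

/// Field names follow the PR; the types are guesses.
#[derive(Debug, Clone)]
struct ComplexityRoutingConfig {
    triage_provider: String,
    tiers: TierMapping,
    triage_timeout_secs: u64,
    max_triage_tokens: u32,
    bypass_single_provider: bool,
}

impl ComplexityRoutingConfig {
    /// Timeout applied to the classifier call.
    fn triage_timeout(&self) -> Duration {
        Duration::from_secs(self.triage_timeout_secs)
    }

    /// Assumed bypass rule: skip triage when the option is enabled and all
    /// configured tiers point at the same config entry name, since the
    /// classifier could not change the routing outcome in that case.
    fn should_bypass(&self) -> bool {
        if !self.bypass_single_provider {
            return false;
        }
        let tiers = [
            self.tiers.simple.as_deref(),
            self.tiers.medium.as_deref(),
            self.tiers.complex.as_deref(),
            self.tiers.expert.as_deref(),
        ];
        let mut names = tiers.iter().flatten();
        match names.next() {
            Some(first) => names.all(|name| name == first),
            None => false, // no tiers configured: nothing to bypass to
        }
    }
}
```

Comparing the entry names rather than provider types matters when two entries share a backend type but differ in model, which is the heterogeneous-pool case the PR calls out.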
Key design decisions
- TriageRouter implements LlmProvider directly via Box::pin to break the recursive AnyProvider type cycle
- Classifier call is wrapped in tokio::time::timeout; regex fallback on malformed JSON output
- Context-window auto-escalation skips gracefully when the provider returns None
- last_provider_idx (AtomicUsize) tracks last-used tier for correct cost/token reporting
- set_status_tx() propagates to all tier providers, so streaming indicators work during inference
- bypass_single_provider compares config entry names (not runtime type names) for heterogeneous pools
- Metrics use AtomicU64: no Mutex in async paths

Tests
18 new unit tests: tier routing, timeout fallback, context-window escalation (incl. None case), metrics counters, config TOML round-trip, LlmRoutingStrategy::Triage deserialization.

Total: 6443 passed (was 6432, +11 net after deduplication).
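The context-window escalation covered by these tests, including the None case, might look roughly like the sketch below. The function name escalate_for_context, the per-tier windows array, and the rule that an unknown window (None) is accepted without further escalation are assumptions; the PR only states that escalation skips gracefully when the provider returns None.

```rust
/// Tiers in ascending order of capability; ordering derives are an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ComplexityTier {
    Simple,
    Medium,
    Complex,
    Expert,
}

impl ComplexityTier {
    const ALL: [ComplexityTier; 4] = [
        ComplexityTier::Simple,
        ComplexityTier::Medium,
        ComplexityTier::Complex,
        ComplexityTier::Expert,
    ];

    fn index(self) -> usize {
        self as usize
    }
}

/// Hypothetical escalation rule: starting at `chosen`, return the first tier
/// whose context window fits `needed_tokens`. `windows[i]` is the window the
/// tier-i provider reports, or `None` when it reports nothing.
fn escalate_for_context(
    chosen: ComplexityTier,
    needed_tokens: usize,
    windows: &[Option<usize>; 4],
) -> ComplexityTier {
    for tier in ComplexityTier::ALL[chosen.index()..].iter().copied() {
        match windows[tier.index()] {
            // Unknown window: skip the check and keep this tier.
            None => return tier,
            // Input fits: no (further) escalation needed.
            Some(limit) if needed_tokens <= limit => return tier,
            // Too small: try the next tier up.
            Some(_) => continue,
        }
    }
    // Nothing fits; stay at the top tier and let the provider report the error.
    ComplexityTier::Expert
}
```

A caller could compare the returned tier with the classifier's choice and bump an escalations counter when they differ.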
Deferred (follow-up issues)
- … (TriageRouter::metrics())
- --init wizard step (config schema added, literal updated)
- chat_stream/chat_with_tools async delegation tests
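For context on what TriageRouter::metrics() would expose, here is a minimal sketch of lock-free counters in the style the PR describes (AtomicU64, no Mutex). The counter names come from the commit message. The struct layout, method names, Relaxed ordering, and the choice to count a timeout as both a fallback and a timeout_fallback are assumptions.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Counter names follow the PR; the layout is a guess.
#[derive(Default)]
struct TriageMetrics {
    tier_calls: [AtomicU64; 4], // indexed Simple..Expert
    latency_us_total: AtomicU64,
    escalations: AtomicU64,
    fallbacks: AtomicU64,
    timeout_fallbacks: AtomicU64,
}

impl TriageMetrics {
    /// Record one routed call and the triage latency that preceded it.
    fn record_call(&self, tier_idx: usize, latency: Duration) {
        self.tier_calls[tier_idx].fetch_add(1, Ordering::Relaxed);
        // as_micros() returns u128; saturate rather than truncate.
        let micros = u64::try_from(latency.as_micros()).unwrap_or(u64::MAX);
        self.latency_us_total.fetch_add(micros, Ordering::Relaxed);
    }

    fn record_escalation(&self) {
        self.escalations.fetch_add(1, Ordering::Relaxed);
    }

    /// Assumption: every timeout is also counted as a general fallback.
    fn record_fallback(&self, timed_out: bool) {
        self.fallbacks.fetch_add(1, Ordering::Relaxed);
        if timed_out {
            self.timeout_fallbacks.fetch_add(1, Ordering::Relaxed);
        }
    }

    fn calls(&self, tier_idx: usize) -> u64 {
        self.tier_calls[tier_idx].load(Ordering::Relaxed)
    }
}
```

Relaxed ordering is enough here because each counter is read independently for reporting; nothing else is synchronized through these values. Since every method takes &self, the struct can be shared behind an Arc across tasks without holding a lock over an await point.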