GitHub - qcy615/smart-router

Smart Router

A high-performance, production-grade request router for LLM inference serving. Supports Prefill-Decode (PD) disaggregation, prefix-aware KV-cache routing, native Kubernetes (k8s) service discovery, and integration with multiple inference backends (vLLM, SGLang).
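To give a rough idea of what prefix-aware KV-cache routing means, the sketch below sends a prompt to the worker that has already seen the longest matching prefix, since that worker is the most likely to have reusable KV-cache entries. It is an illustration only, not Smart Router's implementation: the `PrefixAwareRouter` class, the `min_match` threshold, and the character-level matching are all assumptions made for this example.

```python
from collections import defaultdict


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixAwareRouter:
    """Toy router (hypothetical): prefer the worker with the longest shared prefix."""

    def __init__(self, workers, min_match=8):
        self.workers = list(workers)
        self.min_match = min_match              # assumed threshold, in characters
        self.history = defaultdict(list)        # worker -> prompts sent there so far
        self.routed = {w: 0 for w in self.workers}  # requests routed per worker

    def pick(self, prompt: str) -> str:
        best_worker, best_len = None, 0
        for w in self.workers:
            for seen in self.history[w]:
                n = shared_prefix_len(prompt, seen)
                if n > best_len:
                    best_worker, best_len = w, n
        if best_worker is None or best_len < self.min_match:
            # No useful overlap: fall back to the worker with the fewest requests.
            best_worker = min(self.workers, key=lambda w: self.routed[w])
        self.history[best_worker].append(prompt)
        self.routed[best_worker] += 1
        return best_worker


router = PrefixAwareRouter(["w1", "w2"])
first = router.pick("You are a helpful assistant. Question one")
second = router.pick("Completely different prompt text")
third = router.pick("You are a helpful assistant. Question two")
```

A real router would typically match on token IDs and use a tree or hash-based index instead of scanning every past prompt.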

Key Features

Core Architecture: Request routing framework and async processing patterns
Load Balancing: Multiple algorithms (prefix aware, power of two, consistent hashing, minimum load, round robin)
Prefill-Decode Disaggregation: Specialized routing for separated processing phases
Service Discovery: Kubernetes-native worker management and health monitoring
Multi-Backend Support: vLLM and SGLang inference engines
Data-Parallel Awareness: Support for intra-node data-parallel worker groups
Built-in Benchmark: Multi-turn benchmarking tool for evaluating routing performance
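As an illustration of one of the load-balancing algorithms listed above, here is a minimal sketch of the general "power of two choices" technique: sample two workers at random and send the request to the less loaded one. The function name `power_of_two_choice` and the `loads` dictionary are assumptions for this example and are not taken from the Smart Router code base.

```python
import random


def power_of_two_choice(loads, rng=random):
    """Pick two distinct workers at random and return the less loaded one.

    `loads` maps a worker identifier to its current load (hypothetical metric,
    for example the number of in-flight requests).
    """
    workers = list(loads)
    if not workers:
        raise ValueError("no workers available")
    if len(workers) == 1:
        return workers[0]
    a, b = rng.sample(workers, 2)
    return a if loads[a] <= loads[b] else b


# Simulate routing 1000 requests across four workers.
loads = {"http://worker1:8000": 0, "http://worker2:8000": 0,
         "http://worker3:8000": 0, "http://worker4:8000": 0}
for _ in range(1000):
    loads[power_of_two_choice(loads)] += 1
```

Comparing only two random candidates keeps the per-request cost constant while still avoiding the worst imbalances of purely random assignment.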

Installation

# Install from source
pip install .

# Install with benchmark dependencies (quoted so shells such as zsh do not expand the brackets)
pip install ".[benchmark]"

Or use uv:

uv sync
uv sync --extra benchmark  # with benchmark dependencies

Docker

docker build -t smart-router .

# With benchmark extras
docker build --build-arg INSTALL_BENCHMARK=true -t smart-router .

Quick Start

Regular HTTP Routing

python -m smart_router serve \
    --router-type vllm \
    --policy power_of_two \
    --worker-urls http://worker1:8000 http://worker2:8000 \
    --worker-intra-dp-size 4

Prefill/Decode Disaggregation (PD)

python -m smart_router serve \
    --router-type vllm \
    --pd-disaggregation \
    --prefill-urls http://worker1:8000 \
    --decode-urls http://worker2:8000 \
    --prefill-policy power_of_two \
    --decode-policy power_of_two \
    --prefill-intra-dp-size 2 \
    --decode-intra-dp-size 2
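Conceptually, PD disaggregation means each request is served by two workers: one from the prefill pool and one from the decode pool, each selected by that pool's own policy. The sketch below illustrates only that pair selection. The `Pool` class and `route_pd` function are hypothetical names for this example, and how Smart Router hands the KV cache from the prefill worker to the decode worker is not covered here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Pool:
    """A group of workers serving one phase (prefill or decode)."""
    urls: list
    in_flight: dict = field(default_factory=dict)

    def __post_init__(self):
        # Start every worker with zero in-flight requests.
        for url in self.urls:
            self.in_flight.setdefault(url, 0)

    def pick(self):
        # Power-of-two style choice: least loaded of up to two random candidates.
        candidates = random.sample(self.urls, min(2, len(self.urls)))
        return min(candidates, key=lambda u: self.in_flight[u])


def route_pd(prefill: Pool, decode: Pool):
    """Choose one worker per phase and record the new in-flight request."""
    p = prefill.pick()
    d = decode.pick()
    prefill.in_flight[p] += 1
    decode.in_flight[d] += 1
    return p, d


prefill_pool = Pool(["http://worker1:8000"])
decode_pool = Pool(["http://worker2:8000"])
pair = route_pd(prefill_pool, decode_pool)
```

Keeping the two pools separate lets each phase be scaled and balanced independently, which is why the command above takes distinct `--prefill-policy` and `--decode-policy` options.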

Documentation

Benchmark

Run the integrated benchmark entrypoint:

python -m smart_router benchmark --input-file conversations.json --model /path/to/model --url http://127.0.0.1:8000

Roadmap

SGLang support
Service discovery
vLLM KV event reporting
Batch scheduling
Prompt bin-packing policy
