ModelSwitchboard

You're not paying for models. You're paying for bad decisions.

Every company says they're "multi-model."

What they usually mean is:

one expensive default model
two cheaper backups nobody trusts
routing logic hidden in application code
no cost discipline
no measurable policy
no way to know what should have happened instead

So spend climbs. Latency drifts. Quality becomes anecdotal. And every meeting ends with the same sentence:

"We should probably test a smarter router."

Probably?

You're already late.

ModelSwitchboard is what disciplined teams install before routing becomes expensive chaos.

It gives you a way to benchmark, train, package, replay, and roll out routing policies with evidence.

Not guesses. Not taste. Not whoever spoke last in the meeting.

You decide routing policy the way serious operators decide anything:

measured offline
validated in shadow mode
deployed gradually
reversible instantly
auditable always

That is how grown systems run.

The real problem nobody says out loud

Most LLM routing stacks fail for one reason:

They optimize model selection after money is already being burned.

They experiment live. They compare dashboards no one trusts. They chase prompt tweaks while architecture waste remains untouched.

That's backwards.

Before you optimize prompts… Before you fine-tune… Before you add another provider…

You should know:

Which requests deserve premium models
Which requests are over-served
Where latency is wasted
Where escalation pays for itself
Where quality gains are fiction

That's what ModelSwitchboard exists to answer.

What it does

Benchmark routing strategies before production traffic pays tuition

Run deterministic evaluations across policies such as:

cheapest acceptable route
quality-max route
learned router
escalation chain
guarded frontier route

See cost, quality, latency, and failure tradeoffs side by side.

Not in theory. In numbers.
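
As a rough illustration of what "side by side" means, here is a minimal sketch that scores two of the policies above over a toy set of offline records. It is not ModelSwitchboard's API: the `Outcome` layout, the model names, the numbers, and the `quality_floor` parameter are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Outcome:
    """Offline measurement of one model on one request (hypothetical layout)."""
    model: str
    cost_usd: float
    quality: float      # 0.0 to 1.0, from whatever grader you trust
    latency_ms: float


# One inner list per request: the measured outcome of every candidate model.
# All numbers are invented for illustration.
REQUESTS = [
    [Outcome("small", 0.001, 0.82, 300), Outcome("large", 0.020, 0.90, 1200)],
    [Outcome("small", 0.001, 0.40, 280), Outcome("large", 0.020, 0.88, 1100)],
    [Outcome("small", 0.001, 0.75, 310), Outcome("large", 0.020, 0.76, 1300)],
]


def cheapest_acceptable(outcomes, quality_floor=0.7):
    """Cheapest model that clears the floor; fall back to the best available."""
    acceptable = [o for o in outcomes if o.quality >= quality_floor]
    if acceptable:
        return min(acceptable, key=lambda o: o.cost_usd)
    return max(outcomes, key=lambda o: o.quality)


def quality_max(outcomes):
    """Always take the highest-quality outcome, whatever it costs."""
    return max(outcomes, key=lambda o: o.quality)


def summarize(policy, requests):
    """Aggregate cost, quality, and latency for one policy over all requests."""
    picks = [policy(r) for r in requests]
    return {
        "total_cost_usd": round(sum(p.cost_usd for p in picks), 6),
        "mean_quality": round(mean(p.quality for p in picks), 4),
        "mean_latency_ms": round(mean(p.latency_ms for p in picks), 1),
    }


REPORT = {
    "cheapest_acceptable": summarize(cheapest_acceptable, REQUESTS),
    "quality_max": summarize(quality_max, REQUESTS),
}
```

Both toy policies read the measured quality directly, so they act as oracle baselines. A learned router would have to predict that quality from request features, which is exactly the gap an offline benchmark is meant to expose.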

Turn routing logic into a governed artifact

Policies ship as explicit assets:

policy.json
feature_schema.json
model weights

Versioned. Portable. Reviewable.

No mystery branch. No "small hotfix." No folklore.
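
One way to make "versioned and reviewable" concrete is to fingerprint the artifact and check it against its schema before it ships. The sketch below assumes a shape for `policy.json` and `feature_schema.json`; the real file layouts are not documented here, so every field name (`version`, `features`, `default_route`) is hypothetical.

```python
import hashlib
import json

# Hypothetical file contents, inlined so the example is self-contained.
POLICY_JSON = (
    '{"version": "example-1", "features": ["prompt_tokens", "task_type"],'
    ' "default_route": "small"}'
)
SCHEMA_JSON = '{"prompt_tokens": "int", "task_type": "str", "tenant_tier": "str"}'


def fingerprint(*documents: str) -> str:
    """SHA-256 over canonicalized JSON, stable under key order and whitespace."""
    digest = hashlib.sha256()
    for doc in documents:
        canonical = json.dumps(
            json.loads(doc), sort_keys=True, separators=(",", ":")
        )
        digest.update(canonical.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def missing_features(policy: dict, schema: dict) -> list[str]:
    """Features the policy reads that the schema does not declare."""
    return sorted(set(policy["features"]) - set(schema))


policy = json.loads(POLICY_JSON)
schema = json.loads(SCHEMA_JSON)

ARTIFACT_ID = fingerprint(POLICY_JSON, SCHEMA_JSON)
MISSING = missing_features(policy, schema)
```

A reviewer can then approve a specific `ARTIFACT_ID`, and a deploy step can refuse any artifact whose `MISSING` list is not empty.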

Replay production patterns safely

Use shadow mode to compare what your candidate policy would have done against what your current baseline actually did.

No blast radius. No customer-facing roulette. No blind launches dressed up as innovation.

Every decision is a structured record. Auditable. Reviewable. Defensible.
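
The idea can be sketched in a few lines: the baseline decision is the one that gets served, and the candidate's decision is only written down. The `ShadowRecord` fields and both toy policies below are assumptions for illustration, not the record format ModelSwitchboard emits.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ShadowRecord:
    """One structured comparison record (hypothetical field names)."""
    request_id: str
    baseline_route: str
    candidate_route: str
    agreed: bool


def shadow_compare(request_id, features, baseline, candidate):
    """Serve the baseline decision; only record what the candidate would do."""
    served = baseline(features)
    proposed = candidate(features)  # never used to serve traffic
    record = ShadowRecord(request_id, served, proposed, served == proposed)
    return served, record


# Toy policies, invented for this example.
def baseline_policy(features):
    return "large"


def candidate_policy(features):
    return "small" if features["prompt_tokens"] < 500 else "large"


served, record = shadow_compare(
    "req-1", {"prompt_tokens": 120}, baseline_policy, candidate_policy
)

# One JSON line per decision keeps the log easy to audit and replay.
LOG_LINE = json.dumps(asdict(record), sort_keys=True)
```

Here the customer still gets the `large` route, while the log shows the candidate would have chosen `small`. Aggregating the `agreed` field over real traffic shows how far the candidate diverges before it ever serves a request.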

Roll out like an operator, not a gambler

You get controls that matter:

kill switch
canary percentage
tenant pinning
health state gating
comparison records against baseline

If something breaks, you don't write a postmortem first. You turn the knob back.
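
A minimal sketch of how these controls might compose, assuming a precedence of kill switch and health gate first, tenant pins second, canary percentage last. The `RolloutConfig` fields, the precedence order, and bucketing on request ID are illustrative assumptions, not ModelSwitchboard's documented behavior.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RolloutConfig:
    """Hypothetical rollout knobs for choosing between two policies."""
    kill_switch: bool = False
    candidate_healthy: bool = True
    canary_percent: float = 0.0  # 0 to 100
    # tenant ID -> "baseline" or "candidate"
    pinned_tenants: dict[str, str] = field(default_factory=dict)


def bucket(request_id: str) -> float:
    """Map a request ID to a stable value in [0, 100), in 0.01 steps."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") % 10_000) / 100


def choose_policy(cfg: RolloutConfig, tenant: str, request_id: str) -> str:
    """Return which policy should serve this request."""
    # Safety controls win over everything else.
    if cfg.kill_switch or not cfg.candidate_healthy:
        return "baseline"
    # Explicit pins override the percentage rollout.
    if tenant in cfg.pinned_tenants:
        return cfg.pinned_tenants[tenant]
    # Hash-based bucketing keeps each request's assignment deterministic.
    if bucket(request_id) < cfg.canary_percent:
        return "candidate"
    return "baseline"
```

Because the bucket comes from a hash rather than a random draw, the same request ID always lands on the same side, and flipping `kill_switch` sends everything back to the baseline without a redeploy.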

See economics clearly

A smarter route that costs more than it returns is not smart.

Track:

per-request cost
route mix drift
premium model leakage
escalation efficiency
quality gained per dollar spent

That's where executive trust comes from.
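
Two of these metrics can be pinned down with simple arithmetic. The definitions below are assumptions chosen for this sketch, not ModelSwitchboard's own: "premium leakage" is taken as the share of premium-routed requests that the cheap route would have served acceptably, and "quality per dollar" as the quality gained over the cheap route divided by the extra spend. Field names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutedRequest:
    """One served request plus its offline cheap-route measurement (hypothetical)."""
    route: str              # "small" or "premium"
    cost_usd: float
    quality: float
    cheap_quality: float    # what the cheap route scored on the same request
    cheap_cost_usd: float


def premium_leakage(rows, quality_floor=0.7):
    """Share of premium-routed requests the cheap route would have satisfied."""
    premium = [r for r in rows if r.route == "premium"]
    if not premium:
        return 0.0
    leaked = [r for r in premium if r.cheap_quality >= quality_floor]
    return len(leaked) / len(premium)


def quality_per_dollar(rows):
    """Quality gained over the cheap route per extra dollar spent."""
    extra_cost = sum(r.cost_usd - r.cheap_cost_usd for r in rows)
    extra_quality = sum(r.quality - r.cheap_quality for r in rows)
    return extra_quality / extra_cost if extra_cost > 0 else 0.0


ROWS = [
    RoutedRequest("premium", 0.020, 0.90, 0.82, 0.001),  # cheap route was fine
    RoutedRequest("premium", 0.020, 0.88, 0.40, 0.001),  # escalation paid off
    RoutedRequest("small", 0.001, 0.75, 0.75, 0.001),
]

LEAKAGE = premium_leakage(ROWS)
GAIN_PER_DOLLAR = quality_per_dollar(ROWS)
```

In this toy data, half of the premium-routed requests did not need the premium model, which is the kind of number that turns a routing debate into a budget decision.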

Who this is for

Platform leaders. Tired of paying flagship-model prices for average requests.

AI product teams. Need better outcomes without detonating margins.

Infrastructure teams. Need safe rollout mechanics, not notebook demos.

Builders with standards. Who know "send everything to the best model" is laziness wearing confidence.

What changes after adoption

Before:

routing debates
invisible waste
reactive spend control
fragile launches
no accountability

After:

measurable policy
controlled rollout
explainable decisions
budget discipline
faster iteration

That delta compounds.

What it does not pretend to be

No fake promises.

This is not:

magic autonomous intelligence
finished enterprise infrastructure
your IAM layer
your secrets manager
your observability platform
a substitute for engineering judgment

It is something rarer:

a sober, usable control plane for routing decisions that actually matter.

The uncomfortable truth

If you're spending serious money on LLM inference and still routing by instinct, defaults, or patched business logic, you do not have an AI platform.

You have a billing relationship.

The close

The companies that win this wave won't be the ones with access to the best models.

They'll be the ones who know when not to use them.

ModelSwitchboard gives you that discipline.

Get started

pip install -e ".[dev]"
python -m pytest -m "not live" -q

# Run the offline benchmark.
python -m model_switchboard.main --split test --output-dir data/results/local_test

# Stand up the service.
pip install -e ".[service]"
uvicorn model_switchboard.service.app:build_app --factory --host 0.0.0.0 --port 8080

Python 3.11+. Apache-2.0 licensed. See LICENSE and CONTRIBUTING.md.
