karthikarunapuram8-dot/model-switchboard

ModelSwitchboard

You're not paying for models. You're paying for bad decisions.

Every company says they're "multi-model."

What they usually mean is:

  • one expensive default model
  • two cheaper backups nobody trusts
  • routing logic hidden in application code
  • no cost discipline
  • no measurable policy
  • no way to know what should have happened instead

So spend climbs. Latency drifts. Quality becomes anecdotal. And every meeting ends with the same sentence:

"We should probably test a smarter router."

Probably?

You're already late.

ModelSwitchboard is what disciplined teams install before routing becomes expensive chaos.

It gives you a way to benchmark, train, package, replay, and roll out routing policies with evidence.

Not guesses. Not taste. Not whoever spoke last in the meeting.

You decide routing policy the way serious operators decide anything:

  • measured offline
  • validated in shadow mode
  • deployed gradually
  • reversible instantly
  • auditable always

That is how grown systems run.

The real problem nobody says out loud

Most LLM routing stacks fail for one reason:

They optimize model selection after money is already being burned.

They experiment live. They compare dashboards no one trusts. They chase prompt tweaks while architecture waste remains untouched.

That's backwards.

Before you optimize prompts… Before you fine-tune… Before you add another provider…

You should know:

  • Which requests deserve premium models
  • Which requests are over-served
  • Where latency is wasted
  • Where escalation pays for itself
  • Where quality gains are fiction

That's what ModelSwitchboard exists to answer.

What it does

Benchmark routing strategies before production traffic pays tuition

Run deterministic evaluations across policies such as:

  • cheapest acceptable route
  • quality-max route
  • learned router
  • escalation chain
  • guarded frontier route

See cost, quality, latency, and failure tradeoffs side by side.

Not in theory. In numbers.

[Figure: frontier report]
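
To make the side-by-side comparison concrete, here is a minimal sketch of a deterministic offline evaluation. Everything in it is hypothetical and is not ModelSwitchboard's API: the `MODELS` table, the `difficulty` field, and the two policy functions are invented to show the shape of the comparison.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical model table: cost per request (USD), latency (ms), and the
# hardest request difficulty (0..1) each model is assumed to handle correctly.
MODELS = {
    "small": {"cost": 0.001, "latency_ms": 200, "max_difficulty": 0.4},
    "large": {"cost": 0.020, "latency_ms": 900, "max_difficulty": 1.0},
}


@dataclass(frozen=True)
class Request:
    request_id: str
    difficulty: float  # assumed to be known offline from labeled data


def cheapest_acceptable(req):
    """Pick the cheapest model expected to handle the request."""
    ok = [m for m, spec in MODELS.items() if spec["max_difficulty"] >= req.difficulty]
    return min(ok, key=lambda m: MODELS[m]["cost"])


def quality_max(req):
    """Always pick the most capable model, whatever it costs."""
    return max(MODELS, key=lambda m: MODELS[m]["max_difficulty"])


def evaluate(policy, requests):
    """Score a policy on a fixed request set; no randomness, so reruns match."""
    picks = [MODELS[policy(r)] for r in requests]
    solved = [p["max_difficulty"] >= r.difficulty for p, r in zip(picks, requests)]
    return {
        "total_cost": round(sum(p["cost"] for p in picks), 6),
        "mean_latency_ms": mean(p["latency_ms"] for p in picks),
        "quality": mean(1.0 if s else 0.0 for s in solved),
    }


REQUESTS = [Request("r1", 0.1), Request("r2", 0.3), Request("r3", 0.9)]
report = {
    name: evaluate(fn, REQUESTS)
    for name, fn in [
        ("cheapest_acceptable", cheapest_acceptable),
        ("quality_max", quality_max),
    ]
}
```

On this toy set both policies solve every request, but the quality-max route pays the premium price three times. That gap is the kind of number a frontier report is meant to surface.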

Turn routing logic into a governed artifact

Policies ship as explicit assets:

  • policy.json
  • feature_schema.json
  • model weights

Versioned. Portable. Reviewable.

No mystery branch. No "small hotfix." No folklore.

[Figure: policy artifact]
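
One way to picture a governed artifact is a bundle whose identity is derived from its content. The sketch below is an assumption-heavy illustration: the JSON fields, `fingerprint`, and `check_compatible` are hypothetical, and the real `policy.json` and `feature_schema.json` layouts may differ.

```python
import hashlib
import json

# Hypothetical artifact contents; the real policy.json and
# feature_schema.json layouts may differ.
feature_schema = {"features": ["prompt_tokens", "has_code"]}
policy = {
    "version": "example-1",
    "features": ["prompt_tokens", "has_code"],
    "default_route": "small",
}


def fingerprint(*documents):
    """Hash canonical JSON so identical content always yields the same ID."""
    digest = hashlib.sha256()
    for doc in documents:
        canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
        digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


def check_compatible(policy_doc, schema_doc):
    """Reject a policy that expects features the schema does not define."""
    missing = sorted(set(policy_doc["features"]) - set(schema_doc["features"]))
    if missing:
        raise ValueError(f"policy references unknown features: {missing}")


check_compatible(policy, feature_schema)
artifact_id = fingerprint(policy, feature_schema)
```

Because the hash covers canonical JSON, reordering keys leaves the ID unchanged, while any change to the policy produces a new one that a reviewer can see.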

Replay production patterns safely

Use shadow mode to compare what your candidate policy would have done against your current baseline.

No blast radius. No customer-facing roulette. No blind launches dressed up as innovation.

[Figure: shadow replay]
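
The idea of shadow mode can be sketched in a few lines: the candidate policy is evaluated on logged requests, and its decisions are recorded next to what the baseline actually served, without serving anything. The log format and `candidate_policy` rule below are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical logged traffic: what the baseline policy actually served.
LOG = [
    {"request_id": "a", "prompt_tokens": 40, "served_by": "large"},
    {"request_id": "b", "prompt_tokens": 800, "served_by": "large"},
    {"request_id": "c", "prompt_tokens": 60, "served_by": "small"},
]


def candidate_policy(record):
    """Toy candidate: short prompts go to the small model (assumed rule)."""
    return "small" if record["prompt_tokens"] < 100 else "large"


def shadow_replay(log, candidate):
    """Compute candidate decisions next to baseline ones; serve nothing."""
    return [
        {
            "request_id": rec["request_id"],
            "baseline": rec["served_by"],
            "candidate": candidate(rec),
        }
        for rec in log
    ]


rows = shadow_replay(LOG, candidate_policy)
disagreements = [r for r in rows if r["baseline"] != r["candidate"]]
transitions = Counter((r["baseline"], r["candidate"]) for r in rows)
```

Here the only disagreement is request `a`, which the baseline sent to the large model and the candidate would have sent to the small one. Those disagreements are the cases worth reviewing before any live traffic moves.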

Every decision is a structured record. Auditable. Reviewable. Defensible.

[Figure: decision record]
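
A structured decision record could look something like the sketch below. The field names are assumptions made for illustration, not the project's actual record schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    # All field names are hypothetical; the real record schema may differ.
    request_id: str
    policy_version: str
    chosen_model: str
    baseline_model: str
    reason: str
    estimated_cost_usd: float


record = DecisionRecord(
    request_id="req-001",
    policy_version="example-1",
    chosen_model="small",
    baseline_model="large",
    reason="prompt_tokens below threshold",
    estimated_cost_usd=0.001,
)

# One JSON object per line is easy to append to a log, diff, and audit later.
line = json.dumps(asdict(record), sort_keys=True)
restored = DecisionRecord(**json.loads(line))
```

The round trip through JSON gives back an equal record, so what was logged is exactly what can be reviewed.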

Roll out like an operator, not a gambler

You get controls that matter:

  • kill switch
  • canary percentage
  • tenant pinning
  • health state gating
  • comparison records against the baseline

If something breaks, you don't write a postmortem first. You turn the knob back.
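
How such controls might compose is sketched below. `RolloutConfig`, `bucket`, and `choose_arm` are hypothetical names, and the precedence order (kill switch and health gate first, then tenant pins, then the canary percentage) is an assumption rather than the project's documented behavior.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RolloutConfig:
    # Hypothetical control surface; names and defaults are illustrative only.
    kill_switch: bool = False
    canary_percent: int = 0  # share of traffic (0..100) sent to the candidate
    pinned_tenants: dict = field(default_factory=dict)  # tenant -> arm name
    candidate_healthy: bool = True


def bucket(request_id):
    """Map a request ID to a stable bucket in [0, 100)."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def choose_arm(cfg, tenant, request_id):
    """Return "baseline" or "candidate" for one request."""
    # Safety controls win over everything else.
    if cfg.kill_switch or not cfg.candidate_healthy:
        return "baseline"
    # A pinned tenant always gets the arm it was pinned to.
    if tenant in cfg.pinned_tenants:
        return cfg.pinned_tenants[tenant]
    # Otherwise the stable hash bucket decides canary membership.
    return "candidate" if bucket(request_id) < cfg.canary_percent else "baseline"
```

Hashing the request ID instead of drawing a random number keeps the assignment reproducible, so the same request lands in the same arm on replay. Flipping `kill_switch` to `True` returns every request to the baseline in one step.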

See economics clearly

A smarter route that costs more than it returns is not smart.

Track:

  • per-request cost
  • route mix drift
  • premium model leakage
  • escalation efficiency
  • quality gained per dollar spent

That's where executive trust comes from.

[Figure: compare runs]
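
Two of the tracked quantities can be written down directly. The definitions below are one plausible reading, stated as assumptions: quality gained per dollar is taken as the quality difference divided by the extra cost, and premium model leakage as the share of premium-served requests that a cheaper model would have handled. The function names and record fields are hypothetical.

```python
def quality_per_dollar(baseline, candidate):
    """Quality gained per extra dollar; None when no extra money is spent."""
    extra_cost = candidate["cost"] - baseline["cost"]
    if extra_cost <= 0:
        return None
    return (candidate["quality"] - baseline["quality"]) / extra_cost


def premium_leakage(records, premium="large"):
    """Share of premium-served requests a cheaper model would have handled."""
    premium_served = [r for r in records if r["model"] == premium]
    if not premium_served:
        return 0.0
    leaked = sum(1 for r in premium_served if r["cheaper_ok"])
    return leaked / len(premium_served)


# Hypothetical run summaries and per-request records.
baseline_run = {"cost": 10.0, "quality": 0.80}
candidate_run = {"cost": 12.0, "quality": 0.90}
served = [
    {"model": "large", "cheaper_ok": True},
    {"model": "large", "cheaper_ok": True},
    {"model": "large", "cheaper_ok": False},
    {"model": "small", "cheaper_ok": True},
]

gain = quality_per_dollar(baseline_run, candidate_run)
leakage = premium_leakage(served)
```

With these made-up numbers the candidate buys roughly 0.05 quality points per extra dollar, and two of the three premium-served requests count as leakage.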

Who this is for

Platform leaders. Tired of paying flagship-model prices for average requests.

AI product teams. Need better outcomes without detonating margins.

Infrastructure teams. Need safe rollout mechanics, not notebook demos.

Builders with standards. Who know "send everything to the best model" is laziness wearing confidence.

What changes after adoption

Before:

  • routing debates
  • invisible waste
  • reactive spend control
  • fragile launches
  • no accountability

After:

  • measurable policy
  • controlled rollout
  • explainable decisions
  • budget discipline
  • faster iteration

That delta compounds.

What it does not pretend to be

No fake promises.

This is not:

  • magic autonomous intelligence
  • finished enterprise infrastructure
  • your IAM layer
  • your secrets manager
  • your observability platform
  • a substitute for engineering judgment

It is something rarer:

a sober, usable control plane for routing decisions that actually matter.

The uncomfortable truth

If you're spending serious money on LLM inference and still routing by instinct, defaults, or patched business logic—

you do not have an AI platform.

You have a billing relationship.

The close

The companies that win this wave won't be the ones with access to the best models.

They'll be the ones who know when not to use them.

ModelSwitchboard gives you that discipline.


Get started

# Install with the dev extras, then run the tests that are not marked "live".
pip install -e ".[dev]"
python -m pytest -m "not live" -q

# Run the offline benchmark.
python -m model_switchboard.main --split test --output-dir data/results/local_test

# Stand up the service.
pip install -e ".[service]"
uvicorn model_switchboard.service.app:build_app --factory --host 0.0.0.0 --port 8080

Requires Python 3.11+. Licensed under Apache-2.0. See LICENSE and CONTRIBUTING.md.

About

Offline benchmark and shadow-mode control plane for LLM routing policies
