Leaderboard: update vLLM-SR to v350 metrics (#131), now #1 by yl231 · Pull Request #133 · RouteWorks/RouterArena

yl231 · 2026-06-04T18:21:00Z

vLLM Semantic Router resubmission (#131) re-evaluated at RouterArena score 0.7538 (was 0.6723). Updated its row and re-sorted ranks 1-9:

| Metric | Before | After |
|---|---|---|
| Arena | 67.23 | 75.38 |
| Accuracy | 66.53 | 75.97 |
| Cost/1K | $0.06 | $0.11 |
| Opt.Sel | 84.66 | 20.12 |
| Opt.Cost | 90.71 | 24.52 |
| Opt.Acc | 89.24 | 89.87 |
| Robust | 90.95 | 73.10 |

At 75.38, vLLM-SR overtakes Sqwish (75.27) for #1; Sqwish, AgentForge, Nadir, Weave, OrcaRouter-Adaptive, Azure, R2-Router and Auto each shift down one rank. Ranks 10-20 are unchanged. Metrics are taken from the final /evaluate run on the merged submission (verified byte-identical to main).
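
The re-ranking in this PR amounts to replacing one row's Arena score and re-sorting the table in descending order. Below is a minimal Python sketch, assuming the leaderboard is a plain list of rows keyed by router name. The `Row` type and `update_and_rerank` helper are hypothetical illustrations, not RouterArena's actual tooling. The sketch uses only the two scores stated in this PR and omits all other rows. It also applies the x100 conversion that takes the raw score 0.7538 to the displayed 75.38.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Row:
    router: str
    arena: float  # Arena score as displayed on the leaderboard (raw RouterArena score x 100)


def update_and_rerank(rows: list[Row], router: str, new_score: float) -> list[tuple[int, Row]]:
    """Hypothetical helper: replace one router's score, re-sort descending, assign 1-based ranks."""
    updated = [replace(r, arena=new_score) if r.router == router else r for r in rows]
    # sorted() is stable, so rows with equal scores keep their previous relative order.
    ordered = sorted(updated, key=lambda r: r.arena, reverse=True)
    return [(rank, row) for rank, row in enumerate(ordered, start=1)]


# Only the two scores stated in this PR; every other leaderboard row is omitted.
before = [Row("Sqwish", 75.27), Row("vLLM-SR", 67.23)]
after = update_and_rerank(before, "vLLM-SR", round(0.7538 * 100, 2))
for rank, row in after:
    print(rank, row.router, row.arena)
```

Only the rows whose scores fall between the router's old and new positions change rank. This is why ranks 10-20 stay fixed when vLLM-SR moves from #9 to #1.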

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

yl231 merged commit 1264487 into main Jun 4, 2026
10 checks passed

yl231 deleted the update-vllm-sr-v350-leaderboard branch June 4, 2026 18:59

yl231 merged 1 commit into main from update-vllm-sr-v350-leaderboard
