Leaderboard: update vLLM-SR to v350 metrics (#131), now #1 (#133)

Merged: yl231 merged 1 commit into main from update-vllm-sr-v350-leaderboard on Jun 4, 2026

yl231 commented on Jun 4, 2026

vLLM Semantic Router resubmission (#131) re-evaluated at RouterArena score 0.7538 (was 0.6723). Updated its row and re-sorted ranks 1-9:

| Metric | Before | After |
| --- | --- | --- |
| Arena | 67.23 | 75.38 |
| Accuracy | 66.53 | 75.97 |
| Cost/1K | $0.06 | $0.11 |
| Opt.Sel | 84.66 | 20.12 |
| Opt.Cost | 90.71 | 24.52 |
| Opt.Acc | 89.24 | 89.87 |
| Robust | 90.95 | 73.10 |
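
For reviewers who want the size of each change at a glance, the following is a minimal sketch that computes after minus before for every metric. The numbers are copied from the before/after values above; the dictionary layout and the `deltas` helper are illustrative and are not part of the RouterArena tooling.

```python
from decimal import Decimal

# (before, after) pairs copied from the values above.
# Cost/1K is in US dollars; the other metrics are on a 0-100 scale.
metrics = {
    "Arena": ("67.23", "75.38"),
    "Accuracy": ("66.53", "75.97"),
    "Cost/1K": ("0.06", "0.11"),
    "Opt.Sel": ("84.66", "20.12"),
    "Opt.Cost": ("90.71", "24.52"),
    "Opt.Acc": ("89.24", "89.87"),
    "Robust": ("90.95", "73.10"),
}


def deltas(rows):
    """Return after - before per metric, using exact decimal arithmetic."""
    return {
        name: Decimal(after) - Decimal(before)
        for name, (before, after) in rows.items()
    }


changes = deltas(metrics)
for name, change in changes.items():
    # The "+" format flag prints an explicit sign for gains and losses.
    print(f"{name:<9} {change:+}")
```

`Decimal` is used instead of `float` so that two-decimal leaderboard values subtract exactly, without binary rounding noise.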

At 75.38, vLLM-SR overtakes Sqwish (75.27) for #1; Sqwish, AgentForge, Nadir, Weave, OrcaRouter-Adaptive, Azure, R2-Router and Auto each shift down one rank. Ranks 10-20 are unchanged. Metrics are taken from the final /evaluate run on the merged submission (verified byte-identical to main).
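
The rank changes described above can be sketched as a single move within the top nine. This is only an illustration under two assumptions: that the eight routers are listed in their previous rank order, and that vLLM-SR previously held rank 9 (inferred from ranks 1-9 being re-sorted while eight routers each move down one). The `promote` and `rank_shifts` helpers are hypothetical names, not functions from the leaderboard code.

```python
# ASSUMED previous order of ranks 1-9 (see the assumptions stated above).
old_order = [
    "Sqwish",
    "AgentForge",
    "Nadir",
    "Weave",
    "OrcaRouter-Adaptive",
    "Azure",
    "R2-Router",
    "Auto",
    "vLLM-SR",
]


def promote(order, name, new_rank):
    """Return a new list with `name` moved to the 1-based position `new_rank`."""
    rest = [router for router in order if router != name]
    rest.insert(new_rank - 1, name)
    return rest


def rank_shifts(before, after):
    """Map each router to (old rank, new rank), both 1-based."""
    old_rank = {router: i for i, router in enumerate(before, start=1)}
    return {router: (old_rank[router], i) for i, router in enumerate(after, start=1)}


new_order = promote(old_order, "vLLM-SR", 1)
shifts = rank_shifts(old_order, new_order)

for router, (old, new) in shifts.items():
    print(f"{router:<20} {old} -> {new}")
```

Under those assumptions, every router other than vLLM-SR moves down exactly one rank, which matches the description in the PR body.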

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yl231 merged commit 1264487 into main on Jun 4, 2026
10 checks passed
yl231 deleted the update-vllm-sr-v350-leaderboard branch on June 4, 2026 at 18:59