
add Jina to WebVoyager leaderboard at rank 1 (98.9%)#22

Open
keon wants to merge 2 commits into steel-dev:main from omxyz:main

keon commented Apr 23, 2026

Summary

Adds Jina (Om Labs) to the WebVoyager leaderboard at rank 1 with 98.9% (603/610).
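
As a quick sanity check on the headline figure, 603 passed tasks out of 610 rounds to 98.9% at one decimal place. The sketch below reproduces that arithmetic; `passRate` is a hypothetical helper written for illustration and is not part of this PR.

```typescript
// Hypothetical helper (not in the repository): formats a pass rate as a
// percentage string with one decimal place.
function passRate(passed: number, total: number): string {
  if (total <= 0 || passed < 0 || passed > total) {
    throw new RangeError("expected total > 0 and 0 <= passed <= total");
  }
  return ((passed / total) * 100).toFixed(1);
}

// 603 / 610 = 0.98852..., which formats as "98.9".
const webVoyagerScore: string = passRate(603, 610);
console.log(`${webVoyagerScore}%`);
```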

Stack:

  • Browser agent: Jina MCP (openai/gpt-5.4-nano)
  • Planner: Claude Code (claude-opus-4-7)
  • Driver: Playwright
  • Judge: gpt-5 with vision (up to 15 screenshots per task)

Evidence

Full per-task results — transcripts, final answers, screenshots, and the gpt-5 judge rationale for each of the 610 tasks — are at https://webvoyager.omlabs.xyz.

Changes

  • src/lib/leaderboard.ts — insert Jina at top of `leaderboardEntries`
  • src/lib/benchmarks.ts — update WebVoyager `topAgent` / `topScore`
  • README.md — regenerated via `pnpm update-readme`
  • faqs
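
For reviewers who want a picture of the first two changes, here is a minimal sketch. The `LeaderboardEntry` and `BenchmarkSummary` shapes, the field names other than `topAgent` / `topScore`, the placeholder prior entry, and the `withTopEntry` helper are all assumptions for illustration; the actual definitions in `src/lib/leaderboard.ts` and `src/lib/benchmarks.ts` may differ.

```typescript
// Assumed shapes; the real types in the repository may differ.
interface LeaderboardEntry {
  agent: string;
  organization: string;
  score: number; // percentage, one decimal place
  source: string;
}

interface BenchmarkSummary {
  name: string;
  topAgent: string;
  topScore: number;
}

// Placeholder standing in for the entries already on the leaderboard.
const previousEntries: LeaderboardEntry[] = [
  { agent: "PlaceholderAgent", organization: "Placeholder Org", score: 90.0, source: "https://example.com" },
];

// The new entry, using only figures stated in this PR.
const jina: LeaderboardEntry = {
  agent: "Jina",
  organization: "Om Labs",
  score: 98.9,
  source: "https://webvoyager.omlabs.xyz",
};

// Insert the new entry at the top of the list.
const leaderboardEntries: LeaderboardEntry[] = [jina, ...previousEntries];

// Hypothetical helper: derive topAgent / topScore from the entries so the
// benchmark summary cannot drift out of sync with the leaderboard.
function withTopEntry(benchmark: BenchmarkSummary, entries: LeaderboardEntry[]): BenchmarkSummary {
  if (entries.length === 0) return benchmark;
  const top = entries.reduce((best, e) => (e.score > best.score ? e : best));
  return { ...benchmark, topAgent: top.agent, topScore: top.score };
}

const webVoyager = withTopEntry(
  { name: "WebVoyager", topAgent: "PlaceholderAgent", topScore: 90.0 },
  leaderboardEntries,
);
console.log(webVoyager.topAgent, webVoyager.topScore);
```

Deriving `topAgent` and `topScore` from the entries is one possible design; the PR as described updates those fields directly in `src/lib/benchmarks.ts`.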

keon added 2 commits April 23, 2026 08:56
Jina by Om Labs — Claude Code + Opus 4.7 + GPT 5.4 Nano — achieves
98.9% (603/610) on the 610-task WebVoyager curation, judged by gpt-5
vision against up to 15 screenshots per task.

Writeup + per-task drilldown: https://webvoyager.omlabs.xyz
feat: add Jina to WebVoyager leaderboard at rank 1 (98.9%)

vercel Bot commented Apr 23, 2026

@keon is attempting to deploy a commit to the Steel Team on Vercel.

A member of the Team first needs to authorize it.
