screenshotbench

An open benchmark of how well frontier coding models implement a React UI from a single reference screenshot. Live at screenshotbench.com.

Each cell in the matrix is one model's attempt at one reference. Generated components render live in a sandboxed iframe; an LLM judge scores visual fidelity from 0–100 across four dimensions (layout, palette, polish, completeness).
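As a rough illustration of the score shape, the sketch below validates four per-dimension scores and folds them into one number. The README does not say how the dimensions are combined, so the unweighted mean, the per-dimension 0–100 range, and the names DIMENSIONS and overallScore are all assumptions made for this example.

```typescript
// The four dimensions named above. Everything else here is hypothetical.
const DIMENSIONS = ["layout", "palette", "polish", "completeness"] as const;
type Dimension = (typeof DIMENSIONS)[number];
type DimensionScores = Record<Dimension, number>;

// Assumption: each dimension is scored 0-100 and the overall score is the
// unweighted mean, rounded to the nearest integer.
function overallScore(scores: DimensionScores): number {
  for (const dim of DIMENSIONS) {
    const value = scores[dim];
    if (!Number.isFinite(value) || value < 0 || value > 100) {
      throw new RangeError(`${dim} score must be between 0 and 100, got ${value}`);
    }
  }
  const total = DIMENSIONS.reduce((sum, dim) => sum + scores[dim], 0);
  return Math.round(total / DIMENSIONS.length);
}

const exampleScore = overallScore({ layout: 80, palette: 90, polish: 70, completeness: 60 });
console.log(exampleScore); // 75
```

Rejecting out-of-range values up front keeps a malformed judge reply from silently skewing a cell's score.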

What's in v0

6 models: Claude Opus 4.7, Claude Sonnet 4.6, Gemini 3.1 Pro, Gemini 3 Flash, GPT-5.5, Composer 2
3 references: PostHog homepage, PromptHub pricing, Mistral signup
18 graded cells
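
The 18 cells are the cross product of the two lists above. A minimal sketch of that enumeration follows; the slug and model id strings and the buildMatrix helper are made up for illustration, and the real identifiers live in the Convex tables.

```typescript
// Hypothetical ids: only the counts (6 models, 3 references) come from the list above.
const models = [
  "claude-opus-4.7",
  "claude-sonnet-4.6",
  "gemini-3.1-pro",
  "gemini-3-flash",
  "gpt-5.5",
  "composer-2",
] as const;

const references = ["posthog-homepage", "prompthub-pricing", "mistral-signup"] as const;

type Cell = { reference: string; model: string };

// One cell per (reference, model) pair.
function buildMatrix(refs: readonly string[], mods: readonly string[]): Cell[] {
  return refs.flatMap((reference) => mods.map((model) => ({ reference, model })));
}

const cells = buildMatrix(references, models);
console.log(cells.length); // 18
```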

How it works

references (screenshots)
        +
      models  ────►  Cursor SDK (local mode)  ────►  Component.tsx
                                                          │
                                                          ▼
                                                Playwright render @ 1280×853
                                                          │
                                  ┌───────────────────────┴─────────────────────┐
                                  ▼                                             ▼
                          PNG thumbnail (Convex storage)              vision judge (GPT-5.5)
                                                                              │
                                                                              ▼
                                                                       score + reasoning

Each generation runs through the Cursor SDK in local mode against a fixed prompt, and the result lands in Convex. A second runner takes a desktop-width screenshot of the rendered output and uploads it to storage, so the matrix view serves static images instead of live iframes (mobile Safari killed the page for running out of memory when 18 bundler instances ran concurrently). A third runner sends the reference and rendered screenshots to the LLM judge and writes the score back.
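
To make the judge step concrete, here is a sketch of a request body that places both screenshots in one Chat Completions vision message, using the API's text and image_url content parts with base64 data URLs. The prompt wording, the buildJudgeRequest helper, and the model id passed in are assumptions for illustration, not the contents of judge/judge-run.ts.

```typescript
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: { url: string } };
type ChatMessage = {
  role: "system" | "user";
  content: string | (TextPart | ImagePart)[];
};
type ChatRequest = { model: string; messages: ChatMessage[] };

// Chat Completions accepts images as URLs, including base64 data URLs.
function toDataUrl(pngBase64: string): string {
  return `data:image/png;base64,${pngBase64}`;
}

// Hypothetical helper: builds the JSON body only and does not call the network.
function buildJudgeRequest(model: string, referencePng: string, renderedPng: string): ChatRequest {
  return {
    model,
    messages: [
      {
        role: "system",
        // Assumed prompt wording; the real rubric prompt is not shown in this README.
        content:
          "Score how closely the second image matches the first on layout, palette, polish, and completeness. Reply with JSON.",
      },
      {
        role: "user",
        content: [
          { type: "text", text: "First image: reference. Second image: rendered output." },
          { type: "image_url", image_url: { url: toDataUrl(referencePng) } },
          { type: "image_url", image_url: { url: toDataUrl(renderedPng) } },
        ],
      },
    ],
  };
}

// "judge-model-id" is a placeholder; substitute whatever id the judge runner uses.
const judgeRequest = buildJudgeRequest("judge-model-id", "AAAA", "BBBB");
console.log(JSON.stringify(judgeRequest).length > 0);
```

Sending both images in one user message lets the judge compare them directly rather than relying on a text description of the reference.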

The detail modal (tap any tile) mounts a single live Sandpack so you can drag the splitter or hit Desktop/Tablet/Mobile to see how the component actually responds at each breakpoint.

Stack

Frontend — React + Vite, deployed on Vercel
Backend — Convex (efficient-anteater-70) for references, models, runs, judge results, and file storage
Generation — @cursor/sdk in local mode (each cell spawns a cursor-agent against a temp cwd)
Render + judge — Playwright (Chromium) + OpenAI Chat Completions vision
Live preview — @codesandbox/sandpack-react (modal only)

Repo layout

convex/         # schema, queries, mutations
src/            # React app (matrix view + detail modal)
runner/
  add-reference.ts        # add a screenshot to references table
  trigger-one.ts          # queue a single (ref, model) cell
  trigger-batch.ts        # queue all 6 models for a list of refs
  run-pending.ts          # external runner — drains queued runs via Cursor SDK
  render-previews.ts      # render generated components to PNG → Convex storage
  judge/judge-run.ts      # vision judge → score + per-dim reasoning
  rubric/                 # structural rubric pipeline (parked; not in current UI)
public/         # mascot, favicon, og.png

Local dev

npm install
npx convex dev          # starts dev deployment, generates convex/_generated
npm run dev             # vite at http://localhost:5173

Adding a new reference

cd runner
npx tsx add-reference.ts <slug> "<Name>" <category> /path/to/screenshot.png

# queue all 6 models for the new reference
npx tsx trigger-batch.ts   # edit targetSlugs first

# generate code with Cursor SDK (API key read from the macOS Keychain)
CURSOR_API_KEY=$(security find-generic-password -s CURSOR_API_KEY -w) \
  npx tsx run-pending.ts

# render thumbnails + judge
CONVEX_URL=https://efficient-anteater-70.convex.cloud \
  npx tsx render-previews.ts --reference <slug>

OPENAI_API_KEY=$(security find-generic-password -s OPENAI_API_KEY -w) \
  CONVEX_URL=https://efficient-anteater-70.convex.cloud \
  npx tsx judge/judge-run.ts --reference <slug>

Deploy

npm run build runs convex deploy --cmd "vite build", which pushes the backend to the prod Convex deployment and then builds the static frontend. Vercel runs this automatically on every push to main, with CONVEX_DEPLOY_KEY set as a project env var.

For one-off local deploys: vercel --prod.

License

MIT. Built by Dan Cleary.
