ARC-AGI-3 Benchmarking

Quickstart

Install uv if not already installed.

Clone the arc-agi-3-benchmarking repo and enter the directory.

git clone https://github.com/arcprize/arc-agi-3-benchmarking.git
cd arc-agi-3-benchmarking

Install dependencies

uv venv
uv sync

Copy .env.example to .env

cp .env.example .env

Get an API key from the ARC Prize website and set it in your .env file.

ARC_API_KEY=your_api_key_here
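
Before the first run, it can help to confirm that .env defines ARC_API_KEY and that the placeholder was actually replaced. The following is a minimal sketch in standard-library Python, assuming the simple KEY=value layout shown above. The helper names parse_env and has_real_key are hypothetical and not part of this repo.

```python
from pathlib import Path


def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_real_key(env: dict, name: str = "ARC_API_KEY") -> bool:
    """True if the key is present, non-empty, and not the example placeholder."""
    value = env.get(name, "")
    return bool(value) and not value.startswith("your_")


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env(env_path.read_text()) if env_path.exists() else {}
    print("ARC_API_KEY set:", has_real_key(env))
```

The placeholder check relies on the your_ prefix used in the example values here. Adjust it if your .env.example uses a different convention.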

Run the benchmarking agent against the ls20 game.

uv run main.py --game=ls20

Running the Official Benchmarking Agent

Get a model provider API key

Set your provider keys as environment variables in your .env file.

ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here
GOOGLE_API_KEY=your_google_key_here
XAI_API_KEY=your_xai_key_here
GROK_API_KEY=your_grok_key_here
DEEPSEEK_API_KEY=your_deepseek_key_here
GROQ_API_KEY=your_groq_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
FIREWORKS_API_KEY=your_fireworks_key_here
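
To see at a glance which provider keys are filled in, a small check like the one below may be useful. It is a sketch only: the variable names are copied from the list above, the function configured_keys is hypothetical, and it reads the process environment, so it only sees .env values that have been exported or otherwise loaded into the current process.

```python
import os
from collections.abc import Mapping

# Variable names copied from the .env listing above.
PROVIDER_KEYS = [
    "ANTHROPIC_API_KEY",
    "OPENAI_API_KEY",
    "GOOGLE_API_KEY",
    "XAI_API_KEY",
    "GROK_API_KEY",
    "DEEPSEEK_API_KEY",
    "GROQ_API_KEY",
    "OPENROUTER_API_KEY",
    "FIREWORKS_API_KEY",
]


def configured_keys(env: Mapping) -> list:
    """Return provider variables that are non-empty and not a your_... placeholder."""
    found = []
    for name in PROVIDER_KEYS:
        value = env.get(name, "").strip()
        if value and not value.startswith("your_"):
            found.append(name)
    return found


if __name__ == "__main__":
    for name in configured_keys(os.environ):
        print("configured:", name)
```

You likely only need the key for the provider behind the config you plan to run, so an incomplete list is not necessarily a problem.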

View available games (there should be 25).

uv run main.py --list-games

View available model configs.

uv run main.py --list-configs

Run the official benchmarking agent against a game:

uv run main.py --game=ls20 --config=openai-gpt-5-4-2026-03-05

Native Anthropic configs are also available:

uv run main.py --game=ls20 --config=anthropic-opus-4-7-low
uv run main.py --game=ls20 --config=anthropic-opus-4-7-low-thinking

Or run against all games by omitting --game:

uv run main.py --config=openai-gpt-5-4-2026-03-05
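
If you want something between a single game and all games, one option is a small wrapper that invokes the same command once per game. This is a sketch under stated assumptions: it uses only the --game and --config flags shown above, the helpers build_command and run_games are hypothetical, and the game IDs you pass should come from --list-games.

```python
import subprocess
from typing import Optional


def build_command(config: str, game: Optional[str] = None) -> list:
    """Build the uv invocation; leaving game unset mirrors the all-games form."""
    cmd = ["uv", "run", "main.py"]
    if game is not None:
        cmd.append(f"--game={game}")
    cmd.append(f"--config={config}")
    return cmd


def run_games(config: str, games: list) -> dict:
    """Run each game in sequence and return its process exit code."""
    results = {}
    for game in games:
        completed = subprocess.run(build_command(config, game))
        results[game] = completed.returncode
    return results
```

For example, run_games("anthropic-opus-4-7-low", ["ls20"]) runs one game and returns a dict mapping "ls20" to the exit code of that run. Running games sequentially keeps the output readable. Whether each invocation creates its own scorecard is not covered here.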

View your scorecard

When you run a benchmark, a scorecard is saved on the ARC server. If you are logged in, you can browse your saved scorecards at arcprize.org/scorecards.

License

This project is licensed under the MIT License. See the LICENSE file for details.
