
ARC-AGI-3 Benchmarking

Quickstart

Install uv if it is not already installed.

  1. Clone the arc-agi-3-benchmarking repo and enter the directory:
git clone https://github.com/arcprize/arc-agi-3-benchmarking.git
cd arc-agi-3-benchmarking
  2. Install dependencies:
uv venv
uv sync
  3. Copy .env.example to .env:
cp .env.example .env
  4. Get an API key from the ARC Prize website and set it as an environment variable in your .env file:
ARC_API_KEY=your_api_key_here
  5. Run the benchmarking agent against the ls20 game:
uv run main.py --game=ls20
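If a run fails to authenticate, a quick first check is whether ARC_API_KEY is actually set in .env and no longer holds the placeholder value. The snippet below is a minimal sketch, not part of this repository: it assumes .env contains simple KEY=VALUE lines (no multi-line values or `export` prefixes), and the helper names `read_env_file` and `arc_key_is_set` are hypothetical.

```python
from pathlib import Path

# Placeholder value used in the README's .env example.
PLACEHOLDER = "your_api_key_here"


def read_env_file(path: str) -> dict[str, str]:
    """Hypothetical helper: parse simple KEY=VALUE lines, skipping blanks and # comments."""
    p = Path(path)
    if not p.exists():
        return {}  # A missing .env file simply yields no values.
    values: dict[str, str] = {}
    for raw in p.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def arc_key_is_set(path: str = ".env") -> bool:
    """Return True if ARC_API_KEY is present, non-empty, and not the placeholder."""
    value = read_env_file(path).get("ARC_API_KEY", "")
    return bool(value) and value != PLACEHOLDER


if __name__ == "__main__":
    print("ARC_API_KEY set:", arc_key_is_set())
```

Run it from the repository root (for example with `uv run python check_env.py`, where `check_env.py` is whatever name you save it under); it reads `.env` in the current directory and prints whether the key looks configured.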

Running the Official Benchmarking Agent

  1. Get an API key from the model provider you want to benchmark.

Provider key links:

  2. Set your provider keys as environment variables in your .env file:
ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here
GOOGLE_API_KEY=your_google_key_here
XAI_API_KEY=your_xai_key_here
GROK_API_KEY=your_grok_key_here
DEEPSEEK_API_KEY=your_deepseek_key_here
GROQ_API_KEY=your_groq_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
FIREWORKS_API_KEY=your_fireworks_key_here
  3. List the available games (there should be 25):
uv run main.py --list-games
  4. List the available model configs:
uv run main.py --list-configs
  5. Run the official benchmarking agent against a single game:
uv run main.py --game=ls20 --config=openai-gpt-5-4-2026-03-05

Native Anthropic configs are also available:

uv run main.py --game=ls20 --config=anthropic-opus-4-7-low
uv run main.py --game=ls20 --config=anthropic-opus-4-7-low-thinking
  6. Or run it against all games by omitting --game:
uv run main.py --config=openai-gpt-5-4-2026-03-05
  7. View your scorecard.

When you run a benchmark, a scorecard is saved on the ARC server. If you are logged in, you can browse your saved scorecards at arcprize.org/scorecards.
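Each config needs the matching provider key. The config names shown above appear to begin with the provider name (`openai-...`, `anthropic-...`). Assuming that convention also holds for the other providers, the sketch below guesses which environment variable a config needs and whether it is unset. The prefix table is hypothetical: only the `openai` and `anthropic` prefixes appear in this README, and the rest are inferred from the .env key names. It checks the current process environment, or any dict of values you pass in (for example, one parsed from .env).

```python
from __future__ import annotations

import os

# Hypothetical mapping from a config-name prefix to the .env key it needs.
# Only "openai" and "anthropic" prefixes appear in the README; the others
# assume the same naming convention and may not match the real configs.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "xai": "XAI_API_KEY",
    "grok": "GROK_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "fireworks": "FIREWORKS_API_KEY",
}


def required_key(config: str) -> str | None:
    """Guess the env var for a config like 'openai-gpt-5-4-2026-03-05'; None if the prefix is unknown."""
    prefix = config.split("-", 1)[0].lower()
    return PROVIDER_KEYS.get(prefix)


def missing_key(config: str, env: dict[str, str] | None = None) -> str | None:
    """Return the guessed env var name if it is unset or empty, otherwise None.

    An unknown prefix also returns None, meaning "no guess", not "all set".
    """
    env = dict(os.environ) if env is None else env
    key = required_key(config)
    if key is not None and not env.get(key):
        return key
    return None


if __name__ == "__main__":
    for name in ["openai-gpt-5-4-2026-03-05", "anthropic-opus-4-7-low"]:
        print(name, "->", required_key(name), "| missing:", missing_key(name))
```

If a prefix is missing from the table or the guess looks wrong, check the config definitions in the repository rather than relying on this mapping.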

License

This project is licensed under the MIT License. See the LICENSE file for details.
