- Install uv if not already installed.
- Clone the arc-agi-3-benchmarking repo, enter the directory
git clone https://github.com/arcprize/arc-agi-3-benchmarking.git
cd arc-agi-3-benchmarking
- Install dependencies
uv venv
uv sync
- Copy .env.example to .env
cp .env.example .env
- Get an API key from the ARC Prize website and set it as an environment variable in your .env file.
ARC_API_KEY=your_api_key_here
- Run the benchmarking agent against ls20.
uv run main.py --game=ls20
- Get a model provider API key
Provider key links:
- Set your provider keys as environment variables in your .env file.
ANTHROPIC_API_KEY=your_anthropic_key_here
OPENAI_API_KEY=your_openai_key_here
GOOGLE_API_KEY=your_google_key_here
XAI_API_KEY=your_xai_key_here
GROK_API_KEY=your_grok_key_here
DEEPSEEK_API_KEY=your_deepseek_key_here
GROQ_API_KEY=your_groq_key_here
OPENROUTER_API_KEY=your_openrouter_key_here
FIREWORKS_API_KEY=your_fireworks_key_here
- View available games (there should be 25).
uv run main.py --list-games
- View available model configs.
uv run main.py --list-configs
- Run the official benchmarking agent against a game:
uv run main.py --game=ls20 --config=openai-gpt-5-4-2026-03-05
Native Anthropic configs are also available:
uv run main.py --game=ls20 --config=anthropic-opus-4-7-low
uv run main.py --game=ls20 --config=anthropic-opus-4-7-low-thinking
- Or on all games:
uv run main.py --config=openai-gpt-5-4-2026-03-05
- View your scorecard
When you run a benchmark, a scorecard is saved on the ARC server. If you are logged in, you can browse your saved scorecards at arcprize.org/scorecards.
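
Since every run above depends on keys stored in .env, it can help to confirm that the file holds real values before starting a benchmark. The following is a minimal sketch, not part of the arc-agi-3-benchmarking repo: the helper names (`parse_env`, `is_placeholder`, `usable_keys`) are hypothetical, and it assumes a simple `KEY=value` format in which untouched entries still look like `your_..._here`. It is not a full dotenv parser (for example, it does not handle `export` prefixes or multi-line values).

```python
from pathlib import Path


# Hypothetical helpers for illustration; not part of the benchmarking repo.
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def is_placeholder(value: str) -> bool:
    """Assumption: unset keys are empty or still read 'your_..._here'."""
    return value == "" or (value.startswith("your_") and value.endswith("_here"))


def usable_keys(text: str) -> list[str]:
    """Return the sorted names of keys that appear to hold real values."""
    return sorted(k for k, v in parse_env(text).items() if not is_placeholder(v))


if __name__ == "__main__":
    env_path = Path(".env")
    if not env_path.exists():
        print("No .env found; copy .env.example to .env first.")
    else:
        keys = usable_keys(env_path.read_text())
        print("Keys with real values:", ", ".join(keys) or "none")
        if "ARC_API_KEY" not in keys:
            print("ARC_API_KEY is missing or still a placeholder.")
```

Run from the repository root, the script only reads .env and prints key names, never their values, so its output is safe to paste into an issue or chat.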
This project is licensed under the MIT License. See the LICENSE file for details.