Problem
Issues #71–#77 each solve a piece of the refactor, but no single issue owns the integrated end-to-end user experience. Each issue could be "done" in isolation without the full flow actually working.
This meta-issue defines the concrete acceptance gate: a brand-new developer can go from git clone to watching their agent play a scenario in under 10 minutes, with zero Godot knowledge.
The Target Experience
# Step 1: Clone and enter a starter (2 min)
git clone https://github.com/JustInternetAI/AgentArena.git
cd AgentArena/starters/beginner
# Step 2: Install dependencies (2 min)
pip install -r requirements.txt
# Step 3: Run (10 seconds)
python run.py --scenario foraging
# What happens:
# - Python agent server starts on port 5000
# - Game window launches automatically (compiled executable)
# - Foraging scenario loads
# - Agent starts making decisions and moving
# - User watches their agent in the game window
# - Terminal shows decision log
No Godot editor. No manual scene selection. No pressing SPACE. No configuring ports.
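The single-command flow above could be wired together roughly as follows. This is a minimal sketch, not the actual run.py. Only `--scenario`, `--episodes`, and port 5000 come from this issue. The executable names, the `bin/` layout, the flags passed to the game binary, and the helper names are all assumptions made for illustration.

```python
import argparse
import sys
from pathlib import Path

DEFAULT_PORT = 5000  # port named in this issue; fixed so users never configure it


def parse_args(argv):
    """Parse the two user-facing flags mentioned in this issue."""
    parser = argparse.ArgumentParser(description="Launch an agent and the game together.")
    parser.add_argument("--scenario", required=True)
    parser.add_argument("--episodes", type=int, default=1)
    return parser.parse_args(argv)


def game_executable(root: Path, platform: str = sys.platform) -> Path:
    """Pick the compiled game binary for the current OS.

    Hypothetical layout and file names; the real build output may differ.
    """
    name = "AgentArena.exe" if platform.startswith("win") else "AgentArena.x86_64"
    return root / "bin" / name


def build_game_command(args, root: Path, platform: str = sys.platform) -> list[str]:
    """Build the argv used to start the game window (flags are assumed, not confirmed)."""
    return [
        str(game_executable(root, platform)),
        "--scenario", args.scenario,
        "--episodes", str(args.episodes),
        "--port", str(DEFAULT_PORT),
    ]
```

A real launcher would start the agent server first and then hand this list to `subprocess.Popen`, so that the scenario loads without any manual step in the game window.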
Smoke Test (Integration Gate)
Before any release, this test must pass:
A LangGraph agent plays 3 episodes of foraging, scoring >50 on episode 3, launched with a single python run.py --scenario foraging --episodes 3 command, with zero manual intervention.
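One way to automate this gate is sketched below. It assumes, purely for illustration, that the run prints one line per episode of the form `EPISODE <n> SCORE <value>`. This issue does not specify a log format, so the regular expression, the function names, and the 600-second timeout are hypothetical.

```python
import re
import subprocess
import sys

# Assumed log format: "EPISODE 3 SCORE 51.5" on its own line.
SCORE_LINE = re.compile(r"^EPISODE (\d+) SCORE (-?\d+(?:\.\d+)?)$", re.MULTILINE)


def parse_scores(output: str) -> dict[int, float]:
    """Map episode number to score for every matching line in the output."""
    return {int(n): float(s) for n, s in SCORE_LINE.findall(output)}


def gate_passes(scores: dict[int, float], episodes: int = 3, threshold: float = 50.0) -> bool:
    """True when every episode was reported and the last one scored above the threshold."""
    if sorted(scores) != list(range(1, episodes + 1)):
        return False
    return scores[episodes] > threshold


def run_smoke_test(timeout_s: float = 600.0) -> bool:
    """Run the single command from this issue with no manual intervention."""
    result = subprocess.run(
        [sys.executable, "run.py", "--scenario", "foraging", "--episodes", "3"],
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode == 0 and gate_passes(parse_scores(result.stdout))
```

Keeping the parsing and the pass/fail rule separate from the subprocess call means the rule itself can be unit-tested without launching the game.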
This forces the following to work together:
Checklist (Cross-Issue Integration)
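Installation
- pip install -r requirements.txt installs SDK + all dependencies
Single Command Launch
- python run.py --scenario foraging starts everything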
Visible Feedback
Error Handling
Documentation
Issues That Contribute
| Issue | What it provides |
|---|---|
| #77 | Compiled executable, auto-launch, scenario selection |
| #71 | Tool completion callbacks (agent sees results) |
| #72 | Mock testing (develop without Godot) |
| #73 | Complete intermediate starter |
| #74 | Framework adapters (LangGraph, Claude SDK) |
| #75 | Game-side inspector |
| #76 | Persistent cross-episode memory |
| SDK consolidation | Single API surface |
| SDK packaging | pip install works |
| Episode lifecycle | Auto-restart between episodes |
This Issue Is "Done" When
A fresh machine with Python 3.11 and no prior Agent Arena setup can complete the target experience above. Tested on both Windows and Ubuntu.
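A small preflight check can make the clean-machine requirement explicit and report a readable error instead of a traceback. The sketch below is an assumption about how such a check might look. Treating Python 3.11 as a minimum rather than an exact version, and mapping "Ubuntu" to any Linux platform, are simplifications made for the example.

```python
import sys

MIN_PYTHON = (3, 11)                     # version named in this issue, treated as a minimum
SUPPORTED_PLATFORMS = ("win32", "linux")  # Windows and Ubuntu (approximated as any Linux)


def environment_problems(version_info=None, platform=None):
    """Return human-readable problems; an empty list means the machine looks usable."""
    major_minor = tuple(version_info or sys.version_info)[:2]
    platform = platform or sys.platform
    problems = []
    if major_minor < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {major_minor[0]}.{major_minor[1]}"
        )
    if not platform.startswith(SUPPORTED_PLATFORMS):
        problems.append(f"Platform '{platform}' is not covered by the acceptance test")
    return problems
```

Called with no arguments, the function inspects the running interpreter; the optional parameters exist only so the check can be exercised in tests.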
Estimated Effort
Not a separate work item: this is the integration test that validates that all other issues are truly complete. Roughly half a day to write the automated smoke test, verify it on a clean machine, and update the READMEs.