Problem
With the compiled executable (#77) and cross-episode learning (#76), several episode lifecycle questions are unanswered. Currently:
- SPACE starts/stops simulation (goes away with auto-start)
- R resets the scene
- Episode boundaries are detected ad hoc by tick resets in EpisodeMemoryManager
- No formal protocol for episode end conditions
- No way to chain episodes automatically
For persistent cross-episode memory (#76) to work, and for the compiled executable (#77) to feel polished, the episode lifecycle needs a clear protocol that both Godot and Python agree on.
Questions to Answer
1. How does an episode END?
Define explicit end conditions per scenario:
- Foraging: All resources collected? Health reaches 0? Tick limit (e.g., 500 ticks)?
- Crafting chain: All recipes crafted? Tick limit?
- Team capture: All points held? Time limit?
Currently foraging.gd has MAX_RESOURCES=7 but no end-on-completion logic.
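To make the foraging case concrete, here is a minimal sketch of an end-condition check, written in Python for illustration only (the real check would presumably live in the Godot scene controller). The `ForagingState` type, the function name, and the reason strings `"agent_died"` and `"tick_limit"` are assumptions; `"all_collected"`, the 7-resource cap, and the 500-tick example come from this issue.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForagingState:
    """Hypothetical snapshot of the foraging scenario at one tick."""
    resources_collected: int
    health: float
    tick: int


def foraging_end_reason(state: ForagingState,
                        max_resources: int = 7,
                        tick_limit: int = 500) -> Optional[str]:
    """Return why the episode should end, or None to keep running."""
    if state.resources_collected >= max_resources:
        return "all_collected"
    if state.health <= 0:
        return "agent_died"   # assumed reason string
    if state.tick >= tick_limit:
        return "tick_limit"   # assumed reason string
    return None
```

Checking objective completion before the tick limit means an agent that finishes on the last tick is still credited with completing the objective; whether that ordering is wanted is one of the things this issue needs to decide.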
2. How does Godot signal episode end to Python?
Options:
- Special field in observation: {"episode_complete": true, "reason": "all_collected", "score": 85}
- Dedicated endpoint: POST /episode_end from Godot → Python
- Observation stops arriving (implicit — fragile)
3. How does a new episode START?
- Auto-restart after N seconds?
- Python requests restart via POST /reset?
- User presses a key in the game window?
- Configurable: --episodes 5 --delay-between 3
4. How are episodes numbered/tracked?
For #76's persistent memory, each episode needs an ID:
- Sequential: episode_1, episode_2, ...
- Timestamped: episode_20260223_143022
- Both Godot and Python must agree on the current episode ID
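The two ID schemes can be compared side by side with a small sketch. The helper names are hypothetical, and `check_episode_id` only illustrates one way Python could detect that it has drifted out of sync with Godot, assuming the ID arrives in an `episode_id` field as in the proposed end message.

```python
from datetime import datetime


def sequential_id(n: int) -> str:
    """Sequential scheme: episode_1, episode_2, ..."""
    return f"episode_{n}"


def timestamped_id(started_at: datetime) -> str:
    """Timestamped scheme: episode_YYYYMMDD_HHMMSS."""
    return started_at.strftime("episode_%Y%m%d_%H%M%S")


def check_episode_id(expected, message: dict) -> None:
    """Raise if the ID in a Godot message differs from the one Python expects."""
    received = message.get("episode_id")
    if received != expected:
        raise ValueError(f"episode ID mismatch: expected {expected!r}, got {received!r}")
```

Sequential IDs are easy to keep in lockstep because both sides can derive them by counting resets, whereas timestamped IDs have to be generated on one side and transmitted to the other.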
5. Can episodes chain automatically?
For the learning progression demo (#76), users want to run 5+ episodes and watch improvement:
python run.py --scenario foraging --episodes 5
This requires: auto-restart, episode boundary signaling, and Python-side orchestration.
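A rough sketch of the Python-side orchestration might look like the following. `EpisodeRunner` is a hypothetical stand-in, not the project's AgentArena class; the hook names follow the ones listed under Python-Side Changes, and `run_one_episode` is an assumed callable that blocks until Godot reports the episode end and returns its summary.

```python
from typing import Callable, Dict, List


class EpisodeRunner:
    """Hypothetical orchestrator that chains N episodes back to back."""

    def __init__(self, run_one_episode: Callable[[int], Dict]):
        # Assumed to block until the episode ends and return its summary dict.
        self._run_one_episode = run_one_episode
        self.summaries: List[Dict] = []

    def on_episode_start(self, episode_id: int) -> None:
        print(f"episode {episode_id} starting")

    def on_episode_end(self, episode_id: int, summary: Dict) -> None:
        # This is where persistent memory would be saved in a real run.
        self.summaries.append(summary)

    def run(self, episodes: int) -> List[Dict]:
        for episode_id in range(1, episodes + 1):
            self.on_episode_start(episode_id)
            summary = self._run_one_episode(episode_id)
            self.on_episode_end(episode_id, summary)
        return self.summaries
```

With a stub such as `EpisodeRunner(lambda i: {"episode_id": i, "final_score": 10 * i}).run(5)`, the loop can be exercised without a running Godot instance.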
Proposed Protocol
Episode State Machine
WAITING → RUNNING → ENDED → (auto) WAITING
   |         |                 |
   |    tick_advanced          |
   |    observations           |
   |    tool_calls             |
   |         |                 |
   |    end_condition          |
   |        met                |
   |         |                 |
   |         v                 |
   |       ENDED --------------+--→ save persistent memory
   |         |                      generate episode summary
   |         v
   +---- WAITING (reset scene, increment episode_id)
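The state machine could be encoded as an explicit transition table so that both sides reject out-of-order events. This is only a sketch: the class names are hypothetical, and starting `episode_id` at 1 is an assumption based on the sequential numbering option.

```python
from enum import Enum


class EpisodeState(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    ENDED = "ended"


# Allowed transitions, read off the diagram above.
TRANSITIONS = {
    EpisodeState.WAITING: {EpisodeState.RUNNING},
    EpisodeState.RUNNING: {EpisodeState.ENDED},
    EpisodeState.ENDED: {EpisodeState.WAITING},
}


class EpisodeLifecycle:
    """Hypothetical tracker for the current episode state and ID."""

    def __init__(self) -> None:
        self.state = EpisodeState.WAITING
        self.episode_id = 1  # assumed starting value

    def advance(self, new_state: EpisodeState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        # Resetting the scene for the next episode increments the ID.
        if self.state is EpisodeState.ENDED and new_state is EpisodeState.WAITING:
            self.episode_id += 1
        self.state = new_state
```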
IPC Messages
Godot → Python (episode end):
Include in final observation or as a separate message:
{
  "episode_ended": true,
  "episode_id": 3,
  "reason": "objective_complete",
  "final_score": 85,
  "ticks_elapsed": 247,
  "metrics": {
    "resources_collected": 7,
    "damage_taken": 15,
    "distance_traveled": 142.5,
    "exploration_pct": 0.78
  }
}
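On the Python side, a message like the one above could be parsed into a typed record. The following is a minimal sketch that assumes the payload arrives as a JSON string; the `EpisodeEnd` dataclass and `parse_episode_end` function are hypothetical names, while the field names are taken from the message above.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class EpisodeEnd:
    episode_id: int
    reason: str
    final_score: float
    ticks_elapsed: int
    metrics: Dict[str, Any] = field(default_factory=dict)


def parse_episode_end(raw: str) -> Optional[EpisodeEnd]:
    """Return an EpisodeEnd if the message marks an episode end, else None."""
    msg = json.loads(raw)
    if not msg.get("episode_ended", False):
        return None  # ordinary observation, episode still running
    return EpisodeEnd(
        episode_id=msg["episode_id"],
        reason=msg["reason"],
        final_score=msg["final_score"],
        ticks_elapsed=msg["ticks_elapsed"],
        metrics=msg.get("metrics", {}),
    )
```

Returning None for ordinary observations lets the same function serve both delivery options: it can be called on every observation, or only on messages from a dedicated endpoint.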
Python → Godot (episode control):
New endpoints or CLI args:
POST /reset — restart current scenario
POST /configure — set episode params (tick_limit, auto_restart)
--episodes N — run N episodes then quit
--tick-limit 500 — max ticks per episode
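The CLI side of this could be sketched with argparse as follows. The flag names come from this issue; the defaults, the help strings, and the `build_parser` helper are assumptions, not the current contents of run.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for the episode-control flags proposed above."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--scenario", default="foraging")
    parser.add_argument("--episodes", type=int, default=1,
                        help="run N episodes, then quit")
    parser.add_argument("--tick-limit", type=int, default=500,
                        help="max ticks per episode")
    parser.add_argument("--delay-between", type=float, default=0.0,
                        help="seconds to wait before auto-restart")
    return parser
```

The parsed values could then be forwarded to Godot through the proposed POST /configure endpoint, so that the tick limit and auto-restart delay are set in one place.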
Godot-Side Changes
- Add end conditions to base_scene_controller.gd (configurable per scenario)
- Add episode_id tracking (increment on reset)
- Add auto-restart logic (configurable delay)
- Include episode metadata in observations
Python-Side Changes
- AgentArena class gains episode lifecycle hooks:
  - on_episode_start(episode_id)
  - on_episode_end(episode_id, summary)
- --episodes N flag in run.py for batch execution
Acceptance Criteria
- --episodes 5 runs 5 consecutive episodes with automatic restarts
Estimated Effort
1-2 days
Dependencies