Details on each game's implementation, observation space, action space, and difficulty tuning.
Core: code/games/snake_core.py
Config: code/conf/game/snake.yaml
Robustness: code/conf/robustness/snake_*.yaml
Tile-based snake on a grid. The snake moves at a configurable tile rate (fps), which increases with each apple eaten (fps_per_apple). The episode ends on wall collision, self-collision, or the apple-inactivity timeout (~1 minute at base speed).
A local grid window centred on the snake's head, plus global context:
(2 * half_size + 1)² grid cells + 6 scalars
└─ default half_size=3 → 7×7 = 49 cells
Scalars: direction (x, y), food direction (x, y), speed, length
Total: 55 floats (Box, [-10, 10])
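The size arithmetic above can be checked with a short sketch. The function name `snake_obs_size` is hypothetical and not part of the repository; only `half_size` and the count of 6 scalars come from the layout above.

```python
def snake_obs_size(half_size: int = 3, n_scalars: int = 6) -> int:
    """Length of the flat observation: local window cells plus scalars."""
    side = 2 * half_size + 1      # window is centred on the snake's head
    return side * side + n_scalars


print(snake_obs_size())   # 7*7 + 6 = 55
```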
Discrete(4): UP, RIGHT, DOWN, LEFT (a 180° reversal is ignored)
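One way to ignore reversals is sketched below. It assumes the action indices follow the listed order (UP=0, RIGHT=1, DOWN=2, LEFT=3), which the source does not confirm, and the helper `next_direction` is a hypothetical name rather than the function used in `snake_core.py`.

```python
# Assumed index order, following the order in which the actions are listed.
UP, RIGHT, DOWN, LEFT = range(4)


def next_direction(current: int, action: int) -> int:
    """Return the new heading, ignoring a 180-degree reversal."""
    if action == (current + 2) % 4:   # opposite heading in this ordering
        return current                # keep moving the same way
    return action


print(next_direction(UP, DOWN) == UP)     # reversal is dropped
print(next_direction(UP, LEFT) == LEFT)   # a turn is accepted
```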
Base reward: +1.0 per apple eaten. Shaped further by the active persona.
| Parameter | Default | Hard |
|---|---|---|
| `width` × `height` | 720 × 480 | 700 × 500 |
| `cell` | 10 | 10 |
| `fps` (base tile rate) | 16 | 20 |
| `fps_per_apple` | 0.5 | 0.75 |
| `max_fps` | 26 | 40 |
| `penalty_range` | 1 | 2 |
| `penalty_max` | 0.01 | 0.02 |
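To see how the speed parameters in the table interact, here is a minimal sketch. It assumes the tile rate grows linearly with apples eaten and is capped at `max_fps`; the source only says the rate increases per apple, so the exact schedule and the name `tile_rate` are assumptions.

```python
def tile_rate(apples: int, fps: float = 16, fps_per_apple: float = 0.5,
              max_fps: float = 26) -> float:
    """Tile rate after `apples` apples, assuming a linear ramp capped at max_fps."""
    return min(fps + fps_per_apple * apples, max_fps)


print(tile_rate(0))                       # base rate with the Default column
print(tile_rate(20))                      # the cap is reached here under this assumption
print(tile_rate(10, 20, 0.75, 40))        # Hard column values
```

Under that assumption the Default profile reaches its cap after (26 - 16) / 0.5 = 20 apples.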
Without a limit, the snake can stall indefinitely by avoiding the apple. The wrapper therefore applies an apple-inactivity timeout: if no apple is eaten within 960 tile steps (about 1 minute at the default base speed of 16 fps), the episode is truncated. The counter is exposed as steps_since_apple in the info dict.
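The timeout logic can be sketched as a small counter. This is an illustration only: the class `AppleTimeout` and its `update` method are hypothetical, and whether the real wrapper truncates on exactly the 960th step is an assumption. Only the 960-step limit and the `steps_since_apple` key come from the description above.

```python
class AppleTimeout:
    """Counts tile steps since the last apple and flags truncation."""

    def __init__(self, limit: int = 960):
        self.limit = limit
        self.steps_since_apple = 0

    def update(self, ate_apple: bool) -> bool:
        """Advance one tile step; return True when the episode should be truncated."""
        self.steps_since_apple = 0 if ate_apple else self.steps_since_apple + 1
        return self.steps_since_apple >= self.limit


timeout = AppleTimeout()
truncated = False
steps = 0
while not truncated:                      # a snake that never eats
    truncated = timeout.update(ate_apple=False)
    steps += 1
info = {"steps_since_apple": timeout.steps_since_apple}
print(steps, steps / 16)   # 960 tile steps, 60.0 seconds at 16 fps
```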
Core: code/games/flappy_core.py
Config: code/conf/game/flappy.yaml
Robustness: code/conf/robustness/flappy_*.yaml
Side-scrolling game in which a bird flies through gaps between pipes. The episode ends immediately on collision with a pipe, the ground, or the ceiling. No time limit is needed because the game always terminates.
Continuous values describing the bird's state relative to upcoming pipes (position, velocity, gap location, distance).
Discrete(2): 0 = do nothing, 1 = flap
Base reward: +1.0 per pipe cleared. Shaped further by the active persona.
Core: code/games/pong_core.py
Config: code/conf/game/pong.yaml
Robustness: code/conf/robustness/pong_*.yaml
Classic single-player Pong against a rule-based opponent. The episode ends when the ball goes out of bounds. Ball physics naturally resolve every rally, so no time limit is needed.
Continuous: ball position/velocity, paddle positions.
Discrete(3): stay, up, down
Reward is based on the rally outcome (scoring or conceding).
Core: code/games/dk_core.py
Config: code/conf/game/dk.yaml
Robustness: code/conf/robustness/dk_*.yaml
Mario-style platformer. The player climbs ladders and avoids rolling barrels to reach the top. The episode ends on contact with a barrel or on a fall. Multiple level layouts are supported via level_id.
Controlled by `obs_mode`:
- `state`: structured vector (player pos, barrel positions, level info)
- `pixel`: downscaled pixel grid (`obs_scale` controls resolution)

Discrete: move left/right, climb up/down, jump
Progress-based: reward for climbing higher, penalty for being hit.
| Parameter | Default |
|---|---|
| `level_id` | 1 |
| `obs_scale` | 16 |
| `obs_mode` | state |
- Implement `code/games/<game>_core.py` with:
  - `get_action_space()` → Gym space
  - `get_observation_space()` → Gym space
  - `reset(seed=None)` → obs
  - `step(action, dt=None)` → `(obs, base_reward, terminated, info)`
  - `render(surface, blit_only=False)`
  - `WIDTH`, `HEIGHT` class attributes
- Add `code/conf/game/<game>.yaml` with `_target_: code.games.<game>_core.<GameClass>`
- Add three robustness configs: `code/conf/robustness/<game>_default/easy/hard.yaml`
- Add a persona: `code/conf/reward/<game>_<persona>.yaml`
- Add a metrics collector: `code/metrics/<game>_balance.py`
- Register in `code/conf/grid.yaml` under `games` and `personas`
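Before wiring up the configs, it can help to check that a new core class exposes every member in the checklist above. The sketch below is a hypothetical helper, not part of the repository: `missing_members` and `StubGame` are illustrative names, and the check only looks at names, not at signatures or return types.

```python
REQUIRED_METHODS = (
    "get_action_space",
    "get_observation_space",
    "reset",
    "step",
    "render",
)
REQUIRED_ATTRS = ("WIDTH", "HEIGHT")


def missing_members(game_cls: type) -> list[str]:
    """Names from the checklist that game_cls does not provide."""
    missing = [m for m in REQUIRED_METHODS
               if not callable(getattr(game_cls, m, None))]
    missing += [a for a in REQUIRED_ATTRS if not hasattr(game_cls, a)]
    return missing


class StubGame:
    """Placeholder core used only to exercise the check."""
    WIDTH, HEIGHT = 640, 480

    def get_action_space(self): ...
    def get_observation_space(self): ...
    def reset(self, seed=None): ...
    def step(self, action, dt=None): ...


print(missing_members(StubGame))   # ['render']
```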