RougeLike Reinforcement Learning Gym
I did not handwrite any of the Python code in this repo; I only edited DesignDoc.md and the files in the /docs directory. The idea is to see how far I can get using ChatGPT and the Codex VSCode extension. This is a side project driven by my own interest, since I never took a reinforcement learning course in school. It's inspired by Brogue, Cataclysm: Dark Days Ahead, and Neural MMO.
The goal is to create a rich, MMO-like environment where you can train an agent to fight monsters, level up, and get sweet loot. This repo includes a set of tools to build maps, define custom agent policies for Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO), and view the results. I am currently finishing the initial set of tools before expanding the environment any further.
- Agent: a single controllable unit in the environment (for example agent_0), with inventory, skills, health, hunger, faction state, race, and class.
- Race: the base stat template for an agent (strength/dexterity/intellect and damage-resistance traits) loaded from data/base/agent/agent_races.json.
- Class: the starting role package for an agent (starting items and initial skill modifiers) loaded from data/base/agent/agent_classes.json.
- Scenario: a runnable setup file containing env_config values plus an explicit list of agents (race/class/profile/network choices), typically stored in data/scenarios/.
Run from repo root:
python3 examples/minimal_run.py
python3 examples/window_demo.py
python3 examples/train_demo.py
./scripts/train_default.sh
- tools/view_replay.py: replay viewer for *.replay.json files with playback controls, focus, zoom, and render mode switch (ascii default, optional tileset).
- tools/scenario_editor.py: GUI editor for scenario files (snapshot env_config + agent list). Agent creation flow is race + class selection followed by editable combined JSON before save.
- tools/train_launcher.py: GUI launcher for training jobs with live log streaming and basic live metrics (return, win, survival, starvation, loss, epsilon).
- tools/map_builder.py: generate a new terrain map, preview it in ASCII, and save/attach it to a scenario via static_map_path or embedded static_map_data.
- python3 -m train: training CLI (custom and RLlib backends) with scenario support.
- docs/CraftingSystem.md: crafting runtime behavior and content authoring reference.
- docs/ConstructionSystem.md: construction/build placement behavior and authoring reference.
- docs/Combat.md: combat resolution, statuses, and spells reference.
- Both GUI tools include a top-bar Settings -> Theme selector. The selected theme is shared and persisted in data/user/tool_settings.json.
Replay viewer example:
python3 tools/view_replay.py outputs/train/<run>/replays/latest_episode.replay.json
Scenario Editor example:
python3 tools/scenario_editor.py --scenario data/scenarios/all_race_class_combinations
Training Launcher example:
python3 tools/train_launcher.py
- data/base/: stable shared game data used across scenarios (tiles, items, races, classes, profiles, monsters, network defaults, curriculum defaults).
- data/scenarios/: scenario-specific directories. Each scenario directory contains:
  - env_config.json: full environment config for that scenario.
  - agents.json: list of agents used in that scenario.
- data/env_config.json: active environment config entrypoint; points to base data by default and may be overridden per scenario.
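The scenario directory layout described above can be read with plain JSON loading. The following is a minimal sketch, not the repo's own loader: `load_scenario` is a hypothetical helper name, and it assumes that agents.json holds a top-level JSON list.

```python
import json
from pathlib import Path


def load_scenario(scenario_dir):
    """Read env_config.json and agents.json from one scenario directory.

    Hypothetical helper for illustration; the repo's own loader may differ.
    """
    root = Path(scenario_dir)
    with open(root / "env_config.json", encoding="utf-8") as fh:
        env_config = json.load(fh)
    with open(root / "agents.json", encoding="utf-8") as fh:
        agents = json.load(fh)
    # Assumption: agents.json is a top-level list, one entry per agent.
    if not isinstance(agents, list):
        raise ValueError("agents.json is expected to hold a list of agents")
    return env_config, agents
```

Usage would look like `env_config, agents = load_scenario("data/scenarios/all_race_class_combinations")`, after which `len(agents)` gives the roster size.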
Open a dedicated render window (Tkinter):
from rlrlgym import EnvConfig, PettingZooParallelRLRLGym
env = PettingZooParallelRLRLGym(EnvConfig(render_enabled=True))
env.reset(seed=7)
env.open_render_window()
for _ in range(20):
    # Action 4 is wait/rest for both agents.
    env.step({"agent_0": 4, "agent_1": 4})
The window includes:
- Play, Pause, Step controls for playback frames
- fixed speed controls: 1x, 2x, 5x
- Focus selector to center on a single agent
- Zoom slider in range 0..10 to zoom in/out around the selected agent
- tile colors rendered in the GUI window
Rendering is optional via EnvConfig(render_enabled=False).
There is no CLI render mode.
When running Aim locally, the UI is available on localhost port 43800:
http://127.0.0.1:43800
By default training logs Aim runs to /proj/aimml. Start the UI against that repo:
.venv/bin/aim up --repo /proj/aimml --host 127.0.0.1 --port 43800
The environment exposes per-agent spaces in PettingZoo Parallel style:
- env.action_space(agent_id) returns a discrete integer range 0..18 (19 actions)
- env.observation_space(agent_id) returns a dict-style shape descriptor based on the agent profile
- Extended systems/skills/observation/reward write-up: docs/EnvironmentSystems.md
Each action is an integer in 0..18:
- 0: move north
- 1: move south
- 2: move west
- 3: move east
- 4: wait/rest
- 5: loot
- 6: eat
- 7: pick up items
- 8: equip item
- 9: use item
- 10: interact with environment / nearby agent
- 11: attack
- 12: give item to adjacent ally
- 13: trade with adjacent ally
- 14: revive adjacent ally
- 15: guard adjacent ally
- 16: leave faction
- 17: accept pending faction invite
- 18: defend
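To avoid bare integers in training or test code, the action table above can be mirrored as an enum. This is only a sketch: the `Action` class and its member names are illustrative and are not claimed to exist in rlrlgym; only the integer values come from the list above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Illustrative names for the action integers listed above."""
    MOVE_NORTH = 0
    MOVE_SOUTH = 1
    MOVE_WEST = 2
    MOVE_EAST = 3
    WAIT = 4
    LOOT = 5
    EAT = 6
    PICK_UP = 7
    EQUIP = 8
    USE = 9
    INTERACT = 10
    ATTACK = 11
    GIVE = 12
    TRADE = 13
    REVIVE = 14
    GUARD = 15
    LEAVE_FACTION = 16
    ACCEPT_INVITE = 17
    DEFEND = 18


# Build a joint action dict keyed by agent id, converted to plain ints.
actions = {"agent_0": int(Action.WAIT), "agent_1": int(Action.ATTACK)}
```

A dict built this way has the same shape as the one passed to env.step in the render window example.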
Observations are per-agent dictionaries and always include:
- step: current environment step
- alive: whether the agent is alive
- profile: profile name (for example reward_explorer_policy_v1, reward_brawler_policy_v1)
Profile and config determine optional keys:
- local_tiles: local tile window around the agent (view_width/view_height dependent)
- stats: {hp, hunger, position, equipped_count}
- inventory: list of carried items
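Because some keys are optional, code that consumes observations should tolerate their absence. The sketch below shows one way to do that; `summarize_observation` is a hypothetical helper, and it assumes only the key names listed above.

```python
def summarize_observation(agent_obs):
    """Return a compact summary that tolerates missing optional keys."""
    # stats and inventory are optional, so fall back to empty containers.
    stats = agent_obs.get("stats") or {}
    inventory = agent_obs.get("inventory") or []
    return {
        # These three keys are documented as always present.
        "step": agent_obs["step"],
        "alive": agent_obs["alive"],
        "profile": agent_obs["profile"],
        # None signals that the profile did not expose this value.
        "hp": stats.get("hp"),
        "hunger": stats.get("hunger"),
        "item_count": len(inventory),
        "has_local_tiles": "local_tiles" in agent_obs,
    }
```

Returning None for a missing stat keeps "not observed" distinct from a real value of zero.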
Example:
obs, info = env.reset(seed=7)
agent_obs = obs["agent_0"]
# agent_obs -> {"step": 0, "alive": True, "profile": "reward_explorer_policy_v1", ...}
The in-repo train/ module supports:
- rllib backend (primary, recommended)
- custom backend (legacy in-repo trainer)
CLI:
./scripts/train_default.sh
Outputs include:
- RLlib metrics/checkpoints in the selected output directory
- (custom backend only) neural_policies.json checkpoint
Network architectures are defined in data/base/agent/agent_networks.json by profile name
(for example default, and optionally per-profile variants).
Install RLlib:
python3 -m pip install "ray[rllib]"
Direct RLlib CLI example:
python3 -m train --backend rllib --iterations 50 --max-steps 120 --output-dir outputs/train/default
Legacy custom backend example:
python3 -m train --backend custom --episodes 100 --max-steps 120 --output-dir outputs/train/custom --networks-path data/base/agent/agent_networks.json
Scenario-driven custom training (one NN per agent in scenario roster):
python3 -m train --backend custom --scenario-path data/scenarios/all_race_class_combinations --output-dir outputs/train/scenario_run
NN capacity guard options:
- --max-nn-policies <N>: hard cap.
- --resource-guard-ram-fraction <f>: RAM fraction used for estimated cap (default 0.45).
- --resource-guard-bytes-per-param <b>: memory estimate per parameter (default 32).
- --no-resource-guard: disables the guard.
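The README does not spell out how the guard turns these options into a policy cap. One plausible reading is sketched below: divide a RAM budget by the estimated memory of one network. The function name, its parameters, and the formula itself are assumptions for illustration; only the defaults 0.45 and 32 come from the options above, and the actual logic in train/ may differ.

```python
def estimate_policy_cap(total_ram_bytes, params_per_policy,
                        ram_fraction=0.45, bytes_per_param=32,
                        hard_cap=None):
    """Estimate how many networks fit in a RAM budget (assumed formula)."""
    if params_per_policy <= 0 or bytes_per_param <= 0:
        raise ValueError("parameter count and bytes per parameter must be positive")
    # Budget: the share of total RAM the guard is allowed to plan for.
    budget = total_ram_bytes * ram_fraction
    # Estimated footprint of a single policy network.
    per_policy = params_per_policy * bytes_per_param
    cap = int(budget // per_policy)
    # An explicit hard cap (like --max-nn-policies) can only lower the result.
    if hard_cap is not None:
        cap = min(cap, hard_cap)
    return cap
```

Under this assumed formula, a machine with 8,000,000,000 bytes of RAM and networks of 1,000,000 parameters would be capped at 112 policies with the default settings.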
Version-controlled training scripts:
- scripts/train_quick.sh (fast RLlib training)
- scripts/train_default.sh (default RLlib training)
- scripts/train_long.sh (longer RLlib training)
- scripts/train_full.sh (full RLlib training)
All scripts accept additional CLI overrides, for example:
./scripts/train_quick.sh --seed 3 --output-dir outputs/train/custom_run
Training metrics are also logged to Aim (when aim is installed), including dashboard-equivalent episode/iteration metrics for both custom and rllib backends.
Use --aim-experiment <name> to change the experiment, --aim-repo <path> to change repo location, and --no-aim to disable Aim logging.
Run the test suite:
python3 -m unittest discover -s tests -q
Features:
- PettingZoo Parallel-style multi-agent environment with reset(seed, options) / step(actions)
- Configurable per-agent observations
- Agent profile system loaded from data/base/agent/agent_profiles.json with descriptive reward/network profile names
- JSON tile schema with required schema_version and required tile fields
- Window-only rendering with playback controls and focused zoom
- Aim-native training metrics logging for both backends
- Snapshot save/load and synchronous vectorized environment wrapper