A reinforcement learning agent trained with Recurrent PPO learns to navigate a dynamic, partially observable maze while avoiding a predator that hunts it using A* path planning.
The system combines reinforcement learning and classical search algorithms and is visualized in a 3D PyBullet simulation.
This project simulates a pursuit–evasion navigation problem.
Two agents operate in the same environment:
Explorer Agent (RL)
- Uses Recurrent Proximal Policy Optimization (PPO)
- Learns navigation strategies
- Attempts to reach the exit
Predator Agent (A*)
- Uses A* pathfinding
- Computes the shortest path to the explorer
- Attempts to capture the explorer
The environment contains dynamic obstacles, forcing the RL agent to continuously adapt.
- Recurrent PPO (Stable-Baselines3)
- Trained for 15 million steps
- Partial observability (7×7 local map)
- Procedurally generated mazes
- Walls change during the episode
- Exit location randomized
- Predator uses A* path planning
- Intelligent pursuit behavior
- Real-time path visualization option
- 3D environment using PyBullet
- Explorer robot (R2D2 model)
- Predator visualization
- Dynamic maze rendering
Observation space:
7 × 7 local maze view
dx_exit
dy_exit
danger_signal
Where:
dx_exit, dy_exit → direction toward exit
danger_signal → 1 if predator within danger radius
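As a rough illustration of how such an observation could be assembled, here is a minimal sketch in plain Python. Everything beyond the field names is an assumption: the grid encoding (0 = free, 1 = wall), padding out-of-bounds cells as walls, encoding the exit direction as a sign in {-1, 0, 1}, measuring predator distance in Manhattan steps, the default `danger_radius` of 3, and the helper names `local_view` and `build_observation`. The actual `maze_env.py` may differ on any of these.

```python
WALL = 1  # assumed encoding: 1 = wall, 0 = free cell
VIEW = 7  # side length of the local window


def local_view(grid, row, col, size=VIEW):
    """Return a size x size window centred on (row, col).

    Cells outside the maze are padded as walls (an assumption).
    """
    half = size // 2
    rows, cols = len(grid), len(grid[0])
    view = []
    for r in range(row - half, row + half + 1):
        line = []
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            line.append(grid[r][c] if inside else WALL)
        view.append(line)
    return view


def sign(x):
    """Return -1, 0 or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def build_observation(grid, agent, exit_pos, predator, danger_radius=3):
    """Flatten the local view and append dx_exit, dy_exit, danger_signal.

    Positions are (row, col) tuples; predator may be None when disabled.
    """
    view = local_view(grid, agent[0], agent[1])
    flat = [float(cell) for line in view for cell in line]
    dx_exit = float(sign(exit_pos[1] - agent[1]))  # column direction
    dy_exit = float(sign(exit_pos[0] - agent[0]))  # row direction
    danger = 0.0
    if predator is not None:
        dist = abs(predator[0] - agent[0]) + abs(predator[1] - agent[1])
        danger = 1.0 if dist <= danger_radius else 0.0
    return flat + [dx_exit, dy_exit, danger]
```

Under these assumptions the observation is a flat vector of 7 × 7 + 3 = 52 values.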
Action space:
0 → Move Up
1 → Move Down
2 → Move Left
3 → Move Right
4 → Stay
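The action indices above could map onto grid moves as in the following sketch. It assumes `(row, col)` coordinates with row 0 at the top (so "Up" decreases the row), a grid where 0 marks a free cell, and that a move into a wall or off the grid leaves the agent in place. The names `ACTION_DELTAS` and `apply_action` are hypothetical and not taken from the project code.

```python
# Action index -> (row delta, column delta); row 0 is assumed to be the top.
ACTION_DELTAS = {
    0: (-1, 0),  # Move Up
    1: (1, 0),   # Move Down
    2: (0, -1),  # Move Left
    3: (0, 1),   # Move Right
    4: (0, 0),   # Stay
}


def apply_action(grid, pos, action):
    """Return the new (row, col) after taking the action.

    Blocked moves (wall or out of bounds) keep the agent where it is.
    """
    dr, dc = ACTION_DELTAS[action]
    r, c = pos[0] + dr, pos[1] + dc
    in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[0])
    if in_bounds and grid[r][c] == 0:
        return (r, c)
    return pos
```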
Explorer rewards:
+100 → reach exit
-50 → captured by predator
-1 → time penalty
This encourages:
- efficient navigation
- predator avoidance
- escape behavior
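The reward scheme above can be sketched as a small function. The three values come from the list; the rest is assumed for illustration: that the time penalty is applied once per step, that a terminal reward replaces the step penalty rather than adding to it, and that capture is checked before the exit. `compute_reward` is a hypothetical name.

```python
REWARD_EXIT = 100.0      # reach exit
REWARD_CAPTURED = -50.0  # captured by predator
REWARD_STEP = -1.0       # time penalty (assumed to apply per step)


def compute_reward(agent, exit_pos, predator):
    """Return (reward, terminated) for the explorer's current position.

    Positions are (row, col) tuples; predator may be None when disabled.
    """
    if predator is not None and agent == predator:
        return REWARD_CAPTURED, True
    if agent == exit_pos:
        return REWARD_EXIT, True
    return REWARD_STEP, False
```

With these assumptions, an episode that reaches the exit on its 20th step collects 19 × (-1) + 100 = 81, so shorter routes score higher.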
The environment difficulty increases gradually during training.
Level 1
7×7 maze
no predator
static walls
Level 2
9×9 maze
dynamic walls
Level 3
11×11 maze
predator enabled
dynamic walls
Level 4
15×15 maze
predator enabled
dynamic walls
This curriculum helps stabilize reinforcement learning training.
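One way to express the four levels as data is sketched below. The maze sizes and flags follow the list above, except that Level 2 does not state whether the predator is active, so it is assumed to be off. The promotion rule (advance once a recent success rate passes a threshold) and the names `Level`, `CURRICULUM`, and `next_level` are hypothetical; the project may advance levels differently, for example on a fixed timestep schedule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    size: int            # maze is size x size
    predator: bool       # whether the A* predator is active
    dynamic_walls: bool  # whether walls change during the episode


CURRICULUM = [
    Level(size=7, predator=False, dynamic_walls=False),  # Level 1
    Level(size=9, predator=False, dynamic_walls=True),   # Level 2 (predator assumed off)
    Level(size=11, predator=True, dynamic_walls=True),   # Level 3
    Level(size=15, predator=True, dynamic_walls=True),   # Level 4
]


def next_level(current, success_rate, threshold=0.8):
    """Return the index of the level to train on next.

    Advances by one level when success_rate reaches the (assumed) threshold,
    and stays on the last level once it has been reached.
    """
    if success_rate >= threshold and current < len(CURRICULUM) - 1:
        return current + 1
    return current
```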
maze-agent/
│
├── env/
│ └── maze_env.py
│
├── training/
│ └── train_agents.py
│
├── visualization/
│ ├── pybullet_visualizer.py
│ └── pybullet_visualizer1.py
│
├── utils/
│ └── maze_generator.py
│
├── agents/
│ └── agents/maze_agent_15000000_steps.zip
│
├── requirements.txt
└── README.md
Clone the repository:
git clone https://github.com/Carol-here/RL-MazeSolverAgent.git
cd RL-MazeSolverAgent
Install dependencies:
pip install -r requirements.txt
To train your own agent, run:
python training/train_agents.py
Training uses Recurrent PPO with parallel environments.
The agent is trained for ~15 million timesteps.
To watch the trained agent in action, use the included policy file.
Launch the PyBullet visualizer:
python visualization/pybullet_visualizer1.py
Optional visualizer with predator path display:
python visualization/pybullet_visualizer.py
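Outside the visualizer scripts, a saved Recurrent PPO policy can also be rolled out by hand. The sketch below shows the part that differs from a feed-forward policy: the LSTM state returned by `predict` has to be passed back in on the next step, and `episode_start` marks the first step of an episode so the state is reset. It assumes a single, non-vectorized Gymnasium-style environment with a discrete action space; `run_episode` and `max_steps` are hypothetical and not part of the repository.

```python
import numpy as np


def run_episode(model, env, max_steps=500):
    """Roll out one episode with a recurrent policy and return the total reward.

    model: object with an SB3-style predict(obs, state=..., episode_start=...,
           deterministic=...) method, e.g. a loaded RecurrentPPO model.
    env:   Gymnasium-style environment (reset -> (obs, info), step -> 5-tuple).
    """
    obs, _info = env.reset()
    lstm_state = None                          # no memory before the first step
    episode_start = np.ones((1,), dtype=bool)  # tells the policy to reset its state
    total_reward = 0.0
    for _ in range(max_steps):
        action, lstm_state = model.predict(
            obs,
            state=lstm_state,
            episode_start=episode_start,
            deterministic=True,
        )
        obs, reward, terminated, truncated, _info = env.step(int(action))
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)  # later steps keep the memory
        if terminated or truncated:
            break
    return total_reward
```

A model for this function could be obtained with `RecurrentPPO.load(policy_path)` from `sb3_contrib`, where `policy_path` points at the policy file listed in the project tree. How the maze environment is constructed depends on `env/maze_env.py` and is not shown here.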
Explorer Agent learns to:
- explore the maze
- avoid predator paths
- backtrack from dead ends
- navigate dynamic walls
Predator Agent:
- computes shortest path to explorer
- adapts to maze changes
- attempts to intercept explorer
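The predator behaviour listed above can be illustrated with a compact A* search on a 4-connected grid, re-run every step so the path follows wall changes. This is a generic sketch, not the project's implementation: it assumes 0 marks a free cell, unit cost per move, and a Manhattan-distance heuristic, and the names `astar` and `predator_step` are hypothetical.

```python
import heapq


def astar(grid, start, goal):
    """Return the shortest 4-connected path from start to goal as a list
    of (row, col) cells including both ends, or [] if goal is unreachable."""

    def h(p):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    open_heap = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    came_from = {start: None}
    g_cost = {start: 0}
    while open_heap:
        _, g, cur = heapq.heappop(open_heap)
        if cur == goal:
            path = []
            while cur is not None:  # walk parents back to the start
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > g_cost[cur]:
            continue  # stale heap entry; a cheaper route was found already
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (cur[0] + dr, cur[1] + dc)
            if not (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])):
                continue
            if grid[nxt[0]][nxt[1]] != 0:
                continue  # wall
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                came_from[nxt] = cur
                heapq.heappush(open_heap, (new_g + h(nxt), new_g, nxt))
    return []


def predator_step(grid, predator, explorer):
    """Replan on the current grid and move one cell toward the explorer.

    The predator stays put if no path exists or it is already on the explorer.
    """
    path = astar(grid, predator, explorer)
    return path[1] if len(path) > 1 else predator
```

Replanning from scratch each step is the simplest way to cope with changing walls; on mazes of the sizes listed here the cost of a full search per step is small.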
- Python
- Gymnasium
- Stable-Baselines3
- SB3-Contrib (Recurrent PPO)
- PyBullet
- NumPy
The simulation shows:
- R2D2 navigating a maze
- A predator chasing the agent
- Dynamic maze updates
- Optional A* path visualization
Possible extensions:
- Multi-agent reinforcement learning (PPO vs PPO)
- Predator trajectory prediction
- Value function heatmap visualization
- Larger maze environments
- Performance evaluation metrics
This project is open-source and available under the MIT License.
Developed as part of an exploration into reinforcement learning, planning algorithms, and simulation environments.