Reinforcement Learning Maze Escape with Predator Pursuit

A reinforcement learning agent trained using Recurrent PPO learns to navigate a dynamic maze with partial observability while avoiding a predator that uses A* path planning to hunt the agent.

The system combines reinforcement learning and classical search algorithms and is visualized in a 3D PyBullet simulation.

Project Overview

This project simulates a pursuit–evasion navigation problem.

Two agents operate in the same environment:

Explorer Agent (RL)

Uses Recurrent Proximal Policy Optimization (PPO)
Learns navigation strategies
Attempts to reach the exit

Predator Agent (A*)

Uses A* pathfinding
Computes shortest path to explorer
Attempts to capture the explorer
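
To make the predator's planning step concrete, here is a minimal A* sketch on a 4-connected grid. It is an illustration under assumptions, not the code in this repository: the function name `astar`, the convention that `1` marks a wall, and the Manhattan heuristic are all choices made for the example.

```python
import heapq


def astar(grid, start, goal):
    """Return the shortest 4-connected path from start to goal, or None.

    Assumption for this sketch: grid[r][c] == 1 is a wall, 0 is free.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            # Walk the parent links back to the start.
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was found already
        r, c = cell
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_cost = cost + 1
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                came_from[nxt] = cell
                heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

Because the walls change during an episode, a predator built this way would presumably re-run the search from its current cell every step (or whenever the grid changes) and move to the second cell of the returned path.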

The environment contains dynamic obstacles, forcing the RL agent to continuously adapt.

Key Features

Reinforcement Learning

Recurrent PPO (Stable-Baselines3)
Trained for 15 million steps
Partial observability (7×7 local map)

Dynamic Environment

Procedurally generated mazes
Walls change during the episode
Exit location randomized
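
As an illustration of procedural maze generation, the sketch below carves a maze with a depth-first "recursive backtracker". This is one common technique and an assumption on my part; the README does not say which algorithm `utils/maze_generator.py` uses. The name `generate_maze` and the `1 = wall` convention are hypothetical.

```python
import random


def generate_maze(size, seed=None):
    """Carve a perfect maze on an odd-sized square grid (1 = wall, 0 = free)."""
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    rng = random.Random(seed)
    grid = [[1] * size for _ in range(size)]
    grid[1][1] = 0
    stack = [(1, 1)]
    while stack:
        r, c = stack[-1]
        # Unvisited cells two steps away, plus the wall cell in between.
        options = [
            (r + dr, c + dc, r + dr // 2, c + dc // 2)
            for dr, dc in ((-2, 0), (2, 0), (0, -2), (0, 2))
            if 0 < r + dr < size - 1
            and 0 < c + dc < size - 1
            and grid[r + dr][c + dc] == 1
        ]
        if not options:
            stack.pop()  # dead end: backtrack
            continue
        nr, nc, wr, wc = rng.choice(options)
        grid[wr][wc] = 0  # knock down the wall between the two cells
        grid[nr][nc] = 0
        stack.append((nr, nc))
    return grid
```

The odd sizes listed in the curriculum (7, 9, 11, 15) fit this layout, where cells sit on odd coordinates and walls on even ones. Dynamic walls could then be modelled by toggling individual interior wall cells during an episode, though how the project actually does this is not stated here.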

Predator Pursuit

Predator uses A* path planning
Intelligent pursuit behavior
Real-time path visualization option

Simulation Visualization

3D environment using PyBullet
Explorer robot (R2D2 model)
Predator visualization
Dynamic maze rendering

Environment Design

Observation space:

7 × 7 local maze view
dx_exit
dy_exit
danger_signal

Where:

dx_exit, dy_exit → direction toward exit
danger_signal → 1 if predator within danger radius
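
The observation described above could be assembled roughly as follows. This is a sketch under assumptions: the flat 52-element layout (49 map cells plus 3 scalars), the use of the sign of the offset for `dx_exit` and `dy_exit`, the Manhattan distance check, and the value of `DANGER_RADIUS` are all hypothetical, since the README does not specify them.

```python
import numpy as np

VIEW = 7           # side length of the local window, as stated in the README
DANGER_RADIUS = 3  # hypothetical value; the README does not give the radius


def build_observation(grid, agent, exit_pos, predator=None):
    """Return a flat float32 vector: 7x7 local view, dx_exit, dy_exit, danger_signal."""
    half = VIEW // 2
    # Pad with walls (1.0) so the window is defined near the maze border.
    padded = np.pad(np.asarray(grid, dtype=np.float32), half, constant_values=1.0)
    r, c = agent
    local = padded[r:r + VIEW, c:c + VIEW]  # window centred on the agent

    # Direction toward the exit, encoded here as -1, 0, or +1 per axis.
    dx_exit = np.sign(exit_pos[1] - c)
    dy_exit = np.sign(exit_pos[0] - r)

    danger_signal = 0.0
    if predator is not None:
        distance = abs(predator[0] - r) + abs(predator[1] - c)
        danger_signal = 1.0 if distance <= DANGER_RADIUS else 0.0

    extras = [dx_exit, dy_exit, danger_signal]
    return np.concatenate([local.ravel(), extras]).astype(np.float32)
```

Since the agent never sees the whole maze or the predator's exact position, a recurrent policy is a natural fit: the LSTM state can carry information about cells and dangers seen on earlier steps.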

Action space:

0 → Move Up
1 → Move Down
2 → Move Left
3 → Move Right
4 → Stay
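
A minimal sketch of how these discrete actions might map to grid moves. The `(row, column)` coordinate convention with rows growing downward, and the rule that a blocked move leaves the agent in place, are assumptions for illustration; `ACTION_DELTAS` and `apply_action` are hypothetical names.

```python
# Action index -> (row delta, column delta); rows are assumed to grow downward.
ACTION_DELTAS = {
    0: (-1, 0),  # Move Up
    1: (1, 0),   # Move Down
    2: (0, -1),  # Move Left
    3: (0, 1),   # Move Right
    4: (0, 0),   # Stay
}


def apply_action(grid, pos, action):
    """Return the new position; moves into walls or off the grid are ignored."""
    dr, dc = ACTION_DELTAS[action]
    r, c = pos[0] + dr, pos[1] + dc
    inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
    if inside and grid[r][c] == 0:
        return (r, c)
    return pos
```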

Reward Function

Explorer rewards:

+100 → reach exit
-50  → captured by predator
-1   → time penalty

This encourages:

efficient navigation
predator avoidance
escape behavior
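
The reward table above can be written as a small function. This sketch assumes the time penalty is applied once per non-terminal step and that capture is checked before reaching the exit; neither detail is stated in the README, and `step_reward` is a hypothetical name.

```python
REACH_EXIT_REWARD = 100.0
CAPTURE_PENALTY = -50.0
STEP_PENALTY = -1.0


def step_reward(agent, exit_pos, predator=None):
    """Return (reward, terminated) for the explorer after one step."""
    if predator is not None and agent == predator:
        return CAPTURE_PENALTY, True
    if agent == exit_pos:
        return REACH_EXIT_REWARD, True
    # Every other step costs a little, which favours short paths.
    return STEP_PENALTY, False
```

With these values, an escape that takes fewer than 100 steps still yields a positive return, while wandering indefinitely is worse than a quick escape but, over a long enough episode, also worse than being caught.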

Curriculum Learning

The environment difficulty increases gradually during training.

Level 1

7×7 maze
no predator
static walls

Level 2

9×9 maze
dynamic walls

Level 3

11×11 maze
predator enabled
dynamic walls

Level 4

15×15 maze
predator enabled
dynamic walls

This curriculum helps stabilize reinforcement learning training.
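
The four levels can be captured as plain configuration, as in the sketch below. Two things here are assumptions: Level 2 is taken to have no predator (the README lists only its size and dynamic walls), and the promotion rule (advance once the success rate over a window of episodes passes a threshold) is hypothetical, since the README does not say how levels are advanced.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    size: int
    predator: bool
    dynamic_walls: bool


CURRICULUM = [
    Level(size=7, predator=False, dynamic_walls=False),
    Level(size=9, predator=False, dynamic_walls=True),  # predator assumed off
    Level(size=11, predator=True, dynamic_walls=True),
    Level(size=15, predator=True, dynamic_walls=True),
]


class CurriculumScheduler:
    """Advance to the next level when recent episodes succeed often enough."""

    def __init__(self, levels, window=100, threshold=0.8):
        self.levels = levels
        self.index = 0
        self.threshold = threshold
        self.results = deque(maxlen=window)

    @property
    def level(self):
        return self.levels[self.index]

    def record(self, success):
        """Log one episode outcome and return the level to use next."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        if window_full and rate >= self.threshold and self.index < len(self.levels) - 1:
            self.index += 1
            self.results.clear()  # start a fresh window at the new level
        return self.level
```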

Project Structure

maze-agent/
│
├── env/
│   └── maze_env.py
│
├── training/
│   └── train_agents.py
│
├── visualization/
│   ├── pybullet_visualizer.py
│   └── pybullet_visualizer1.py
│
├── utils/
│   └── maze_generator.py
│
├── agents/
│   └── agents/maze_agent_15000000_steps.zip
│
├── requirements.txt
└── README.md

Installation

Clone the repository:

git clone https://github.com/Carol-here/RL-MazeSolverAgent.git
cd RL-MazeSolverAgent

Install dependencies:

pip install -r requirements.txt

Training the Agent

To train your own agent, run:

python training/train_agents.py

Training uses Recurrent PPO with parallel environments.

The agent is trained for ~15 million timesteps.
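
For orientation, a training loop of this kind might look like the sketch below. It is not the contents of `training/train_agents.py`: the function `train`, the `env_factory` argument, `n_envs=8`, and `save_path` are hypothetical, and `"MlpLstmPolicy"` assumes a flat `Box` observation (a `Dict` observation would need `"MultiInputLstmPolicy"` instead).

```python
def train(env_factory, total_timesteps=15_000_000, n_envs=8, save_path="agents/maze_agent"):
    """Train a recurrent PPO policy on parallel copies of the environment.

    env_factory is assumed to be a zero-argument callable returning a
    Gymnasium environment.
    """
    # Imported lazily so the function can be defined without the RL stack installed.
    from sb3_contrib import RecurrentPPO
    from stable_baselines3.common.env_util import make_vec_env

    vec_env = make_vec_env(env_factory, n_envs=n_envs)
    model = RecurrentPPO("MlpLstmPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    model.save(save_path)
    return model
```

A much smaller `total_timesteps` is usually enough to check that the pipeline runs end to end before committing to a full 15 million step run.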

Running the Simulation

To watch the trained agent in action, run the visualizer with the included policy file.

Launch the PyBullet visualizer:

python visualization/pybullet_visualizer1.py

Optional visualizer with predator path display:

python visualization/pybullet_visualizer.py
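
If you want to evaluate the policy without the 3D view, a rollout with a recurrent policy needs to carry the LSTM state between steps. The sketch below shows that pattern; `run_episode` and `max_steps` are hypothetical names, and the environment is assumed to follow the Gymnasium `reset`/`step` API.

```python
import numpy as np


def run_episode(model, env, max_steps=500):
    """Roll out one episode and return the total reward.

    model is anything with a predict(obs, state=..., episode_start=...,
    deterministic=...) method, such as a loaded RecurrentPPO policy.
    """
    obs, _ = env.reset()
    lstm_states = None                              # no memory at episode start
    episode_start = np.ones((1,), dtype=bool)       # tells the policy to reset its state
    total_reward = 0.0
    for _ in range(max_steps):
        action, lstm_states = model.predict(
            obs, state=lstm_states, episode_start=episode_start, deterministic=True
        )
        obs, reward, terminated, truncated, _ = env.step(action)
        total_reward += float(reward)
        episode_start = np.zeros((1,), dtype=bool)  # later steps keep the LSTM state
        if terminated or truncated:
            break
    return total_reward
```

A saved policy can be loaded with `RecurrentPPO.load(path)` from `sb3_contrib`, pointing `path` at the `.zip` file under `agents/`, and then passed in as `model`.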

Simulation Behavior

Explorer Agent learns to:

explore the maze
avoid predator paths
backtrack from dead ends
navigate dynamic walls

Predator Agent:

computes shortest path to explorer
adapts to maze changes
attempts to intercept explorer

Technologies Used

Python
Gymnasium
Stable-Baselines3
SB3-Contrib (Recurrent PPO)
PyBullet
NumPy

Example Demo

The simulation shows:

R2D2 navigating a maze
A predator chasing the agent
Dynamic maze updates
Optional A* path visualization

Future Improvements

Possible extensions:

Multi-agent reinforcement learning (PPO vs PPO)
Predator trajectory prediction
Value function heatmap visualization
Larger maze environments
Performance evaluation metrics

License

This project is open-source and available under the MIT License.

Author

Developed as part of an exploration into reinforcement learning, planning algorithms, and simulation environments.
