Carol-here/RL-MazeSolverAgent

Reinforcement Learning Maze Escape with Predator Pursuit

A reinforcement learning agent trained with Recurrent PPO learns to navigate a dynamic maze under partial observability while avoiding a predator that uses A* path planning to hunt it.

The system combines reinforcement learning and classical search algorithms and is visualized in a 3D PyBullet simulation.


Project Overview

This project simulates a pursuit–evasion navigation problem.

Two agents operate in the same environment:

Explorer Agent (RL)

  • Uses Recurrent Proximal Policy Optimization (PPO)
  • Learns navigation strategies
  • Attempts to reach the exit

Predator Agent (A*)

  • Uses A* pathfinding
  • Computes shortest path to explorer
  • Attempts to capture the explorer

The environment contains dynamic obstacles, forcing the RL agent to continuously adapt.
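
As an illustration of the predator's planning step, here is a minimal A* sketch on a 4-connected grid. It is not taken from the repository: the function name `astar`, the grid encoding (1 = wall, 0 = free), and the Manhattan heuristic are all assumptions made for this example.

```python
import heapq


def astar(grid, start, goal):
    """Return the shortest 4-connected path from start to goal, or None.

    Assumed encoding (hypothetical, not from the repo): grid[r][c] == 1 is a wall.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance is admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    g_cost = {start: 0}

    while open_heap:
        _, g, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk the parent links back to the start.
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if g > g_cost[current]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = current
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:
                continue
            new_g = g + 1
            if new_g < g_cost.get(nxt, float("inf")):
                g_cost[nxt] = new_g
                came_from[nxt] = current
                heapq.heappush(open_heap, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

Calling a function like this again after each wall change, with the explorer's current cell as the goal, would be one simple way for a predator to keep up with a dynamic maze.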


Key Features

Reinforcement Learning

  • Recurrent PPO (Stable-Baselines3)
  • Trained for 15 million steps
  • Partial observability (7×7 local map)

Dynamic Environment

  • Procedurally generated mazes
  • Walls change during the episode
  • Exit location randomized

Predator Pursuit

  • Predator uses A* path planning
  • Intelligent pursuit behavior
  • Real-time path visualization option

Simulation Visualization

  • 3D environment using PyBullet
  • Explorer robot (R2D2 model)
  • Predator visualization
  • Dynamic maze rendering

Environment Design

Observation space:

7 × 7 local maze view
dx_exit
dy_exit
danger_signal

Where:

dx_exit, dy_exit → direction toward exit
danger_signal → 1 if predator within danger radius
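
The observation described above could be assembled roughly as follows. This is a sketch under stated assumptions, not the code in env/maze_env.py: the function name `build_observation`, the default `danger_radius` of 3, the Manhattan distance check, padding out-of-bounds cells as walls, and scaling the exit offsets by the grid size are all hypothetical choices.

```python
def build_observation(grid, agent, exit_pos, predator, view=7, danger_radius=3):
    """Flatten a view x view window around the agent and append 3 scalars.

    Assumptions (hypothetical): grid[r][c] == 1 is a wall, cells outside the
    maze are reported as walls, and predator is None when no predator exists.
    """
    half = view // 2
    rows, cols = len(grid), len(grid[0])
    ar, ac = agent

    local = []
    for r in range(ar - half, ar + half + 1):
        for c in range(ac - half, ac + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            local.append(float(grid[r][c]) if inside else 1.0)

    # Signed offset toward the exit, scaled into roughly [-1, 1].
    dx_exit = (exit_pos[1] - ac) / cols
    dy_exit = (exit_pos[0] - ar) / rows

    danger_signal = 0.0
    if predator is not None:
        distance = abs(predator[0] - ar) + abs(predator[1] - ac)
        if distance <= danger_radius:
            danger_signal = 1.0

    return local + [dx_exit, dy_exit, danger_signal]
```

With a 7 × 7 window this yields a vector of 49 + 3 = 52 values.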

Action space:

0 → Move Up
1 → Move Down
2 → Move Left
3 → Move Right
4 → Stay
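
One way to turn these action indices into grid moves is sketched below. The names `ACTION_DELTAS` and `apply_action` are hypothetical, and two conventions are assumed rather than taken from the repository: row 0 is the top of the maze (so "Up" decreases the row index), and a move into a wall or off the grid leaves the agent in place.

```python
# (row_delta, col_delta) for each action index listed above.
ACTION_DELTAS = {
    0: (-1, 0),  # Move Up
    1: (1, 0),   # Move Down
    2: (0, -1),  # Move Left
    3: (0, 1),   # Move Right
    4: (0, 0),   # Stay
}


def apply_action(grid, pos, action):
    """Return the new (row, col); blocked moves keep the current position."""
    dr, dc = ACTION_DELTAS[action]
    r, c = pos[0] + dr, pos[1] + dc
    in_bounds = 0 <= r < len(grid) and 0 <= c < len(grid[0])
    if in_bounds and grid[r][c] == 0:
        return (r, c)
    return pos
```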

Reward Function

Explorer rewards:

+100 → reach exit
-50  → captured by predator
-1   → time penalty

This encourages:

  • efficient navigation
  • predator avoidance
  • escape behavior
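
The reward table can be written as a small function. This is a sketch, not the repository's implementation: the name `explorer_reward` is hypothetical, and it assumes that the time penalty applies only to non-terminal steps and that capture takes precedence if both events happen on the same step.

```python
def explorer_reward(reached_exit, captured):
    """Return (reward, terminated) for one step of the explorer."""
    if captured:
        return -50.0, True   # captured by predator
    if reached_exit:
        return 100.0, True   # reach exit
    return -1.0, False       # time penalty


# Worked example: 19 ordinary steps, then the exit on step 20.
episode = [(False, False)] * 19 + [(True, False)]
episode_return = sum(explorer_reward(*step)[0] for step in episode)
```

Under these assumptions the example episode returns 19 × (-1) + 100 = 81, so shorter successful routes score higher.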

Curriculum Learning

The environment difficulty increases gradually during training.

Level 1

7×7 maze
no predator
static walls

Level 2

9×9 maze
dynamic walls

Level 3

11×11 maze
predator enabled
dynamic walls

Level 4

15×15 maze
predator enabled
dynamic walls

This curriculum helps stabilize reinforcement learning training.
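
The four levels can be expressed as data, with a schedule that picks a level from training progress. This is only a sketch: `Level`, `CURRICULUM`, and `level_for_step` are hypothetical names, the promotion fractions are invented for illustration (the README does not say when the level changes), and Level 2 is assumed to have no predator since it does not list one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    size: int            # maze is size x size
    predator: bool
    dynamic_walls: bool


CURRICULUM = [
    Level(size=7, predator=False, dynamic_walls=False),   # Level 1
    Level(size=9, predator=False, dynamic_walls=True),    # Level 2 (predator assumed off)
    Level(size=11, predator=True, dynamic_walls=True),    # Level 3
    Level(size=15, predator=True, dynamic_walls=True),    # Level 4
]


def level_for_step(step, total_steps, fractions=(0.1, 0.3, 0.6)):
    """Pick a level from the fraction of training completed.

    The thresholds in `fractions` are placeholders, not values from the repo.
    """
    progress = step / total_steps
    index = sum(progress >= f for f in fractions)
    return CURRICULUM[index]
```

A performance-based rule (for example, promoting once the recent success rate passes a threshold) would be an equally plausible alternative to this step-based schedule.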

Project Structure

maze-agent/
│
├── env/
│   └── maze_env.py
│
├── training/
│   └── train_agents.py
│
├── visualization/
│   ├── pybullet_visualizer.py
│   └── pybullet_visualizer1.py
│
├── utils/
│   └── maze_generator.py
│
├── agents/
│   └── agents/maze_agent_15000000_steps.zip
│
├── requirements.txt
└── README.md

Installation

Clone the repository:

git clone https://github.com/Carol-here/RL-MazeSolverAgent.git
cd RL-MazeSolverAgent

Install dependencies:

pip install -r requirements.txt

Training the Agent

To train your own agent, run:

python training/train_agents.py

Training uses Recurrent PPO with parallel environments.

The agent is trained for ~15 million timesteps.

Running the Simulation

To watch the trained agent in action, run the visualizer with the included policy file.

Launch the PyBullet visualizer:

python visualization/pybullet_visualizer1.py

Optional visualizer with predator path display:

python visualization/pybullet_visualizer.py

Simulation Behavior

Explorer Agent learns to:

  • explore the maze
  • avoid predator paths
  • backtrack from dead ends
  • navigate dynamic walls

Predator Agent:

  • computes shortest path to explorer
  • adapts to maze changes
  • attempts to intercept explorer

Technologies Used

  • Python
  • Gymnasium
  • Stable-Baselines3
  • SB3-Contrib (Recurrent PPO)
  • PyBullet
  • NumPy

Example Demo

The simulation shows:

  • R2D2 navigating a maze
  • A predator chasing the agent
  • Dynamic maze updates
  • Optional A* path visualization

Future Improvements

Possible extensions:

  • Multi-agent reinforcement learning (PPO vs PPO)
  • Predator trajectory prediction
  • Value function heatmap visualization
  • Larger maze environments
  • Performance evaluation metrics

License

This project is open-source and available under the MIT License.

Author

Developed as part of an exploration into reinforcement learning, planning algorithms, and simulation environments.
