This project implements a random search-based hyperparameter-tuning strategy to solve the "LunarLander-v3" environment from Gymnasium. It selects the 15 best configurations that yielded rewards above +240 and performs a comparative analysis on them.
- Deep Q-Network (DQN) for LunarLander-v3
The goal of this project is to train an agent with the DQN algorithm to land a spacecraft safely, and as fuel-efficiently as possible, on a designated landing pad in the "LunarLander-v3" environment. The environment is a classic Box2D physics problem in which the agent must control its thrusters to achieve a stable landing.
Ensure you have Python installed. The project was tested with Python 3.11.6. You can set up the required dependencies as follows:
# Clone the repository
git clone https://github.com/GokayGulsoy/Random-Search-Based-Hyperparameter-Tuning-For-Solving-Lunar-Lander-Problem-Using-Deep-Q-Learning.git
cd Reinforcement\ Learning\ Project
# Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
# Install required packages
pip install -r requirements.txt
The LunarLander-v3 environment is a reinforcement learning task in which an agent controls a lander to land safely on a designated pad. The agent receives rewards based on its landing accuracy and penalties for fuel usage and crashes.
- Discrete (4 actions)
- 0: Do nothing
- 1: Fire left orientation engine
- 2: Fire main engine
- 3: Fire right orientation engine
- Box(8, float32): Contains position, velocity, angle, angular velocity, and landing leg contact status.
- Reward increases for smooth and centered landing.
- Penalties for excessive tilt, high speed, unnecessary thruster use.
- +10 points per leg in contact with the ground, -100 for crashing, +100 for a successful landing.
- The lander crashed.
- The lander gets outside of the viewport.
- The lander stops moving.
To run random search-based hyperparameter-tuning, execute the command:
python tune.py
The training progress can be monitored with TensorBoard using the command:
tensorboard --logdir=runs
Running tune.py creates model files for the runs that surpass a score of 240 and saves them under the models directory. Scores for every run are recorded under the scores directory, regardless of the score achieved, for plotting and analysis.
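The sampling step of the tuning procedure can be sketched as follows. This is a minimal, illustrative sketch, not the actual tune.py logic: `SEARCH_SPACE`, `sample_config`, and `random_search` are hypothetical names, and the grids match the search space listed in this README.

```python
# Minimal sketch of random search over the hyperparameter grids.
import random

SEARCH_SPACE = {
    "lr": [0.005, 0.001, 0.0005, 0.0001],
    "gamma": [0.99, 0.95, 0.85, 0.80],
    "batch_size": [32, 64, 128],
    "hidden_size": [32, 64, 128],
}

def sample_config(rng: random.Random) -> dict:
    """Draw one configuration uniformly at random from each grid."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def random_search(n_trials: int, seed: int = 0) -> list:
    """Sample n_trials independent configurations (each would then be trained)."""
    rng = random.Random(seed)
    return [sample_config(rng) for _ in range(n_trials)]

configs = random_search(20)  # 20 tuning trials, as in the experiments below
```

Seeding the sampler makes a tuning run reproducible, which is useful when comparing experiments.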
To test a trained model, edit the line record_agent_solution(model_path="models/tuning_trial_3.pth", hidden_size=128) at the bottom of inference.py: set model_path to a model under the models directory that exceeded a reward of 240, and set hidden_size to match the hidden size selected for that run (logged to the terminal while tune.py is running). Running inference.py logs the inference score to the terminal and generates a landing-simulation video under the videos directory.
python inference.py
To visualize the reward and loss curves, choose a .npy file created under the scores directory after running tune.py. Then pass the name of that file, without the .npy extension, to the plot_training_results("tuning_trial_{trial_number}") call at the bottom of graph.py. This generates graphs showing how reward and loss changed over the episodes for that run, saved under the solved_model_plots directory.
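The loading-and-smoothing step behind such plots can be sketched as below. This is an illustration, not graph.py itself: the actual layout of the .npy files is defined by tune.py, and here a 1-D array of per-episode rewards is assumed as a stand-in.

```python
# Sketch: load a per-episode reward curve and smooth it before plotting.
import numpy as np

def moving_average(rewards: np.ndarray, window: int = 20) -> np.ndarray:
    """Smooth a reward curve with a simple sliding-window mean."""
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# rewards = np.load("scores/tuning_trial_3.npy")  # hypothetical path
rewards = np.linspace(-200.0, 260.0, 800)  # stand-in for a real 800-episode curve
smoothed = moving_average(rewards)
```

Smoothing makes the noisy per-episode rewards easier to read when judging whether a run converged past the 240 threshold.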
python graph.py
- Learning Rate: [0.005, 0.001, 0.0005, 0.0001]
- Discount Factor (Gamma): [0.99, 0.95, 0.85, 0.80]
- Batch Size: [32, 64, 128]
- Hidden Size: [32, 64, 128]
- Episodes: 800
- Replay Buffer Size: 100,000
- Target Network Update: Soft Updates, TAU = 0.005
- Exploration Strategy: ε-greedy (ε decays from 1.0 to 0.01 with a 0.995 decay factor per episode)
- Optimizer: Adam
- Loss Function: Mean Squared Error
- Episodes: 800
- Convergence Reward: 200+ (environment considered solved)
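Two of the fixed mechanics listed above, the per-episode epsilon decay and the soft target-network update with TAU = 0.005, can be sketched as below. Plain-Python lists stand in for network weights here purely for illustration (the project's .pth files suggest PyTorch tensors in practice); `decay_epsilon` and `soft_update` are hypothetical names.

```python
# Sketch of the epsilon-greedy decay schedule and the soft (Polyak) target update.
TAU = 0.005
EPS_START, EPS_END, EPS_DECAY = 1.0, 0.01, 0.995

def decay_epsilon(eps: float) -> float:
    """One per-episode decay step, floored at EPS_END."""
    return max(EPS_END, eps * EPS_DECAY)

def soft_update(online: list, target: list) -> list:
    """target <- TAU * online + (1 - TAU) * target, element-wise."""
    return [TAU * o + (1.0 - TAU) * t for o, t in zip(online, target)]

eps = EPS_START
for _ in range(100):
    eps = decay_epsilon(eps)
# after 100 episodes eps = 0.995**100, roughly 0.606, still above the 0.01 floor
```

The soft update nudges the target network toward the online network by only 0.5% per step, which keeps the bootstrapped TD targets stable during training.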
- GPU: NVIDIA RTX 2060 (6GB VRAM)
- Processor: Intel(R) Core(TM) i7-10750H CPU @ 2.60GHz
- RAM: 16GB
NOTE: As the experimental results contain many files, they are provided as a zip file. Unpack it to see the experimental results under the Experimental_Results directory.
In total, 15 experiments were run, using the best 15 configurations found by random search-based hyperparameter tuning. The results of each experiment, with 20 trials each, are provided under the Experimental_Results directory with the naming convention Experimental_Results_{experiment_number}.txt. The Experimental_Results directory under the project root contains:
- A model subdirectory listing the solved model files for each experiment; if the directory for a specific run is empty, no solved model was found for that experiment.
- A scores_run_{experiment_number} directory for each experiment, containing the reward and loss scores for its 20 tuning trials.
- A solved_model_plots directory with {experiment_number_in_text}_run subdirectories, each containing the reward and loss plots for runs that achieved a reward of 240+.
- A videos directory with run_{experiment_number} folders containing the .mp4 files that simulate the lunar lander landing for each solved model.
The paper for the project can be found here: Research Paper
