In this Jupyter notebook, you will compare the performance of three reinforcement learning algorithms (On-Policy First-Visit Monte Carlo Control, Sarsa, and Q-Learning) in a simple racetrack environment. You will then implement a modified TD agent that learns faster than a basic Q-Learning agent.
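To make the difference between the two TD methods concrete, here is a minimal sketch of the tabular Sarsa and Q-Learning update rules. It is not taken from the notebook: the function names, the NumPy Q-table layout (`q[state, action]`), the step size `alpha`, the discount `gamma`, and the toy values are all assumptions chosen for illustration, and terminal-state handling is omitted.

```python
import numpy as np


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.1, gamma=1.0):
    """On-policy TD update: bootstrap from the action actually taken next."""
    target = r + gamma * q[s_next, a_next]
    q[s, a] += alpha * (target - q[s, a])


def q_learning_update(q, s, a, r, s_next, alpha=0.1, gamma=1.0):
    """Off-policy TD update: bootstrap from the greedy action in the next state."""
    target = r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])


# Hypothetical toy Q-table with 2 states and 2 actions (not racetrack data).
q_sarsa = np.array([[0.0, 0.0], [1.0, 3.0]])
q_qlearn = q_sarsa.copy()

# Same transition for both agents: state 0, action 0, reward -1, next state 1.
# The Sarsa agent is assumed to have picked the non-greedy action 0 next.
sarsa_update(q_sarsa, s=0, a=0, r=-1.0, s_next=1, a_next=0, alpha=0.5)
q_learning_update(q_qlearn, s=0, a=0, r=-1.0, s_next=1, alpha=0.5)

print(q_sarsa[0, 0], q_qlearn[0, 0])
```

On this single transition the Sarsa estimate stays at 0.0, because its target uses the value of the exploratory next action, while the Q-Learning estimate moves to 1.0, because its target uses the larger greedy value. First-visit Monte Carlo control differs from both in that it waits for the episode to end and averages full returns instead of bootstrapping.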
abhaybodhe/Racetrack-Environment