This project investigates how different structures of human feedback influence learning behavior in reinforcement learning agents. Using the TAMER (Training an Agent Manually via Evaluative Reinforcement) framework, we systematically compare event-based, navigation-guided, and pattern-based feedback across two classic navigation environments.
Rather than varying reward magnitude or feedback granularity alone, this work shows that how and when feedback is delivered plays a far larger role in shaping agent behavior, learning speed, and stability.
- Feedback structure matters more than numeric granularity.
- Navigation-guided feedback leads to faster and more stable learning than purely event-based feedback.
- Excessive or poorly timed feedback can destabilize learning.
- Different feedback strategies shape how agents behave, not just whether they succeed.
Experiments were conducted in two discrete navigation environments from Gymnasium’s Toy Text suite.
CliffWalking-v0
- Grid world with a hazardous cliff region
- Goal: reach the terminal state without falling off the cliff
- Evaluates risk-aware navigation and path planning

Taxi-v3
- Structured pickup-and-drop-off task
- Goal: pick up a passenger and deliver them to a target location
- Evaluates long-horizon planning and avoidance of illegal actions
- TAMER agents trained exclusively on evaluative feedback
- Q-learning baseline trained on environment rewards
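The core difference between the two agents is the signal they learn from: the TAMER agent fits a model H(s, a) of the human's evaluative feedback and acts greedily with respect to it, never touching the environment reward. A minimal tabular sketch of this idea (the class name, learning rate, and update rule here are illustrative, not the project's exact implementation):

```python
from collections import defaultdict

class TabularTamer:
    """Minimal tabular TAMER-style agent: learns a model H(s, a) of
    human evaluative feedback and acts greedily with respect to it.
    Unlike the Q-learning baseline, it never sees environment rewards."""

    def __init__(self, n_actions, lr=0.1):
        self.n_actions = n_actions
        self.lr = lr
        self.H = defaultdict(float)  # feedback model, keyed by (state, action)

    def act(self, state):
        # Greedy action under the current feedback model.
        return max(range(self.n_actions), key=lambda a: self.H[(state, a)])

    def update(self, state, action, feedback):
        # Move H(s, a) toward the human's evaluative signal.
        self.H[(state, action)] += self.lr * (feedback - self.H[(state, action)])

agent = TabularTamer(n_actions=4)
agent.update(state=0, action=1, feedback=1.0)
agent.act(0)  # action 1 now has the highest predicted feedback
```

The key design difference from Q-learning is that the update is supervised (regressing toward the feedback signal) rather than bootstrapped from future value estimates.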
Five synthetic feedback functions were implemented:
- Binary Dense (navigation-guided)
- Binary Sparse (event-based)
- Multilevel Dense (navigation-guided)
- Multilevel Sparse (event-based)
- Pattern-Based Feedback (behavioral pattern evaluation)
Dense feedback variants provide continuous, task-aligned guidance (e.g., safe path or corridor navigation), while sparse variants focus primarily on milestone or terminal events.
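To make the dense/sparse distinction concrete, here is a hypothetical pair of binary feedback functions for a CliffWalking-style 4x12 grid. The state numbering, thresholds, and signal values are illustrative; the project's actual feedback functions live in the experiment notebooks.

```python
# Hypothetical binary feedback functions for a CliffWalking-style 4x12 grid.
# States are numbered row-major: the bottom row holds the cliff (states
# 37..46) and the goal (state 47).
N_COLS = 12
CLIFF = set(range(37, 47))
GOAL = 47

def binary_dense_feedback(state, next_state):
    """Navigation-guided: feedback on every step, rewarding safe progress."""
    if next_state in CLIFF:
        return -1.0  # stepped into the hazard
    if next_state == GOAL:
        return 1.0
    row, col = divmod(next_state, N_COLS)
    _, prev_col = divmod(state, N_COLS)
    # Reward rightward progress along the safe upper rows.
    if row < 3 and col > prev_col:
        return 1.0
    return -1.0

def binary_sparse_feedback(state, next_state):
    """Event-based: feedback only on terminal events, silence elsewhere."""
    if next_state in CLIFF:
        return -1.0
    if next_state == GOAL:
        return 1.0
    return 0.0
```

Note how the sparse variant returns 0.0 for almost every transition: the agent receives no guidance about which intermediate steps were good, which is the structural difference the experiments isolate.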
Representative learning curves comparing feedback strategies:
All curves are averaged over 20 seeds; shaded region is ±1 standard error. Across both environments, navigation-guided binary and multilevel feedback consistently achieved the fastest convergence and highest stability, outperforming sparse and pattern-based feedback in early learning.
To complement the quantitative results, the following GIFs show learned agent behavior after training, highlighting how feedback structure shapes navigation strategies.
Binary Dense Feedback
The agent follows the safe upper path, closely matching human intuition about optimal risk-aware behavior.
Pattern-Based Feedback
The agent reaches the goal using a less conventional trajectory, initially moving closer to the cliff before transitioning upward.
This reflects higher-level behavioral evaluation rather than direct navigation guidance.
Binary Dense Feedback
The agent successfully completes the task but exhibits oscillatory behavior when navigating toward the passenger and destination.
Pattern-Based Feedback
The agent follows a more direct path with reduced oscillation, but this feedback strategy learned less reliably overall across runs.
Multilevel Dense Feedback
The agent successfully completes the task but exhibits oscillatory behavior when navigating toward the passenger and destination.
Multilevel Sparse Feedback
The agent repeatedly performs illegal drop-offs before eventually completing the task, illustrating how sparse milestone-based feedback can fail to guide effective navigation.
These examples illustrate that feedback strategy affects behavioral style, not just task completion.
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Run the experiment notebooks:
  - `experiments/cliffwalking_experiments.ipynb`
  - `experiments/taxi_experiments.ipynb`
Each notebook runs all feedback strategies, evaluates performance across multiple seeds, and produces plots for comparative analysis.
This project was completed as a final course project in Reinforcement Learning.
Collaboration:
This was a collaborative project. I designed and implemented the full experimental pipeline (agents, feedback functions, training loops, evaluation, and visualizations), and co-authored the project report.
A full technical report (co-authored) is available at `report/final_report.pdf`.
Note: the README is the most up-to-date high-level summary of the project and results.
Potential extensions include:
- Evaluating feedback strategies with real human participants
- Scaling TAMER agents to larger environments (e.g., Atari)
- Studying transfer learning from simple to complex tasks
- Exploring hybrid reward + feedback models
- Knox, W. B., & Stone, P. (2008). TAMER: Training an Agent Manually via Evaluative Reinforcement. IEEE International Conference on Development and Learning.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement Learning: An Introduction.