Awesome RL

A curated list of reinforcement learning resources, including books, courses, topics, repos, websites, communities, research groups, and modern RL directions such as offline RL, world models, RLHF, and agent RL.

Note:

I keep the original structure of this repository as much as possible.

Legacy entries are preserved whenever possible, even if some are old.

Newer items are added into the existing structure instead of replacing it.

Books

English
- Reinforcement Learning: An Introduction [Book] [Code] [Preferred] [old version] [newest version]
- Algorithms for Reinforcement Learning [Official]
- OpenAI Spinning Up
- Reinforcement Learning for Sequential Decision and Optimal Control
- Dynamic Programming and Optimal Control
- Deep Reinforcement Learning Hands-On [PDF, 2nd edition]
- Reinforcement Learning and Optimal Control
- Reinforcement Learning: Theory and Algorithms
- Markov Decision Processes: Discrete Stochastic Dynamic Programming, by Martin Puterman.
- Neuro-Dynamic Programming, by Dimitri Bertsekas and John Tsitsiklis.
- Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
- Safe Reinforcement Learning
- Probabilistic Machine Learning: Advanced Topics (useful for model-based RL and sequential decision making)
Chinese
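- Hands-on Reinforcement Learning (动手学强化学习), Weinan Zhang
- A Practical Guide to Deploying Deep Reinforcement Learning (深度强化学习落地指南), Ning Wei
- Deep Reinforcement Learning (深度强化学习), Shusen Wang [PDF]
- EasyRL Reinforcement Learning Tutorial (EasyRL 强化学习教程)
- Reinforcement Learning in Practice: Technical Evolution and Business Innovation of Reinforcement Learning at Alibaba (强化学习实战: 强化学习在阿里的技术演进和业务创新), Qing Da [PDF]
- Deep Reinforcement Learning (深度强化学习), Hao Dong [PDF]
- Reinforcement Learning Explained Simply (深入浅出强化学习), Xian Guo [PDF]
- Neural Networks and Deep Learning (神经网络与深度学习), Xipeng Qiu [PDF]
- Machine Learning (机器学习), Zhi-Hua Zhou [PDF]
- Statistical Reinforcement Learning: Modern Machine Learning Approaches (统计强化学习: 现代机器学习方法), Masashi Sugiyama [PDF]
- Deep Reinforcement Learning: Core Algorithms and Applications (深度强化学习核心算法与应用), Shiyong Chen [PDF]
- Deep Reinforcement Learning: Learning by Doing (深度强化学习边做边学), Yutaro Ogawa [PDF]
- Reinforcement Learning (强化学习), Wei Zou [PDF]
- Essentials of Reinforcement Learning: Core Algorithms and TensorFlow Implementation (强化学习精要: 核心算法与TensorFlow实现), Chao Feng [PDF]
- Introduction to Reinforcement Learning: From Principles to Practice (强化学习入门: 从原理到实践), Qiang Ye [PDF]
- Chinese textbooks on reinforcement learning, decision making, and control (to be added)
Note: only the first author of each book is listed.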

Courses

UCL. Reinforcement Learning. David Silver. Difficulty: [★]
UCL. Advanced Topics. David Silver.
Tencent. Reinforcement Learning. MoFan. Difficulty: [★]
National Taiwan University. DRL. Hung-Yi LEE. [Preferred]. Difficulty: [★]
Deep Reinforcement Learning. Shusen Wang. [Bilibili]
UCLA. Intro to Reinforcement Learning. Bolei Zhou. Difficulty: [★]
UC Berkeley. CS285 (formerly CS294). Sergey Levine.
Stanford. CS234 Reinforcement Learning. Emma Brunskill. [Bilibili] [Official]
MIT. Reinforcement Learning. Dimitri Bertsekas.
THU. RL and Control.
CMU. Deep Reinforcement Learning. Katerina Fragkiadaki. [Link]
Udacity
Lex Fridman
ETH Zurich. Dynamic Programming and Optimal Control. Raffaello D'Andrea.
Pieter Abbeel
Advanced Machine Learning (高级机器学习). Jie Tang.
Shengbo Li (李升波)
UIUC, CS 542, CS 443, Nan Jiang.
R. Srikant. UIUC ECE 586.
Ron Parr. Duke CompSci 590.2.
Ben Van Roy. Stanford MS&E 338.
Ambuj Tewari and Susan Murphy. U Michigan STATS 710.
Susan Murphy. Harvard Stat 234.
Alekh Agarwal and Alex Slivkins. Columbia COMS E6998.001.
Daniel Russo. Columbia B9140-001.
Shipra Agrawal. Columbia IEOR 8100.
Emma Brunskill CMU 15-889e.
Philip Thomas. U Mass CMPSCI 687.
Michael Littman. Brown CSCI2951-F.
NJU. IntroRL. Yang Yu.
CMU 16-745
ASU CSE 691
UCLA. Reinforcement Learning of Large Language Models (Spring 2025). Ernest K. Ryu.
Berkeley CS285 Deep RL
OpenAI Spinning Up Education

RL Research Topics

Approximate Dynamic Programming and Offline RL

Approximate Dynamic Programming (ADP) concerns obtaining approximate solutions to large planning problems, often with the help of sampling and function approximation. Many ADP methods can be viewed as prototypes of the value-based RL algorithms popular today, especially in the offline setting, so it is important to understand their behavior and guarantees.
- Online + Offline (Hybrid)
  - Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
  - Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
  - Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
- Offline RL
  - Batch-constrained learning / support constraint
  - Conservative value learning
  - Sequence modeling for decision making
  - Off-policy evaluation and selection under dataset shift
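
As a rough illustration of how an ADP method doubles as a prototype for offline value-based RL, here is a minimal sketch of fitted Q-iteration on a fixed batch of transitions. The function name `fitted_q_iteration`, the two-state chain, and all hyperparameters are assumptions made for this example, not taken from any listed resource. A tabular function class is assumed, so the regression step reduces to averaging the targets per state-action pair.

```python
from collections import defaultdict


def fitted_q_iteration(transitions, n_actions, gamma=0.9, n_iters=50):
    """Tabular fitted Q-iteration on a fixed batch of (s, a, r, s_next, done) tuples."""
    q = defaultdict(float)  # Q(s, a); unseen pairs default to 0
    for _ in range(n_iters):
        sums = defaultdict(float)
        counts = defaultdict(int)
        for s, a, r, s_next, done in transitions:
            # Bellman optimality target computed from the previous iterate.
            bootstrap = 0.0 if done else max(q[(s_next, b)] for b in range(n_actions))
            sums[(s, a)] += r + gamma * bootstrap
            counts[(s, a)] += 1
        # Regression step: with a tabular function class, least squares
        # is just the mean target for each (s, a) seen in the batch.
        new_q = defaultdict(float)
        for key, total in sums.items():
            new_q[key] = total / counts[key]
        q = new_q
    return q


# Hypothetical two-state chain: action 0 stays put, action 1 moves forward,
# and taking action 1 in state 1 ends the episode with reward 1.
transitions = [
    (0, 0, 0.0, 0, False),
    (0, 1, 0.0, 1, False),
    (1, 0, 0.0, 1, False),
    (1, 1, 1.0, 1, True),
]
q = fitted_q_iteration(transitions, n_actions=2)
greedy = {s: max(range(2), key=lambda a: q[(s, a)]) for s in (0, 1)}
print(greedy)  # {0: 1, 1: 1}
```

The algorithm never interacts with the environment: it only reuses the logged batch, which is why its behavior under limited data coverage matters so much in the offline setting.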
Multi-agent RL
- Cooperative MARL
- Mixed cooperative-competitive learning
- Value decomposition and factorization
- Emergent communication and social learning
- Population-based training and league systems
Off-policy Evaluation

How can we estimate the performance of a policy using data collected by a different policy? This question has important implications for safety and real-world applications of RL.
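
One standard answer to this question is trajectory-wise importance sampling, sketched below under simplifying assumptions: the behavior policy's action probabilities are known, and they are nonzero for every logged action. The function name `importance_sampling_estimate` and the one-state bandit data are hypothetical, chosen only to make the example self-contained.

```python
def importance_sampling_estimate(trajectories, target_prob, behavior_prob, gamma=1.0):
    """Ordinary importance sampling estimate of the target policy's expected return.

    trajectories: list of trajectories, each a list of (state, action, reward).
    target_prob / behavior_prob: functions (state, action) -> action probability.
    """
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0      # product of per-step probability ratios
        ret = 0.0         # discounted return of this trajectory
        discount = 1.0
        for s, a, r in trajectory:
            weight *= target_prob(s, a) / behavior_prob(s, a)
            ret += discount * r
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)


# Hypothetical one-state bandit: the behavior policy picks each of two actions
# with probability 0.5, while the target policy always picks action 1.
logged = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 1, 1.0)], [(0, 0, 0.0)]]
estimate = importance_sampling_estimate(
    logged,
    target_prob=lambda s, a: 1.0 if a == 1 else 0.0,
    behavior_prob=lambda s, a: 0.5,
)
print(estimate)  # 1.0
```

The estimator is unbiased under the coverage assumption, but the product of ratios can make its variance grow quickly with the horizon, which is one reason off-policy evaluation remains an active research topic.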
Model-based RL / World Models
- Latent dynamics modeling
- Planning with learned models
- PETS / MBPO / Dreamer / MuZero / TD-MPC line
- Simulators, control, and long-horizon rollout stability
Policy Optimization
- Policy gradient / actor-critic
- PPO / TRPO / natural policy gradient
- Entropy regularization and trust region methods
- Credit assignment and variance reduction
Distributional RL
- Return distribution modeling
- Risk-sensitive RL
- Quantile-based value estimation
Safe RL / Constrained RL
- Constraint satisfaction under uncertainty
- Shielding, recovery, and risk control
- Safe exploration
Hierarchical RL
- Options framework
- Skill discovery
- Temporal abstraction
Exploration
- Intrinsic motivation
- Curiosity-driven exploration
- Count-based and uncertainty-aware methods
RL for Robotics and Control
- Locomotion and manipulation
- Sim2real
- MPC + RL
- Industrial control and autonomous driving
RL for Recommender Systems / Ads / Operations Research
- Long-term user value
- Slate recommendation
- Dynamic pricing, scheduling, routing, resource allocation
RLHF / Post-training / Agent RL
- RLHF / RLAIF / constitutional preferences
- DPO / IPO / ORPO / preference optimization family
- Verifier-based RL
- Code, math, reasoning, and tool-use RL
- Web agents, environment interaction, and long-horizon agent training
RL Theory
- Sample complexity
- Regret minimization
- Bellman rank, function approximation, and generalization
- Partial observability and identifiability

GitHub Repo

Website

Activity

Tencent AIArena
AWS DeepRacer
MineRL Competition
NetHack Challenge
Kaggle / KDD / RecSys related sequential decision competitions

Application

Quant
Gaming
Optimization
Recommendation
LLMs
Distribution
Robotics
Autonomous Driving
Recommender System / Ads
Operations Research
Alignment / Agent / Tool Use

Community

Conference & Journal

Conferences: NeurIPS (formerly NIPS), ICML, ICLR, AAAI, IJCAI, AAMAS, IROS, CoRL, RSS, etc.

Journals: JMLR, JAIR, JAAMAS, TMLR, etc.

Research Group

Asia
- CASIA
  - Haifeng Zhang [Homepage] [Group]
  - Zhiqiang Pu [Homepage]
  - Dongbin Zhao [Homepage]
  - Junliang Xing [Homepage]
- NJU
  - Yang Yu [Homepage]
  - Yinghuan Shi [Homepage]
  - Yang Gao [Homepage]
  - Zongzhang Zhang [Homepage]
  - NJU SME Faculty
- SJTU
  - Yong Yu [Homepage]
  - Weinan Zhang [Homepage]
  - Kai Yu [Homepage]
  - Ying Wen [Homepage]
- PKU
  - Yaodong Yang [Homepage]
  - Zhihua Zhang [Homepage]
  - Zongqing Lu [Homepage]
  - Hao Dong [Homepage]
- THU
  - Chongjie Zhang [Homepage]
  - Yi Wu [Homepage] [Group]
  - Zhihua Zhang [Homepage]
  - Shengbo Li [Homepage] [Group]
- USTC
  - Feng Wu [Homepage]
  - Houqiang Li [Homepage]
- CUHK-SZ
  - Baoxiang Wang [Homepage]
  - Hongyuan Zha [Homepage]
- CUHK
  - Baoxiang Wang [Homepage]
- TJU
  - Jianye Hao [Homepage] [Group]
- SIAT
  - Yunduan Cui [Homepage]
- HIT-SZ
  - Yanjie Li [Homepage]
- NTU
  - Bo An [Homepage]
- NUDT
  - Xin Xu
- SYSU
  - Chao Yu [Homepage]
North America
- McGill
  - Doina Precup [Homepage]
  - Joelle Pineau [Homepage]
- Alberta
  - Michael Bowling [Homepage]
  - Richard Sutton [Homepage]
  - Martha White [Homepage]
  - Adam White [Homepage]
- UCLA
  - Bolei Zhou [Homepage]
- MIT
  - Pulkit Agrawal [Homepage]
  - Leslie Kaelbling [Homepage]
  - Russ Tedrake [Homepage]
  - Nicholas Roy [Homepage]
- CMU
  - Geoffrey Gordon [Homepage]
  - Emma Brunskill [Homepage]
  - Jeff Schneider
  - Andrew Moore
  - Jessica K. Hodgins
  - Wen Sun [Homepage]
- Berkeley
  - Sergey Levine [Homepage]
  - Michael Jordan
  - Pieter Abbeel [Homepage] [Group]
  - Dimitri Bertsekas [Homepage]
  - Emma Brunskill [Homepage]
  - Chelsea Finn [Homepage]
  - Anca Dragan [Homepage]
  - Ken Goldberg [Homepage]
  - Stuart Russell [Homepage]
- Stanford
  - Benjamin Van Roy [Homepage]
  - Emma Brunskill [Homepage]
  - Mykel Kochenderfer [Homepage]
  - Dorsa Sadigh [Homepage]
  - Tengyu Ma [Homepage]
  - Chelsea Finn [Homepage]
  - Andrew Ng [Homepage]
- UIUC
  - Nan Jiang [Homepage]
- Duke
  - Ronald Parr [Homepage]
- Brown
  - Michael Littman [Homepage]
- Columbia
  - Daniel Russo [Homepage]
  - Shipra Agrawal
  - Alekh Agarwal [Homepage]
  - Alex Slivkins [Homepage]
- Toronto
  - Jimmy Ba [Homepage]
  - Sheila McIlraith [Homepage]
Europe
- INRIA
  - Flower Team [Homepage]
- ETH Zurich
  - Andreas Krause [Homepage]
- Oxford
  - Jakob Foerster [Homepage]
  - Shimon Whiteson [Homepage]
- Cambridge
- IC
- UCL
  - Jun Wang [Homepage]
  - David Silver [Homepage]
  - Marc Deisenroth [Homepage]

Industry Group

China
- Baidu
  - PARL
- Tencent
- NetEase
- ByteDance
- DiDi
- MSRA
- Huawei
- PINGAN
- Polixir.ai
- Inspirai
- Horizon
- Momenta
- Parametrix.ai
- Alibaba
- Kwai
Overseas
- OpenAI
- DeepMind
- Google Brain
- FAIR
- Salesforce Research


Misc


Discussion

Policy-based vs. Value-based [Zhihu]
Philosophy of Reinforcement Learning
Offline RL vs. Online RL vs. Hybrid RL
World Models vs. Model-Free RL
RLHF / Preference Optimization / Agent RL


Contributing

This is an active repository, and maintaining the content is time-consuming, so your contributions really matter!

If you find it helpful, please vote for it by adding 👍.

If you have any questions about this list, do not hesitate to contact me at 1546631808@qq.com.


