JiwenJ/Awesome-RL

Awesome RL

A curated list of reinforcement learning resources, including books, courses, topics, repos, websites, communities, research groups, and modern RL directions such as offline RL, world models, RLHF, and agent RL.

Note:

  • I keep the original structure of this repository as much as possible.
  • Legacy entries are preserved whenever possible, even if some are old.
  • Newer items are added into the existing structure instead of replacing it.

Contents


Books


Courses


RL Research Topics

  • Approximate Dynamic Programming and Offline RL

    Approximate Dynamic Programming (ADP) is concerned with obtaining approximate solutions to large planning problems, often with the help of sampling and function approximation. Many ADP methods can be viewed as prototypes of the value-based RL algorithms in popular use today, especially in the offline setting, so it is important to understand their behavior and guarantees.

    • Online + Offline (Hybrid)

      • Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning
      • Policy Finetuning in Reinforcement Learning via Design of Experiments using Offline Data
      • Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
    • Offline RL

      • Batch-constrained learning / support constraint
      • Conservative value learning
      • Sequence modeling for decision making
      • Off-policy evaluation and selection under dataset shift
  • Multi-agent RL

    • Cooperative MARL
    • Mixed cooperative-competitive learning
    • Value decomposition and factorization
    • Emergent communication and social learning
    • Population-based training and league systems
  • Off-policy Evaluation

    How can the performance of a policy be estimated using data collected by a different policy? This question has important implications for safety and for real-world applications of RL.

  • Model-based RL / World Models

    • Latent dynamics modeling
    • Planning with learned models
    • PETS / MBPO / Dreamer / MuZero / TD-MPC line
    • Simulators, control, and long-horizon rollout stability
  • Policy Optimization

    • Policy gradient / actor-critic
    • PPO / TRPO / natural policy gradient
    • Entropy regularization and trust region methods
    • Credit assignment and variance reduction
  • Distributional RL

    • Return distribution modeling
    • Risk-sensitive RL
    • Quantile-based value estimation
  • Safe RL / Constrained RL

    • Constraint satisfaction under uncertainty
    • Shielding, recovery, and risk control
    • Safe exploration
  • Hierarchical RL

    • Options framework
    • Skill discovery
    • Temporal abstraction
  • Exploration

    • Intrinsic motivation
    • Curiosity-driven exploration
    • Count-based and uncertainty-aware methods
  • RL for Robotics and Control

    • Locomotion and manipulation
    • Sim2real
    • MPC + RL
    • Industrial control and autonomous driving
  • RL for Recommender Systems / Ads / Operations Research

    • Long-term user value
    • Slate recommendation
    • Dynamic pricing, scheduling, routing, resource allocation
  • RLHF / Post-training / Agent RL

    • RLHF / RLAIF / constitutional preferences
    • DPO / IPO / ORPO / preference optimization family
    • Verifier-based RL
    • Code, math, reasoning, and tool-use RL
    • Web agents, environment interaction, and long-horizon agent training
  • RL Theory

    • Sample complexity
    • Regret minimization
    • Bellman rank, function approximation, and generalization
    • Partial observability and identifiability
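
As a concrete illustration of the off-policy evaluation question listed above (estimating one policy's performance from data collected by another), here is a minimal sketch of ordinary importance sampling, one of the simplest estimators. It is not taken from any resource in this list. The names `importance_sampling_estimate`, `behavior`, `target`, and `logged` are hypothetical, and the sketch assumes both policies' action probabilities are known and that the behavior policy gives every logged action nonzero probability.

```python
from typing import Callable, Sequence, Tuple

# One logged step: (state, action, reward). Illustrative types only.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]  # returns pi(action | state)


def importance_sampling_estimate(
    trajectories: Sequence[Sequence[Step]],
    target_policy: Policy,
    behavior_policy: Policy,
    gamma: float = 1.0,
) -> float:
    """Ordinary (unweighted) importance-sampling estimate of the target policy's value."""
    total = 0.0
    for trajectory in trajectories:
        weight = 1.0    # cumulative likelihood ratio for this trajectory
        ret = 0.0       # discounted return of this trajectory
        discount = 1.0
        for state, action, reward in trajectory:
            # Assumes behavior_policy(state, action) > 0 for every logged action.
            weight *= target_policy(state, action) / behavior_policy(state, action)
            ret += discount * reward
            discount *= gamma
        total += weight * ret
    return total / len(trajectories)


# Toy example: one state, two actions; action 1 pays reward 1, action 0 pays 0.
def behavior(state: int, action: int) -> float:
    return 0.5  # data was collected by a uniformly random policy


def target(state: int, action: int) -> float:
    return 1.0 if action == 1 else 0.0  # policy to evaluate: always pick action 1


logged = [[(0, 0, 0.0)], [(0, 1, 1.0)], [(0, 0, 0.0)], [(0, 1, 1.0)]]
estimate = importance_sampling_estimate(logged, target, behavior)
print(estimate)  # 1.0, which matches the true value of always picking action 1
```

Trajectories the target policy would never produce receive weight 0 and the rest are up-weighted, which is why the estimate recovers the target policy's value from uniformly random data. The estimator is unbiased under the stated assumptions, but its variance grows quickly with horizon length, which is one reason this topic remains an active research area.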

GitHub Repo


Website


Activity


Application


Community


Conference & Journal

Conference: NeurIPS (formerly NIPS), ICML, ICLR, AAAI, IJCAI, AAMAS, IROS, CoRL, RSS, etc.

Journal: JMLR, JAIR, JAAMAS, TMLR, etc.


Research Group

Other external links

⬆ back to top


Industry Group

⬆ back to top


Misc

⬆ back to top


Discussion

  1. Policy-based vs. Value-based [ZhiHu]
  2. Philosophy of Reinforcement Learning
  3. Offline RL vs. Online RL vs. Hybrid RL
  4. World Models vs. Model-Free RL
  5. RLHF / Preference Optimization / Agent RL

⬆ back to top


Contributing

This is an active repository, and maintaining its content is time-consuming, so your contributions really matter!

If you find it helpful, please vote for it by adding 👍.

If you have any questions about this list, do not hesitate to contact me at 1546631808@qq.com.

Preferred ways to contribute:

  • preserve the existing structure and add missing resources;
  • fix broken links;
  • add newer official references for old entries;
  • expand topic pages under ./doc/.

⬆ back to top


Reference

⬆ back to top
