
yesen-chen/EMDSAC-ft


EMDSAC-ft: Bridging the Gap in Offline-to-Online Reinforcement Learning through Value Distribution Learning

This repository contains the official implementation of EMDSAC-ft, a novel algorithm for offline-to-online reinforcement learning that addresses key challenges in both offline pre-training and online fine-tuning phases.

🚀 Key Features

  • Uncertainty Decoupling: Separates epistemic and aleatoric uncertainty for better offline RL performance
  • Distributional Value Learning: Captures full return distributions instead of just expected values
  • Efficient Fine-tuning: UDPE (Uneven Distribution of Pessimism Elimination) and TTRPI (True Trust Region Policy Improvement) modules for stable online adaptation
  • State-of-the-art Performance: 14.9% average improvement over baselines, 25.8% improvement in fine-tuning
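The README does not say how EMDSAC parameterizes the return distribution. A common choice in distributional RL is to predict a fixed set of quantiles and train them with the quantile-regression Huber loss from QR-DQN. The NumPy sketch below shows that loss for a single state-action pair. It is a generic illustration under that assumption, not the loss used in this repository, and the name `quantile_huber_loss` is hypothetical.

```python
import numpy as np


def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile-regression Huber loss (QR-DQN style) for one state-action pair.

    Hypothetical helper for illustration; not taken from this repository.
    pred_quantiles: shape (N,), predicted return quantiles at midpoints tau_i.
    target_samples: shape (M,), samples of the Bellman target distribution.
    """
    pred_quantiles = np.asarray(pred_quantiles, dtype=float)
    target_samples = np.asarray(target_samples, dtype=float)
    n = pred_quantiles.shape[0]
    # Quantile midpoints tau_i = (2i + 1) / (2N)
    taus = (2 * np.arange(n) + 1) / (2.0 * n)
    # Pairwise errors u_ij = target_j - pred_i, shape (N, M)
    u = target_samples[None, :] - pred_quantiles[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u_ij < 0}| pulls output i toward quantile tau_i
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Average over target samples, sum over predicted quantiles
    return float((weight * huber / kappa).mean(axis=1).sum())


# Usage: quantiles that match the target distribution score lower than a flat guess.
targets = np.linspace(0.0, 1.0, 101)
taus = (2 * np.arange(5) + 1) / 10.0
good = np.quantile(targets, taus)
bad = np.zeros(5)
print(quantile_huber_loss(good, targets), quantile_huber_loss(bad, targets))
```

Because each output is trained toward a different quantile level, the critic represents a spread of possible returns rather than a single expected value; that spread is what makes aleatoric uncertainty observable at all.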

🏗️ Repository Structure

EMDSAC-ft/
├── Independent/                  # Independent training implementation
│   ├── example_train/            # Main training scripts
│   │   ├── train_ORL.py          # Offline training script
│   │   ├── train_O2O.py          # Online fine-tuning script
│   │   ├── configs/              # Configuration files
│   │   │   ├── offline/          # Offline training configs
│   │   │   └── ft/               # Fine-tuning configs
│   │   ├── networks/             # Network architectures
│   │   ├── training/             # Training utilities
│   │   └── utils/                # Utility functions
│   └── Algorithms/               # Algorithm implementations
├── Vectorized/                   # Vectorized implementation
│   ├── main.py                   # Main training script
│   ├── configs/                  # Configuration files
│   └── Algorithms/               # Algorithm implementations
├── requirements.txt              # Python dependencies
└── README.md                     # This file

📋 Requirements

  • Python 3.7+
  • PyTorch 1.9+
  • CUDA 11.0+ (optional, for GPU acceleration)
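A quick way to confirm that an environment meets the versions listed above is a small check script. This is a sketch, not a file shipped with the repository: the script name `check_env.py`, the helper names, and the choice to compare only major.minor versions are assumptions. CUDA availability is reported but not required, matching the "optional" note.

```python
import re
import sys

# Minimum versions taken from the Requirements list above
MIN_PYTHON = (3, 7)
MIN_TORCH = (1, 9)


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into (major, minor)."""
    nums = []
    for piece in text.split("+")[0].split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each part
        nums.append(int(match.group()) if match else 0)
    while len(nums) < 2:
        nums.append(0)
    return tuple(nums)


def check_environment():
    """Report whether Python and PyTorch are new enough and whether CUDA is usable."""
    report = {"python_ok": sys.version_info[:2] >= MIN_PYTHON}
    try:
        import torch
    except ImportError:
        report["torch_ok"] = False
        report["cuda"] = False
        return report
    report["torch_ok"] = parse_version(str(torch.__version__)) >= MIN_TORCH
    report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```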

🛠️ Installation

  1. Clone the repository:
git clone https://github.com/yesen-chen/EMDSAC-ft.git
cd EMDSAC-ft
  2. Create a conda environment:
conda create -n EMDSAC python=3.8
conda activate EMDSAC
  3. Install dependencies:
pip install -r requirements.txt

🎯 Quick Start

Offline Pre-training

Independent Implementation:

conda activate EMDSAC
cd Independent/example_train
python train_ORL.py --config configs/offline/halfcheetah-medium-replay-v2.yaml

Vectorized Implementation:

conda activate EMDSAC
cd Vectorized
python main.py --config configs/halfcheetah-medium-replay-v2.yaml

Online Fine-tuning

conda activate EMDSAC
cd Independent/example_train
python train_O2O.py --config configs/ft/halfcheetah-medium-replay-v2.yaml
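The two stages can be chained from one script. The sketch below only wraps the commands shown in this Quick Start. The script name `run_o2o.py`, the helper names, the `--run` flag, and the assumption that every task has matching files under `configs/offline/` and `configs/ft/` (inferred from the single `halfcheetah-medium-replay-v2` example) are hypothetical. The README does not say how `train_O2O.py` locates the pre-trained model, so check the fine-tuning config before relying on this. Run it from the repository root inside the activated `EMDSAC` environment.

```python
import subprocess
import sys
from pathlib import Path

# Directory used by both Quick Start commands for the Independent implementation
WORKDIR = Path("Independent/example_train")


def build_commands(task):
    """Return the offline and fine-tuning commands for a task name.

    Assumes configs are named configs/offline/<task>.yaml and configs/ft/<task>.yaml,
    following the halfcheetah-medium-replay-v2 example (hypothetical for other tasks).
    """
    return [
        [sys.executable, "train_ORL.py", "--config", f"configs/offline/{task}.yaml"],
        [sys.executable, "train_O2O.py", "--config", f"configs/ft/{task}.yaml"],
    ]


def run_pipeline(task, dry_run=True):
    """Run offline pre-training, then online fine-tuning; only print if dry_run."""
    for cmd in build_commands(task):
        print("running:", " ".join(cmd[1:]))
        if not dry_run:
            # check=True stops the pipeline if offline training fails
            subprocess.run(cmd, cwd=WORKDIR, check=True)


if __name__ == "__main__":
    args = sys.argv[1:]
    tasks = [a for a in args if a != "--run"] or ["halfcheetah-medium-replay-v2"]
    # Defaults to a dry run; pass --run to actually launch training
    run_pipeline(tasks[0], dry_run="--run" not in args)
```

For example, `python run_o2o.py halfcheetah-medium-replay-v2 --run` would execute both stages in order; without `--run` it only prints the commands, which is a cheap way to confirm the config paths first.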

📊 Algorithm Overview

EMDSAC (Offline Pre-training)

Core Components:

  1. Ensemble Value Distribution Networks: Quantify epistemic uncertainty arising from out-of-distribution (OOD) actions
  2. Distributional Value Learning: Capture aleatoric uncertainty from environmental randomness
  3. Uncertainty Decoupling: Separate epistemic and aleatoric uncertainties for better performance
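To make the decoupling concrete: if each of E ensemble members outputs N return quantiles for the same state-action pair, the law of total variance splits the pooled variance into a between-member term (disagreement, a common proxy for epistemic uncertainty) and a within-member term (the spread of each predicted distribution, a proxy for aleatoric uncertainty). The NumPy sketch below assumes equally weighted members and quantiles. It is a generic illustration, not necessarily how EMDSAC computes either quantity, and `decompose_uncertainty`, `pessimistic_value`, and `beta` are hypothetical names.

```python
import numpy as np


def decompose_uncertainty(quantiles):
    """Split ensemble spread into (epistemic, aleatoric) variance.

    quantiles: array of shape (E, N), N return quantiles from each of E critics
    for a single state-action pair. Hypothetical helper for illustration.
    """
    quantiles = np.asarray(quantiles, dtype=float)
    member_means = quantiles.mean(axis=1)  # expected return under each member
    member_vars = quantiles.var(axis=1)    # spread inside each member's distribution
    epistemic = float(member_means.var())  # how much the members disagree
    aleatoric = float(member_vars.mean())  # average intrinsic return spread
    return epistemic, aleatoric


def pessimistic_value(quantiles, beta=1.0):
    """Penalize only the epistemic part, leaving environment randomness unpenalized."""
    epistemic, _ = decompose_uncertainty(quantiles)
    return float(np.mean(quantiles)) - beta * np.sqrt(epistemic)


# Members agree but the return is noisy: all uncertainty is aleatoric.
agree = np.array([[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]])
# Members disagree about a deterministic return: all uncertainty is epistemic.
disagree = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
print(decompose_uncertainty(agree), decompose_uncertainty(disagree))
```

Penalizing only the first term is one way to read the motivation for decoupling: the policy is pushed away from actions the ensemble disagrees about (typically OOD) without being discouraged from actions whose returns are merely noisy.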

EMDSAC-ft (Online Fine-tuning)

Key Innovations:

  1. Uneven Distribution of Pessimism Elimination (UDPE): Adaptive uncertainty handling
  2. True Trust Region Policy Improvement (TTRPI): Stable policy updates
  3. Seamless Offline-to-Online Transition: Maintain performance during adaptation
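The README does not define the trust region that TTRPI uses. For reference, the most common generic form bounds the KL divergence between the old and new Gaussian policies at each state. The sketch below computes that KL for diagonal Gaussians and applies a threshold `delta`; the helper names and the default `delta` are hypothetical, and this should not be read as the TTRPI update itself. For SAC-style tanh-squashed policies, the KL between the pre-squash Gaussians equals the KL after squashing, because tanh is invertible.

```python
import numpy as np


def diag_gaussian_kl(mu_old, std_old, mu_new, std_new):
    """KL(old || new) between diagonal Gaussians, summed over action dimensions."""
    mu_old, std_old = np.asarray(mu_old, float), np.asarray(std_old, float)
    mu_new, std_new = np.asarray(mu_new, float), np.asarray(std_new, float)
    var_old, var_new = std_old ** 2, std_new ** 2
    per_dim = (
        np.log(std_new / std_old)
        + (var_old + (mu_old - mu_new) ** 2) / (2.0 * var_new)
        - 0.5
    )
    return float(per_dim.sum())


def within_trust_region(mu_old, std_old, mu_new, std_new, delta=0.01):
    """Hypothetical acceptance test: keep the update only if the KL stays below delta."""
    return diag_gaussian_kl(mu_old, std_old, mu_new, std_new) <= delta


# Usage: a one-dimensional policy whose mean moves by one standard deviation
print(diag_gaussian_kl([0.0], [1.0], [1.0], [1.0]))  # 0.5
print(within_trust_region([0.0], [1.0], [0.05], [1.0]))  # True
```

In practice such a check is usually averaged over a batch of states; how EMDSAC-ft actually constrains its updates should be confirmed in the implementation.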
