elias0819/pi06

Unofficial implementation of Physical Intelligence π*0.6

⚠️ Work-in-progress implementation

PyTorch implementation of RECAP, based on the Physical Intelligence blog post "π*0.6: a VLA that Learns from Experience".

Overview

RECAP (RL with Experience & Corrections via Advantage-conditioned Policies) trains Vision-Language-Action (VLA) models in three stages:

  1. Demonstrations: Supervised learning from expert demonstrations
  2. Corrections: Learning from expert interventions when the robot makes mistakes
  3. Autonomous Experience: Reinforcement learning with advantage-conditioned policies

The key idea is to use a value function for credit assignment and to condition the policy on advantage values, which lets the model learn from both good and bad experiences.
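A minimal sketch of the advantage-conditioning idea follows. It assumes the advantage is the observed return minus the value function's prediction, and that it is binarized into a "better than expected" indicator appended to the policy input. The binarization, the threshold and every function name here are illustrative assumptions, not this repository's code.

```python
from typing import List


def advantages(returns: List[float], values: List[float]) -> List[float]:
    """A_t = G_t - V(s_t): how much better the observed return was than predicted."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    return [g - v for g, v in zip(returns, values)]


def advantage_indicator(adv: List[float], threshold: float = 0.0) -> List[float]:
    """Hypothetical binarization: 1.0 marks 'better than expected', 0.0 otherwise."""
    return [1.0 if a > threshold else 0.0 for a in adv]


def conditioned_input(features: List[float], indicator: float) -> List[float]:
    """Append the conditioning signal to the policy's input features."""
    return features + [indicator]


adv = advantages([1.0, 0.5, 0.2], [0.6, 0.7, 0.1])
flags = advantage_indicator(adv)  # [1.0, 0.0, 1.0]
policy_in = conditioned_input([0.3, 0.4], flags[0])  # [0.3, 0.4, 1.0]
```

During training the policy sees both good and bad actions, each labeled by its indicator. At rollout time an advantage-conditioned policy is typically queried with the indicator set to 1.0, asking for behavior the value function would rate as better than expected.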

What it contains

  • Lerobot dataset v2.1 format support
  • Configurable HuggingFace tokenizers for text
  • Configurable CLIP ViT encoders for vision
  • Advantage-conditioned policy training
  • Value function for credit assignment
  • Three-stage training pipeline

Installation

The following steps have been tested with CUDA 12.4.

  1. Clone this repository and navigate to the pi06 directory:

    git clone https://github.com/nahidalam/pi06
    cd pi06
  2. Install the package:

    conda create -n pi06 python=3.11 -y
    conda activate pi06
    pip install --upgrade pip  # enable PEP 660 support
    pip install -e .
  3. Install additional packages for training (optional):

    pip install -e ".[train]"

Usage

Training

  1. Prepare your Lerobot v2.1 dataset (or use an existing one)

    The dataset should follow the Lerobot v2.1 format:

    <dataset_name>/
    ├── data/chunk-000/episode_*.parquet
    ├── videos/chunk-000/observation.images.*/episode_*.mp4
    └── meta/episodes.jsonl
    
  2. Create/edit the config file (src/pi06/configs/recap_config.yaml):

    dataset:
      path: "path/to/lerobot/dataset"  # Path to dataset root directory
      batch_size: 1
      chunk_id: "chunk-000"  # Chunk identifier
      image_keys: ["observation.images.main"]  # Camera keys
    
    model:
      action_dim: 7  # Adjust for your robot
    
    training:
      demo_epochs: 10
      correction_epochs: 5
      autonomous_epochs: 20
  3. Run training:

    python -m pi06.train --config src/pi06/configs/recap_config.yaml
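To show how the `dataset` keys in the config map onto the directory layout from step 1, here is a hypothetical helper (`episode_paths` is not part of the package) that resolves the files for one episode. It assumes Lerobot-style zero-padded episode names such as `episode_000003`; adjust the padding if your dataset uses a different pattern.

```python
from pathlib import Path
from typing import Dict, List


def episode_paths(root: str, chunk_id: str, image_keys: List[str],
                  episode_index: int) -> Dict[str, Path]:
    """Map one episode to its parquet file plus one MP4 per camera key.

    Hypothetical helper; the 6-digit zero padding is an assumption.
    """
    base = Path(root)
    name = f"episode_{episode_index:06d}"
    paths = {"data": base / "data" / chunk_id / f"{name}.parquet"}
    for key in image_keys:
        # One video directory per camera key, e.g. observation.images.main
        paths[key] = base / "videos" / chunk_id / key / f"{name}.mp4"
    return paths


# Values taken from the example recap_config.yaml above.
cfg = {"path": "path/to/lerobot/dataset", "chunk_id": "chunk-000",
       "image_keys": ["observation.images.main"]}
files = episode_paths(cfg["path"], cfg["chunk_id"], cfg["image_keys"], 3)
```

Each additional entry in `image_keys` simply adds another video path for the same episode index.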

Architecture

VLA Model

  • Vision Encoder: CLIP ViT (configurable)
  • Language Encoder: HuggingFace transformer (configurable)
  • Fusion Layer: Combines vision and language features
  • Action Expert: MLP head for action prediction
  • Conditioning: Supports advantage conditioning
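The data flow through these components can be sketched in plain Python. This assumes the vision and text encoders have already produced feature vectors, that the fusion layer is a simple concatenation, and that the advantage enters as one extra scalar input. The real model presumably uses PyTorch modules and learned fusion; `TinyVLAHead` and its helpers are hypothetical names used only for illustration.

```python
import random
from typing import List, Tuple

Vector = List[float]
Layer = Tuple[List[List[float]], List[float]]  # (weights: one row per output, biases)


def linear(x: Vector, layer: Layer) -> Vector:
    """y = W x + b."""
    w, b = layer
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights, zero biases."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TinyVLAHead:
    """Concatenation fusion + two-layer MLP action expert (hypothetical sketch)."""

    def __init__(self, vision_dim: int, text_dim: int, hidden: int,
                 action_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.fused_dim = vision_dim + text_dim + 1  # +1 slot for the advantage signal
        self.hidden_layer = init_layer(self.fused_dim, hidden, rng)
        self.action_layer = init_layer(hidden, action_dim, rng)

    def __call__(self, vision: Vector, text: Vector, advantage: float) -> Vector:
        fused = vision + text + [advantage]  # "fusion layer": plain concatenation
        if len(fused) != self.fused_dim:
            raise ValueError(f"expected {self.fused_dim} fused features, got {len(fused)}")
        return linear(relu(linear(fused, self.hidden_layer)), self.action_layer)


head = TinyVLAHead(vision_dim=4, text_dim=3, hidden=8, action_dim=7)
action = head([0.1] * 4, [0.2] * 3, advantage=1.0)  # 7 values, one per action dimension
```

The output length is set by `action_dim`, which is why `model.action_dim` has to match your robot.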

Value Function

  • Predicts expected future return from state features
  • Used for credit assignment via GAE (Generalized Advantage Estimation)
  • Enables learning from both good and bad experiences
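For reference, standard Generalized Advantage Estimation over a single rollout looks like the sketch below. The parameters `gamma` and `lam` correspond to `training.gamma` and `training.lambda` (`lambda` is a reserved word in Python, hence the shorter name). This is a textbook GAE implementation under the assumption that values include one bootstrap entry for the state after the last step, not necessarily the exact code in `pi06`.

```python
from typing import List


def gae(rewards: List[float], values: List[float], dones: List[bool],
        gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation over one rollout.

    `values` holds len(rewards) + 1 entries: V(s_0) ... V(s_T), where the last
    one bootstraps the state after the final step.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must contain one bootstrap entry more than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # TD error r_t + gamma * V(s_{t+1}) - V(s_t), cut at episode boundaries
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        # Exponentially weighted sum of future TD errors
        running = delta + gamma * lam * not_done * running
        advantages[t] = running
    return advantages


adv = gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [False, False, True], gamma=1.0, lam=1.0)
# adv == [3.0, 2.0, 1.0]: with a zero value function, each advantage is the return-to-go
```

Setting `lam=0` reduces each advantage to the one-step TD error, while `lam=1` gives the full discounted return minus the value baseline.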

Training Stages

  1. Demonstrations: Supervised learning to match expert actions
  2. Corrections: Learn recovery strategies from expert interventions
  3. Autonomous: RL training with advantage-conditioned policy
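The stage order and the `*_epochs` settings from the config can be pictured as one flat schedule. In this sketch each stage trains only on its own episode type (demo, correction or autonomous); that assignment is an assumption, and the actual pipeline may mix episode types across stages. `STAGES` and `schedule` are hypothetical names.

```python
from typing import Dict, Iterator, List, Tuple

# (stage name, config key for its epoch count, episode types it trains on).
# The episode-type assignment per stage is an assumption for illustration.
STAGES: List[Tuple[str, str, Tuple[str, ...]]] = [
    ("demo", "demo_epochs", ("demo",)),
    ("correction", "correction_epochs", ("correction",)),
    ("autonomous", "autonomous_epochs", ("autonomous",)),
]


def schedule(training_cfg: Dict[str, int]) -> Iterator[Tuple[str, int, Tuple[str, ...]]]:
    """Yield (stage, epoch, episode_types) in the order the three stages run."""
    for stage, key, episode_types in STAGES:
        # A missing key means the stage is skipped
        for epoch in range(training_cfg.get(key, 0)):
            yield stage, epoch, episode_types


plan = list(schedule({"demo_epochs": 2, "correction_epochs": 1, "autonomous_epochs": 3}))
# plan[0] == ("demo", 0, ("demo",)); plan[-1] == ("autonomous", 2, ("autonomous",))
```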

Configuration

Key configuration options in recap_config.yaml:

  • model.action_dim: Dimension of action space
  • model.vision_model_name: CLIP model for vision
  • model.text_model_name: HuggingFace model for text
  • training.gamma: Discount factor for RL
  • training.lambda: GAE lambda parameter
  • training.value_loss_weight: Weight for value function loss
  • training.policy_loss_weight: Weight for policy loss
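The options above use dotted names, which can be read from the parsed YAML with plain dictionary access. Note that `training.lambda` works fine as a YAML/dict key but cannot be used as a Python attribute (`cfg.training.lambda` is a syntax error), so keep dict access or rename the field when converting to a dataclass. The weighted-sum loss below is an assumed reading of the two `*_loss_weight` options, and the config values are placeholders, not the repository's defaults.

```python
from typing import Any, Dict


def get_option(cfg: Dict[str, Any], dotted_key: str) -> Any:
    """Look up a dotted key such as "training.lambda" in a nested config dict."""
    node: Any = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node


def total_loss(cfg: Dict[str, Any], policy_loss: float, value_loss: float) -> float:
    """Weighted sum of policy and value losses (assumed combination rule)."""
    return (get_option(cfg, "training.policy_loss_weight") * policy_loss
            + get_option(cfg, "training.value_loss_weight") * value_loss)


# Placeholder values for illustration only.
cfg = {"training": {"gamma": 0.99, "lambda": 0.95,
                    "policy_loss_weight": 1.0, "value_loss_weight": 0.5}}
loss = total_loss(cfg, policy_loss=2.0, value_loss=4.0)  # 1.0 * 2.0 + 0.5 * 4.0 = 4.0
```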

Logging

Metrics are logged to WandB:

  • train/demo_loss: Demonstration training loss
  • train/policy_loss: Policy loss (autonomous training)
  • train/value_loss: Value function loss
  • train/advantage_mean: Mean advantage values
  • train/entropy: Policy entropy
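A small sketch of collecting the autonomous-stage metrics under the key names listed above. How entropy is computed for the MLP action head is not described here, so it is passed in as a number; `autonomous_metrics` is a hypothetical helper.

```python
from statistics import fmean
from typing import Dict, List


def autonomous_metrics(policy_loss: float, value_loss: float,
                       advantages: List[float], entropy: float) -> Dict[str, float]:
    """Collect the autonomous-stage metrics under the key names listed above."""
    return {
        "train/policy_loss": policy_loss,
        "train/value_loss": value_loss,
        "train/advantage_mean": fmean(advantages),
        "train/entropy": entropy,
    }


metrics = autonomous_metrics(0.3, 0.1, [1.0, -1.0, 3.0], entropy=1.2)
# With wandb installed and wandb.init(...) already called, one could then log:
#   wandb.log(metrics, step=global_step)
```

Using the `train/` prefix groups all of these charts into a single section in the W&B dashboard.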

Checkpoints

Checkpoints are saved after each training stage:

  • checkpoint_demo.pt: After demonstration training
  • checkpoint_correction.pt: After correction training
  • checkpoint_final.pt: After autonomous training

Resume training with:

python -m pi06.train --config src/pi06/configs/recap_config.yaml --checkpoint checkpoints/checkpoint_demo.pt
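What resuming does is not spelled out here. One plausible reading, assumed in this sketch, is that the stages finished before the checkpoint are skipped, so resuming from `checkpoint_demo.pt` continues with the correction stage. The checkpoint names come from the list above; `remaining_stages` is hypothetical.

```python
from pathlib import Path
from typing import List

STAGE_ORDER = ["demo", "correction", "autonomous"]
# Checkpoint written at the end of each stage (names from the list above).
CHECKPOINT_FOR_STAGE = {
    "demo": "checkpoint_demo.pt",
    "correction": "checkpoint_correction.pt",
    "autonomous": "checkpoint_final.pt",
}


def remaining_stages(checkpoint: str) -> List[str]:
    """Stages still to run after resuming from `checkpoint` (assumed behavior)."""
    name = Path(checkpoint).name
    for i, stage in enumerate(STAGE_ORDER):
        if CHECKPOINT_FOR_STAGE[stage] == name:
            return STAGE_ORDER[i + 1:]
    raise ValueError(f"unrecognized checkpoint name: {name}")


todo = remaining_stages("checkpoints/checkpoint_demo.pt")  # ["correction", "autonomous"]
```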

Dataset Format

The implementation expects Lerobot v2.1 format with the following structure:

  • Data: Episode data stored as parquet files in data/chunk-*/episode_*.parquet
  • Videos: MP4 video files in videos/chunk-*/observation.images.*/episode_*.mp4
  • Metadata: Episode metadata in meta/episodes.jsonl
  • Supports: multiple camera views and episode-type filtering (demo/correction/autonomous)

See the Lerobot documentation for details on the v2.1 format.
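Before training, a quick pre-flight check of the layout described above, plus episode-type filtering from `meta/episodes.jsonl`, could look like this sketch. The metadata field holding the demo/correction/autonomous label is not specified here, so `episode_type` is a hypothetical default; point `field` at wherever your dataset records it.

```python
import json
from pathlib import Path
from typing import Dict, List


def check_layout(root: str) -> List[str]:
    """Return human-readable problems with a Lerobot v2.1-style dataset root."""
    base = Path(root)
    problems = []
    if not (base / "meta" / "episodes.jsonl").is_file():
        problems.append("missing meta/episodes.jsonl")
    if not list(base.glob("data/chunk-*/episode_*.parquet")):
        problems.append("no data/chunk-*/episode_*.parquet files")
    if not list(base.glob("videos/chunk-*/observation.images.*/episode_*.mp4")):
        problems.append("no videos/chunk-*/observation.images.*/episode_*.mp4 files")
    return problems


def episodes_of_type(root: str, episode_type: str,
                     field: str = "episode_type") -> List[Dict]:
    """Read meta/episodes.jsonl and keep entries whose `field` equals episode_type.

    `field` is a hypothetical metadata key, not a documented Lerobot field.
    """
    path = Path(root) / "meta" / "episodes.jsonl"
    with path.open(encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return [r for r in records if r.get(field) == episode_type]
```

An empty list from `check_layout` means the expected files are present; it does not validate the parquet or MP4 contents.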

License

This project is not affiliated with Physical Intelligence and is provided as-is for research purposes.
