Unofficial implementation of Physical Intelligence π*0.6

⚠️ Work-in-progress implementation

PyTorch implementation of RECAP based on the Physical Intelligence blog post: π*0.6: a VLA that Learns from Experience

Overview

RECAP stands for RL with Experience & Corrections via Advantage-conditioned Policies. It is a three-stage training process for Vision-Language-Action (VLA) models:

Demonstrations: Supervised learning from expert demonstrations
Corrections: Learning from expert interventions when the robot makes mistakes
Autonomous Experience: Reinforcement learning with advantage-conditioned policies

The key innovation is using a value function for credit assignment and conditioning the policy on advantage values, enabling the model to learn from both good and bad experiences.
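
As a rough illustration of the idea above, the sketch below estimates an advantage as the observed return minus the value prediction and turns it into a binary label that a policy could be conditioned on. The function name `advantage_labels`, the threshold, and the choice of a binary label are assumptions made for illustration; they are not taken from this repository's code.

```python
from typing import List


def advantage_labels(
    returns: List[float], values: List[float], threshold: float = 0.0
) -> List[int]:
    """Label each step 1 if its estimated advantage exceeds the threshold, else 0."""
    if len(returns) != len(values):
        raise ValueError("returns and values must have the same length")
    # Advantage estimate: how much better the outcome was than the value function predicted.
    advantages = [r - v for r, v in zip(returns, values)]
    # Hypothetical conditioning signal: 1 marks better-than-expected steps, 0 the rest.
    return [1 if a > threshold else 0 for a in advantages]


labels = advantage_labels(returns=[1.0, 0.2, 0.5], values=[0.6, 0.5, 0.5])
```

Steps labelled 0 are still usable as training data under this scheme, which is what lets the policy learn from bad experience as well as good.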

What it contains

LeRobot dataset v2.1 format support
Configurable HuggingFace tokenizers for text
Configurable CLIP ViT encoders for vision
Advantage-conditioned policy training
Value function for credit assignment
Three-stage training pipeline

Installation

The following steps have been tested with CUDA 12.4.

Clone this repository and navigate to the pi06 directory:

git clone https://github.com/nahidalam/pi06
cd pi06

Install the package:

conda create -n pi06 python=3.11 -y
conda activate pi06
pip install --upgrade pip  # enable PEP 660 support
pip install -e .

Install additional packages for training (optional):
```
pip install -e ".[train]"
```

Usage

Training

Prepare your LeRobot v2.1 dataset (or use an existing one)

The dataset should follow the LeRobot v2.1 format:

<dataset_name>/
├── data/chunk-000/episode_*.parquet
├── videos/chunk-000/observation.images.*/episode_*.mp4
└── meta/episodes.jsonl
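
Before training, it can help to confirm that a dataset directory matches the layout above. The following is a minimal sketch using only the standard library; `missing_dataset_parts` is a hypothetical helper and not part of this package.

```python
from pathlib import Path
from typing import List


def _has_match(directory: Path, pattern: str) -> bool:
    """True if the directory exists and at least one path matches the glob pattern."""
    return directory.is_dir() and any(directory.glob(pattern))


def missing_dataset_parts(root: str, chunk_id: str = "chunk-000") -> List[str]:
    """Return the expected layout entries that are absent under the dataset root."""
    base = Path(root)
    missing = []
    if not (base / "meta" / "episodes.jsonl").is_file():
        missing.append("meta/episodes.jsonl")
    if not _has_match(base / "data" / chunk_id, "episode_*.parquet"):
        missing.append(f"data/{chunk_id}/episode_*.parquet")
    if not _has_match(base / "videos" / chunk_id, "observation.images.*/episode_*.mp4"):
        missing.append(f"videos/{chunk_id}/observation.images.*/episode_*.mp4")
    return missing
```

An empty list means all three parts of the layout were found for the given chunk.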

Create or edit the config file (src/pi06/configs/recap_config.yaml):

dataset:
  path: "path/to/lerobot/dataset"  # Path to dataset root directory
  batch_size: 1
  chunk_id: "chunk-000"  # Chunk identifier
  image_keys: ["observation.images.main"]  # Camera keys

model:
  action_dim: 7  # Adjust for your robot

training:
  demo_epochs: 10
  correction_epochs: 5
  autonomous_epochs: 20

Run training:

python -m pi06.train --config src/pi06/configs/recap_config.yaml

Architecture

VLA Model

Vision Encoder: CLIP ViT (configurable)
Language Encoder: HuggingFace transformer (configurable)
Fusion Layer: Combines vision and language features
Action Expert: MLP head for action prediction
Conditioning: Supports advantage conditioning

Value Function

Predicts expected future return from state features
Used for credit assignment via GAE (Generalized Advantage Estimation)
Enables learning from both good and bad experiences
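
For readers unfamiliar with GAE, here is a minimal pure-Python sketch of the standard recursion. It is not this repository's implementation: the function name `compute_gae` and the default `gamma` and `lam` values are illustrative assumptions, chosen to correspond to the training.gamma and training.lambda options.

```python
from typing import List


def compute_gae(
    rewards: List[float], values: List[float], gamma: float = 0.99, lam: float = 0.95
) -> List[float]:
    """Generalized Advantage Estimation for one episode.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state after the final step (0.0 if terminal).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk backwards so each step can reuse the accumulated future advantage.
    for t in reversed(range(len(rewards))):
        # One-step TD error: reward plus discounted next value, minus current value.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it becomes the discounted return minus the value prediction.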

Training Stages

Demonstrations: Supervised learning to match expert actions
Corrections: Learn recovery strategies from expert interventions
Autonomous: RL training with advantage-conditioned policy
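
The stage order above, combined with the epoch counts from the config file, can be sketched as a simple schedule. `stage_schedule` and `STAGE_EPOCH_KEYS` are hypothetical names, and skipping stages with zero epochs is an assumption rather than documented behavior.

```python
from typing import Dict, List, Tuple

# Stage name paired with the config key that holds its epoch count.
STAGE_EPOCH_KEYS = [
    ("demo", "demo_epochs"),
    ("correction", "correction_epochs"),
    ("autonomous", "autonomous_epochs"),
]


def stage_schedule(training_cfg: Dict[str, int]) -> List[Tuple[str, int]]:
    """Return (stage, epochs) pairs in training order, skipping stages with 0 epochs."""
    schedule = []
    for stage, key in STAGE_EPOCH_KEYS:
        epochs = int(training_cfg.get(key, 0))
        if epochs > 0:
            schedule.append((stage, epochs))
    return schedule


schedule = stage_schedule(
    {"demo_epochs": 10, "correction_epochs": 5, "autonomous_epochs": 20}
)
```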

Configuration

Key configuration options in recap_config.yaml:

model.action_dim: Dimension of action space
model.vision_model_name: CLIP model for vision
model.text_model_name: HuggingFace model for text
training.gamma: Discount factor for RL
training.lambda: GAE lambda parameter
training.value_loss_weight: Weight for value function loss
training.policy_loss_weight: Weight for policy loss
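
One plausible reading of the two weight options is a weighted sum of the policy and value losses. The sketch below shows that reading only; the `LossWeights` class, the `total_loss` function, and the default values are assumptions and may not match how the repository combines its losses.

```python
from dataclasses import dataclass


@dataclass
class LossWeights:
    value_loss_weight: float = 0.5   # placeholder default, not taken from the repository
    policy_loss_weight: float = 1.0  # placeholder default, not taken from the repository


def total_loss(policy_loss: float, value_loss: float, weights: LossWeights) -> float:
    """Combine the two losses as a weighted sum (assumed form)."""
    return (
        weights.policy_loss_weight * policy_loss
        + weights.value_loss_weight * value_loss
    )
```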

Logging

Metrics are logged to WandB:

train/demo_loss: Demonstration training loss
train/policy_loss: Policy loss (autonomous training)
train/value_loss: Value function loss
train/advantage_mean: Mean advantage values
train/entropy: Policy entropy

Checkpoints

Checkpoints are saved after each training stage:

checkpoint_demo.pt: After demonstration training
checkpoint_correction.pt: After correction training
checkpoint_final.pt: After autonomous training

Resume training with:

python -m pi06.train --config src/pi06/configs/recap_config.yaml --checkpoint checkpoints/checkpoint_demo.pt
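
To decide which file to pass to --checkpoint, a small helper can pick the checkpoint from the latest completed stage. This is a sketch built on the file names listed above; `latest_checkpoint` is a hypothetical helper, and the checkpoint directory is whatever your run writes to.

```python
from pathlib import Path
from typing import Optional

# Checkpoint file names in the order the stages run.
STAGE_CHECKPOINTS = [
    "checkpoint_demo.pt",
    "checkpoint_correction.pt",
    "checkpoint_final.pt",
]


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint of the latest stage found on disk, or None if there is none."""
    found = None
    for filename in STAGE_CHECKPOINTS:
        candidate = Path(checkpoint_dir) / filename
        if candidate.is_file():
            # Later stages overwrite earlier ones, so the last hit wins.
            found = candidate
    return found
```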

Dataset Format

The implementation expects LeRobot v2.1 format with the following structure:

Data: Episode data stored as parquet files in data/chunk-*/episode_*.parquet
Videos: MP4 video files in videos/chunk-*/observation.images.*/episode_*.mp4
Metadata: Episode metadata in meta/episodes.jsonl
Supports: Multiple camera views, episode type filtering (demo/correction/autonomous)
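
Episode type filtering could look roughly like the sketch below, which groups the records in meta/episodes.jsonl by a type field. The field name `episode_type` and the `unknown` fallback are assumptions; check the repository's dataset loader for the key it actually reads.

```python
import json
from typing import Dict, List


def episodes_by_type(
    episodes_path: str, type_key: str = "episode_type"
) -> Dict[str, List[dict]]:
    """Group JSON Lines episode records by their type field."""
    groups: Dict[str, List[dict]] = {}
    with open(episodes_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Records without the (assumed) type field are collected under "unknown".
            groups.setdefault(record.get(type_key, "unknown"), []).append(record)
    return groups
```

For example, `episodes_by_type("path/to/lerobot/dataset/meta/episodes.jsonl").get("correction", [])` would return only the correction episodes under these assumptions.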

See the LeRobot documentation for details on the v2.1 format.

License

This project is not affiliated with Physical Intelligence and is provided as-is for research purposes.
