A Plug-and-Play Memory Module for Robotic Manipulation Policies
Haoran Han (韩浩然) — Beijing Institute of Technology
PRM (Phase-Aware Retrieval Memory) is a lightweight, plug-and-play memory augmentation module for robotic manipulation policies. It enhances existing policy backbones (ACT, Diffusion Policy) with temporal memory — enabling policies to recall past observations, track task progress, and adapt to multi-step procedures.
- 🔌 Plug-and-Play: Attaches to ACT/DP without retraining the base policy
- 🧠 Phase-Aware Retrieval: Automatically identifies task phases and retrieves relevant keyframe memories
- 🚪 Retrieval Gate: Learns when to use memory vs. rely on the base policy
- 🔗 Bilateral Shared Context: Injects memory into the policy's cross-attention layers
- 📊 Interpretable: Provides phase confidence, retrieval diagnostics, and per-phase evaluation metrics
Observation → Phase Classifier → Keyframe Memory Bank → Retrieval Gate
↓
Memory-Augmented Policy (ACT/DP)
↓
Action
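The flow above can be illustrated with a small sketch of phase-conditioned keyframe retrieval. This is not the code in `prm/memory/memory_bank.py`: the class `KeyframeMemoryBank`, the function `gated_retrieve`, the cosine-similarity scoring, and the fixed confidence threshold are all assumptions made for illustration. In particular, PRM's retrieval gate is learned, whereas the threshold here is a hand-set stand-in.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class KeyframeMemoryBank:
    """Hypothetical memory bank: keyframe feature vectors grouped by phase index."""

    def __init__(self):
        self._slots = defaultdict(list)

    def add(self, phase, feature):
        self._slots[phase].append(list(feature))

    def retrieve(self, phase, query, top_k=1):
        # Score only the keyframes stored under the requested phase.
        scored = [(cosine(query, f), f) for f in self._slots.get(phase, [])]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:top_k]


def gated_retrieve(bank, phase_probs, query, threshold=0.5):
    """Pick the most likely phase; skip retrieval when its confidence is low.

    The threshold is an illustrative substitute for a learned retrieval gate.
    """
    phase = max(range(len(phase_probs)), key=phase_probs.__getitem__)
    if phase_probs[phase] < threshold:
        return phase, []  # fall back to the base policy alone
    return phase, bank.retrieve(phase, query)
```

With two keyframes stored under phase 0, a confident phase prediction returns the closest keyframe, while an uncertain prediction returns an empty list so the base policy acts without memory.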
# Create conda environment
conda create -n PRM python=3.10 -y
conda activate PRM
# Clone repository
git clone https://github.com/HAORAN808/PRM.git
cd PRM
# Install dependencies
bash script/_install.sh

# Download benchmark tasks and assets
bash script/_download_assets.sh
bash script/_download_data.sh

# Step 1: Build phase labels and extract keyframes
bash prm/preprocess/run_prm_preprocess_3tasks.sh
# Step 2: Train phase classifier
bash prm/train/run_prm_phase_train_3tasks.sh
# Step 3: Train PRM adapter (retrieval gate + context injection)
bash prm/train/run_prm_adapter_train_3tasks.sh
# Step 4: Evaluate ACT + PRM
bash prm/eval/run_act_prm_eval_3tasks.sh
# Step 5: Evaluate DP + PRM (optional)
bash prm/eval/run_dp_prm_eval_3tasks.sh

For detailed PRM usage, see prm/README.md.
# Evaluate vanilla ACT
python script/eval_policy.py --config policy/ACT/deploy_policy.yml \
--overrides --task_name observe_and_pickup --task_config demo_clean \
--seed 0 --test_num 10
# Evaluate vanilla Diffusion Policy
python script/eval_policy.py --config policy/DP/deploy_policy.yml \
--overrides --task_name observe_and_pickup --task_config demo_clean \
--seed 0 --test_num 10

PRM/
├── prm/ # ⭐ Our PRM module
│ ├── model/
│ │ ├── prm_adapter.py # PRM v2 (current) — retrieval gate + context injection
│ │ ├── prm_adapter_v1.py # PRM v1 — earlier version
│ │ └── visual_semantics.py # Visual feature extraction
│ ├── memory/
│ │ └── memory_bank.py # Keyframe memory bank with retrieval
│ ├── preprocess/
│ │ ├── extract_keyframes.py # Keyframe extraction from trajectories
│ │ ├── build_phase_labels.py # Pseudo phase label generation
│ │ └── build_visual_features.py # Visual feature computation
│ ├── train/
│ │ ├── train_phase_head.py # Phase classifier training
│ │ └── train_prm_adapter.py # PRM adapter training
│ ├── eval/ # Evaluation scripts (ACT/DP backends)
│ └── README.md # Detailed PRM documentation
│
├── policy/
│ ├── ACT/ # Action Chunking Transformer (baseline)
│ └── DP/ # Diffusion Policy (baseline)
│
├── envs/ # Simulation environment (RMBench)
│ ├── _base_task.py # Base task class (SAPIEN 3.0)
│ ├── *_task.py # 12 bimanual manipulation tasks
│ ├── robot/ # Bimanual ALOHA-Agilex robot control
│ └── camera/ # Multi-camera rendering
│
├── script/ # Core evaluation/data collection scripts
├── scripts/ # Experiment analysis scripts
└── task_config/ # Task configuration templates
PRM is evaluated on RMBench, a memory-dependent robotic manipulation benchmark with 12 bimanual tasks:
| Task | Description | Memory Type |
|---|---|---|
| `observe_and_pickup` | Observe target, then pick it up | Visual memory |
| `press_button` | Press buttons in sequence | Procedural memory |
| `rearrange_blocks` | Rearrange blocks to target positions | Spatial memory |
| `swap_blocks` | Swap positions of two blocks | Goal memory |
| `put_back_block` | Return block to original position | Spatial memory |
| `cover_blocks` | Cover target blocks with lids | Sequential memory |
| `place_block_mat` | Place blocks on a mat | Spatial memory |
| `battery_try` | Insert battery into holder | Precision + memory |
| `blocks_ranking` | Rank blocks by size | Ordering memory |
| `classify_blocks` | Sort blocks by color/shape | Categorization memory |
| `storage_blocks` | Store blocks in compartments | Spatial memory |
| `swap_T` | Swap T-shaped pieces | Spatial memory |
PRM is designed to be easily attached to any policy that supports cross-attention.
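To make the cross-attention requirement concrete, here is a minimal sketch of gated memory injection: each policy token attends over retrieved memory vectors, and the result is added back as a residual scaled by a gate value. It is an illustration under stated assumptions, not the logic in `prm/model/prm_adapter.py`. The names `cross_attend` and `inject_memory` are hypothetical, the attention is single-head with no learned projections, and the gate is passed in as a plain scalar rather than predicted by a network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, memory):
    """Scaled dot-product attention of one query over memory rows.

    Memory rows serve as both keys and values (no learned projections).
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, row)) / scale for row in memory]
    weights = softmax(scores)
    return [
        sum(w * row[i] for w, row in zip(weights, memory))
        for i in range(len(query))
    ]


def inject_memory(tokens, memory, gate):
    """Residual update per token: token + gate * attention(token, memory).

    With gate == 0.0 or an empty memory, the tokens pass through unchanged,
    which mirrors relying on the base policy alone.
    """
    if not memory or gate == 0.0:
        return [list(t) for t in tokens]
    out = []
    for t in tokens:
        ctx = cross_attend(t, memory)
        out.append([x + gate * c for x, c in zip(t, ctx)])
    return out
```

Because the update is a gated residual, a gate of zero leaves the backbone's tokens untouched, which is one simple way to think about attaching memory without changing the base policy's behavior when memory is not useful.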
| Option | Description |
|---|---|
| `prm_use_rule_phase=true` | Use rule-based phase prediction |
| `prm_use_rule_phase=false` | Use learned phase head (requires training) |
| `prm_adapter_ckpt=<path>` | Load trained PRM adapter weights |
| `prm_use_image_stats=false` | Default for stable training/inference |
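The options above are written as `key=value` pairs. How the repository actually parses them is not shown here, so the following is only a sketch of one way such strings could be turned into typed settings. The function name `parse_overrides` is hypothetical, and the checkpoint path in the usage note is a made-up placeholder.

```python
def parse_overrides(pairs):
    """Parse 'key=value' strings into a dict.

    'true' / 'false' (any case) become booleans; everything else stays a string.
    This is an illustrative parser, not the repository's own.
    """
    parsed = {}
    for item in pairs:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        lowered = value.lower()
        if lowered in ("true", "false"):
            parsed[key] = lowered == "true"
        else:
            parsed[key] = value
    return parsed
```

For example, `parse_overrides(["prm_use_rule_phase=false", "prm_adapter_ckpt=ckpt/adapter.pt"])` yields a dict with a boolean `False` for the phase option and the path kept as a string.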
This project is released under the MIT License.
- Benchmark environment built upon RoboTwin 2.0
- ACT baseline from Learning Fine-Grained Bimanual Manipulation
- Diffusion Policy baseline from Diffusion Policy