🎉 Accepted at ICML 2026 🎉
"To be truly useful in daily life, robots must discern the subtle details that distinguish 'a cup' from 'my cup.'"
Existing vision-language-action (VLA) models excel at generic commands ("pick up the cup"), but they fail when asked to "pick up my cup" among several similar cups.
Visual Attentive Prompting (VAP) solves this by acting as a pair of "personalized glasses" for the robot.
- See & Remember: It takes a few reference photos of your object.
- Highlight: It visually detects and highlights the target object in the robot's view.
- Act: It guides the frozen VLA model to manipulate the correct object without any expensive training or fine-tuning.
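The three steps above can be sketched in a few lines. This is an illustrative toy, not the repository's actual API: `highlight_target` and the box coordinates are hypothetical stand-ins for VAP's reference-photo matching and prompting pipeline.

```python
# Toy sketch of the VAP idea: given a bounding box for the personalized
# object (e.g. from matching against the user's reference photos), draw a
# visual highlight on the robot's camera frame so a frozen VLA model
# attends to the correct instance. Names here are illustrative only.
from PIL import Image, ImageDraw

def highlight_target(frame, box, color="red", width=4):
    """Overlay a box around the detected target; the VLA sees this frame."""
    prompted = frame.copy()
    ImageDraw.Draw(prompted).rectangle(box, outline=color, width=width)
    return prompted

# A blank stand-in "camera frame" and a detection box for "my cup".
frame = Image.new("RGB", (224, 224), "gray")
box = (60, 80, 140, 180)  # (left, top, right, bottom)
prompted = highlight_target(frame, box)
# `prompted` would be passed to the frozen VLA in place of `frame`.
```

No model weights are updated at any point; only the image fed to the VLA changes.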
Once the environments are set up, you can run the benchmarks immediately.
Evaluate VAP on SimplerEnv with Bridge/Fractal baselines:

```bash
conda activate simpler
bash ./scripts/run_personalized_simpler_vap.sh
```

Evaluate VAP on VLABench tasks:

```bash
conda activate vlabench
bash ./scripts/run_personalized_vlabench_vap.sh
```

Deploy VAP as a server for real-world robot experiments:

```bash
conda activate vap_server
bash ./scripts/run_realworld_vap_server.sh
```

First, clone the repository and download the necessary data assets.

```bash
git clone https://github.com/Leesangoh/VAP.git
cd VAP
git submodule update --init --recursive

# Set environment variables
export VAP_HOME=$(pwd)
export HF_HOME=$HOME/.cache/huggingface  # Modify if needed
```

📥 Download Assets

Please download the files below and place them in the correct directories:
- User-provided object photos (unzip into `${VAP_HOME}/datasets`)
- Personalized-VLABench assets (unzip into `${VAP_HOME}/src/simulation/VLABench/VLABench/assets`)
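A quick sanity check can confirm the assets were unzipped to the right places. This helper is not part of the repository; the expected paths are taken from the download instructions above, and the demo runs against a throwaway directory so it can be exercised anywhere.

```python
# Check that the asset directories the run scripts expect exist under
# $VAP_HOME. Helper is illustrative, not part of the VAP repository.
import tempfile
from pathlib import Path

# Paths taken from the download instructions above, relative to $VAP_HOME.
EXPECTED = [
    "datasets",                                 # user-provided object photos
    "src/simulation/VLABench/VLABench/assets",  # Personalized-VLABench assets
]

def missing_assets(vap_home):
    """Return the expected asset paths that do not exist under vap_home."""
    root = Path(vap_home)
    return [p for p in EXPECTED if not (root / p).exists()]

# Real usage would be: missing_assets(os.environ["VAP_HOME"])
# Demo against a throwaway tree:
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "datasets").mkdir()
    print(missing_assets(tmp))  # VLABench assets path reported as missing
```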
We use 4 separate conda environments to manage dependencies for different baselines. Click the tabs below to expand the installation commands.
🐍 A. Setup `openpi_torch` Env (For pi0 server)

```bash
conda create -n openpi_torch python=3.11 -y
conda activate openpi_torch

cd $VAP_HOME/src/simulation/models/openpi_torch
pip install -r requirements.txt
cd packages/openpi-client
pip install -e .
cd ../../
pip install -e .

# Replace transformers modules
cd $VAP_HOME/src/simulation/models/openpi_torch
cp -r ./src/openpi/models_pytorch/transformers_replace/* $(python -c "import transformers; print(transformers.__path__[0])")

conda deactivate
```

🤖 B. Setup `simpler` Env (For SIMPLER Simulation)
```bash
conda create -n simpler python=3.10 -y
conda activate simpler

cd $VAP_HOME/src/simulation/SIMPLER/ManiSkill2_real2sim
pip install -e .
cd ../
pip install -e .

# Dependencies
pip install torch tensorflow==2.15.0 pandas matplotlib omegaconf mediapy websockets
pip install flax==0.5 jax==0.4.1 msgpack hydra-core einops transformers==4.56.0 torchvision bitsandbytes

# Fix JAX version
pip uninstall jaxlib -y
pip install "jaxlib==0.4.1" -i https://us-python.pkg.dev/ml-oss-artifacts-published/jax/simple/
pip install numpy==1.24.4

conda deactivate
```

🧪 C. Setup `vlabench` Env (For VLABench Simulation)
```bash
conda create -n vlabench python=3.10 -y
conda activate vlabench

cd $VAP_HOME/src/simulation/VLABench
pip install -r requirements.txt
pip install -e .
pip install websockets msgpack torchvision

conda deactivate
```

🖥️ D. Setup `vap_server` Env (For Real-world Server)
```bash
conda create -n vap_server python=3.10 -y
conda activate vap_server

pip install torch torchvision numpy msgpack websockets pillow requests
pip install --upgrade transformers accelerate
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.2/flash_attn-2.8.2+cu12torch2.7cxx11abiTRUE-cp310-cp310-linux_x86_64.whl

conda deactivate
```

Download the required model weights.
📥 Click to expand Checkpoint Download instructions
A. Paligemma

```bash
mkdir -p $HF_HOME
cd $HF_HOME
git clone https://huggingface.co/google/paligemma-3b-pt-224
```

B. Bridge/Fractal Checkpoints

```bash
mkdir -p $HF_HOME/open-pi-zero
cd $HF_HOME/open-pi-zero

# Bridge
wget -c -O bridge_beta_step19296_2024-12-26_22-30_42.pt \
  "https://huggingface.co/allenzren/open-pi-zero/resolve/main/bridge_beta_step19296_2024-12-26_22-30_42.pt?download=true"

# Fractal
wget -c -O fractal_beta_step29576_2024-12-29_13-10_42.pt \
  "https://huggingface.co/allenzren/open-pi-zero/resolve/main/fractal_beta_step29576_2024-12-29_13-10_42.pt?download=true"
```

C. VLABench Checkpoint

Download `model.safetensors` from the link below and place it in `$HF_HOME/pi05-vlabench`.
- Link: Huggingface or Google Drive
If you find this work useful in your research, please cite:
```bibtex
@inproceedings{lee2026bring,
  title={Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting},
  author={Lee, Sangoh and Mo, Sangwoo and Han, Wook-Shin},
  booktitle={Proceedings of the International Conference on Machine Learning (ICML)},
  year={2026}
}
```