- [2026/03/01] We release the code of TwinVLA.
- [2026/01/26] TwinVLA got accepted to ICLR 2026. 🎉
- [2025/11/10] TwinVLA is now on arXiv.
Installation | Quick Usage | Custom Dataset | RoboTwin | Tabletop-Sim | New VLM Backbones | Citation
It is recommended to use Anaconda.
# Cloning TwinVLA
git clone https://github.com/jellyho/TwinVLA.git
cd TwinVLA
# Create Conda env
conda create -n twinvla python=3.10 -y
conda activate twinvla
# For compiling rerun-sdk (LeRobot dependency)
conda install -c conda-forge rust -y
# Install Requirements and TwinVLA
pip install -r requirements.txt
# Additional installation for lerobot
# Install lerobot & downgrade numpy<2.0 (Please ignore dependency conflicts)
pip install "lerobot==0.4.0"
pip install "numpy<2.0.0"
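Because the last step pins NumPy below 2.0, a quick check after installation can confirm the pin took effect. The following is a minimal sketch, not part of the repository; the helper names `numpy_is_pre_2` and `check_installed_numpy` are hypothetical.

```python
from importlib import metadata


def numpy_is_pre_2(version: str) -> bool:
    """Return True if a version string such as '1.26.4' has major version < 2."""
    return int(version.split(".")[0]) < 2


def check_installed_numpy() -> str:
    """Describe whether the NumPy in the current environment satisfies numpy<2.0.0."""
    try:
        installed = metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return "numpy is not installed"
    if numpy_is_pre_2(installed):
        return f"numpy {installed} satisfies numpy<2.0.0"
    return f"numpy {installed} is too new; rerun: pip install \"numpy<2.0.0\""


if __name__ == "__main__":
    print(check_installed_numpy())
```

Run it inside the activated `twinvla` environment; it only reads package metadata and does not import NumPy itself.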
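from twinvla.model.twinvla import TwinVLA

# Load a fine-tuned checkpoint by its model ID
model = TwinVLA('jellyho/TwinVLA-aloha_handover_box')

# Predict actions from the instruction, the three camera views, and the proprioceptive state
actions = model.predict_action(
    unnorm_key='aloha_handover_box',
    instruction=instruction,
    image=front_img,
    image_wrist_r=right_wrist_img,
    image_wrist_l=left_wrist_img,
    proprio=proprio,
)

# `robot` stands for your own robot interface
for action in actions:
    robot.execute(action)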
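The snippet above predicts one batch of actions and executes it. In a closed-loop rollout this is typically repeated with fresh observations. The sketch below illustrates that pattern with stand-in classes only: `StubPolicy`, `StubRobot`, `get_observation`, and the chunk length of 4 are all hypothetical and not part of TwinVLA. Only the `predict_action` keyword arguments are taken from the usage example.

```python
class StubPolicy:
    """Stand-in for a loaded model; mimics only the predict_action keywords shown above."""

    def predict_action(self, unnorm_key, instruction, image,
                       image_wrist_r, image_wrist_l, proprio):
        # Return a short list of dummy 20D actions (length 4 is arbitrary).
        return [[0.0] * 20 for _ in range(4)]


class StubRobot:
    """Hypothetical robot interface: returns observations and records executed actions."""

    def __init__(self):
        self.executed = []

    def get_observation(self):
        # A real interface would return camera images and the measured state.
        return {"front": None, "wrist_r": None, "wrist_l": None, "proprio": [0.0] * 20}

    def execute(self, action):
        self.executed.append(action)


def run_episode(model, robot, instruction, unnorm_key, num_queries):
    """Query the policy num_queries times, executing every returned action in order."""
    for _ in range(num_queries):
        obs = robot.get_observation()
        actions = model.predict_action(
            unnorm_key=unnorm_key,
            instruction=instruction,
            image=obs["front"],
            image_wrist_r=obs["wrist_r"],
            image_wrist_l=obs["wrist_l"],
            proprio=obs["proprio"],
        )
        for action in actions:
            robot.execute(action)
    return len(robot.executed)


if __name__ == "__main__":
    total = run_episode(StubPolicy(), StubRobot(), "hand over the box",
                        "aloha_handover_box", num_queries=3)
    print(f"executed {total} actions")
```

Swapping `StubPolicy` for a real `TwinVLA` instance and `StubRobot` for your own hardware or simulator wrapper should leave the loop itself unchanged, assuming your wrapper exposes equivalent methods.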
We assume you have already uploaded your dataset in LeRobot format. Both the action and the state must be 20-dimensional: 2 arms × (xyz position, 6D rotation, gripper), i.e. 2 × (3 + 6 + 1).
Change the lerobot config file to match your dataset. This config file defines the keys used for training. Refer to TabletopSimConfig in the lerobot config for more details.
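As an illustration of the 20D layout, the sketch below packs two end-effector poses into one vector. Several details are assumptions rather than facts about TwinVLA: the 6D rotation is taken as the first two columns of the rotation matrix (a common convention), the order of the two arms in the vector is arbitrary here, and the function names are hypothetical. Check your dataset config for the actual conventions.

```python
import numpy as np


def rotation_to_6d(rotation) -> np.ndarray:
    """Flatten the first two columns of a 3x3 rotation matrix into 6 numbers.

    ASSUMPTION: column 0 followed by column 1; the repository may order these differently.
    """
    rotation = np.asarray(rotation, dtype=np.float32)
    if rotation.shape != (3, 3):
        raise ValueError(f"expected a 3x3 matrix, got shape {rotation.shape}")
    return np.concatenate([rotation[:, 0], rotation[:, 1]])


def arm_to_10d(xyz, rotation, gripper) -> np.ndarray:
    """Pack one arm as (xyz, 6D rotation, gripper) = 3 + 6 + 1 values."""
    xyz = np.asarray(xyz, dtype=np.float32)
    grip = np.array([gripper], dtype=np.float32)
    return np.concatenate([xyz, rotation_to_6d(rotation), grip])


def bimanual_to_20d(first_arm, second_arm) -> np.ndarray:
    """Concatenate two 10D arm vectors; which arm comes first is an assumption."""
    return np.concatenate([arm_to_10d(*first_arm), arm_to_10d(*second_arm)])


if __name__ == "__main__":
    identity = np.eye(3)
    vec = bimanual_to_20d(
        ([0.1, 0.2, 0.3], identity, 1.0),
        ([0.4, 0.5, 0.6], identity, 0.0),
    )
    print(vec.shape)
```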
sh train_twinvla.sh <lerobot-dataset-path> <batch-size> <num-gpu>
Since we are based on OpenVLA's RLDS dataset loader, we support fine-tuning with RLDS datasets. We also provide instructions to convert datasets in HDF5 format into the RLDS format. If you already have an RLDS dataset, you can skip to Step 2️⃣.
In this section, we explain how to convert your custom dataset, stored in HDF5 format, into RLDS format.
To convert HDF5 to RLDS, you first need to install the requirements. We recommend installing them in a new environment.
cd scripts/rlds_gen
pip install -r requirements_rlds.txt
Modify scripts/rlds_gen/rlds_builder_robotwin.py (or use it as a template) to match the key names and feature specs of your HDF5 file. Note that TwinVLA only requires a 20D EEF pose action space.
cd scripts/rlds_gen
CUDA_VISIBLE_DEVICES="" python rlds_builder_robotwin.py --task_name $dataset_name
You need to register your RLDS dataset by adding entries to the following files:
Run train_twinvla.sh. Make sure you specify the arguments correctly, including changing --output_dir for checkpoints.
sh train_twinvla.sh <task-name> <batch-size> <num-gpu>
Use the code below to download the RoboTwin data converted to RLDS.
Note that this is simply the RoboTwin dataset converted to RLDS format with a 20D EEF pose action space.
huggingface-cli download jellyho/robotwin2_rlds --repo-type dataset --local-dir ./robotwin2_rlds
After downloading the dataset, replace the value of the --data_root_dir argument in train_twinvla.sh with the path to the downloaded robotwin2_rlds directory:
...
--data_root_dir "path/to/robotwin2_rlds" \
...
Then, you can start training by running:
sh train_twinvla.sh <task-name> <batch-size> <num-gpu>
# e.g. sh train_twinvla.sh robotwin_open_laptop 4 2
You can evaluate TwinVLA using the RoboTwin codebase with just a few simple setup steps.
git clone https://github.com/RoboTwin-Platform/RoboTwin.git --recursive
cd RoboTwin
Create and activate a new Conda environment following the RoboTwin installation guide:
After installing RoboTwin's dependencies, return to the TwinVLA project directory and install its dependencies into the same Conda environment:
pip install -r requirements.txt && pip install -e .
Copy the ./TwinVLA_robotwin folder from this project into RoboTwin's ./policy directory, and rename it to TwinVLA:
mv TwinVLA_robotwin ../RoboTwin/policy/TwinVLA
Move into the RoboTwin/policy/TwinVLA directory and run evaluation using:
bash eval.sh <ckpt-path> <task-name> <task-config> <ckpt-setting> <seed> <gpu-id>
For example, to evaluate a model fine-tuned on demo_clean for the open_laptop task, but run evaluation in the demo_randomized setup (with seed=42 and gpu_id=0):
bash eval.sh /path/to/ckpt open_laptop demo_randomized demo_clean 42 0
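The positional order of the `eval.sh` arguments is easy to mix up, especially `<task-config>` (the evaluation setup) and `<ckpt-setting>` (the setup the checkpoint was trained on). The sketch below assembles the command with named parameters. The helper `build_eval_command` is hypothetical and only mirrors the argument order shown above.

```python
import shlex


def build_eval_command(ckpt_path, task_name, task_config, ckpt_setting, seed, gpu_id):
    """Return the eval.sh invocation as an argument list, in the documented order."""
    return [
        "bash", "eval.sh",
        str(ckpt_path),   # <ckpt-path>
        task_name,        # <task-name>
        task_config,      # <task-config>: setup used for evaluation
        ckpt_setting,     # <ckpt-setting>: setup the checkpoint was fine-tuned on
        str(seed),        # <seed>
        str(gpu_id),      # <gpu-id>
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        ckpt_path="/path/to/ckpt",
        task_name="open_laptop",
        task_config="demo_randomized",
        ckpt_setting="demo_clean",
        seed=42,
        gpu_id=0,
    )
    print(shlex.join(cmd))
    # To actually launch it from the policy directory, one option is:
    # subprocess.run(cmd, cwd="RoboTwin/policy/TwinVLA", check=True)
```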
For more details, refer to the RoboTwin policy deploy documentation.
You can download the Tabletop-Sim dataset by running:
huggingface-cli download jellyho/tabletop-simulation-rlds --repo-type dataset --local-dir ./tabletop-simulation-rlds # Total size: 56GB
After downloading the dataset, replace the value of the --data_root_dir argument in train_twinvla.sh with the path to the downloaded tabletop-simulation-rlds directory:
...
--data_root_dir "path/to/tabletop-simulation-rlds" \
--data_mix "$data_mix" \
...
Then, you can start training by running:
sh train_twinvla.sh <task-name> <batch-size> <num-gpu>
# e.g. sh train_twinvla.sh aloha_handover_box 4 2
You can directly load the LeRobot dataset from Hugging Face by entering the dataset path.
sh train_twinvla.sh <lerobot-dataset-path> <batch-size> <num-gpu>
# e.g. sh train_twinvla.sh jellyho/aloha_handover_box 8 2
After fine-tuning TwinVLA on the Tabletop-Sim dataset, you can roll out the model in Tabletop-Sim.
To run the simulation rollout, first download and install Tabletop-Sim:
git clone https://github.com/jellyho/Tabletop-Sim.git --recursive
cd Tabletop-Sim
pip install -r requirements.txt
pip install 'numpy<2'
After installing the simulator, you can run the evaluation by executing:
sh tabletop_run.sh /path/to/checkpoint <task-name> # e.g. aloha_handover_box
You can also try one of our fine-tuned models (see Models) on Tabletop-Sim:
sh tabletop_run.sh jellyho/aloha_dish_drainer aloha_dish_drainer
Our version of TwinVLA is built on a modular architecture that makes it easy to experiment with different VLM backbones. You can quickly integrate new models using our automated template generator.
Since TwinVLA builds upon SingleVLA, start by creating a template for your desired backbone. Run the following command:
python3 singlevla_gen.py --model_type <YourModelName>
Example:
python3 singlevla_gen.py --model_type InternVL3_1B
This will generate a new Python file at twinvla/model/singlevlas/<your_model_name>.py (e.g., internvl3_1b.py).
Open the generated file. It contains a class definition for your new VLA model (e.g., InternVL3_1BVLA). You need to:
- Direct Imports: Update the TODO section to import your specific model's configuration and class (e.g., from transformers or a local file).
- Set Pretrained Path: Fill in the pretrained_path in the config class with the Hugging Face model ID.
- Implement Methods: Fill in the method stubs that raise NotImplementedError. The generated docstrings provide detailed instructions for each method:
  - init_processor_tokenizer: Initialize tokenizers/processors.
  - text_backbone & vision_backbone: Return the underlying model components.
  - process_image & image_embeds: Define how images are preprocessed and encoded.
  - image_seq_len, image_start/end_token: Specify token details.
Tip: You can refer to existing implementations like twinvla/model/singlevlas/eagle2_1b.py or smolvlm2.py for guidance.
To ensure your new model is recognized by the training script, add an import statement to twinvla/model/singlevlas/__init__.py:
from .<your_model_name_lower> import *
You can now train your new VLA model by specifying its name in the --model_type argument:
# Example for SingleVLA training
accelerate launch scripts/train.py \
--model_type <YourModelName>VLA \
--output_dir checkpoints/my_new_backbone \
...
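The steps above use three related names derived from `--model_type`: the generated file, the class, and the training argument. The sketch below collects them in one place, based on the `InternVL3_1B` example. It assumes the file name is simply the lowercased model type, which matches the example but may not hold for every name the generator accepts. `singlevla_names` is a hypothetical helper.

```python
def singlevla_names(model_type: str) -> dict:
    """Derive the file, class, import line, and training argument for a backbone name.

    ASSUMPTION: the module name is model_type lowercased, as in the
    InternVL3_1B -> internvl3_1b.py example.
    """
    module = model_type.lower()
    class_name = f"{model_type}VLA"
    return {
        "module_file": f"twinvla/model/singlevlas/{module}.py",
        "class_name": class_name,
        "init_import": f"from .{module} import *",
        "train_model_type": class_name,
    }


if __name__ == "__main__":
    for key, value in singlevla_names("InternVL3_1B").items():
        print(f"{key}: {value}")
```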
To extend this to a dual-arm TwinVLA, create a corresponding configuration in twinvla/model/twinvlas/ following the existing patterns (e.g., eagle2_1b.py in that directory) and implement any necessary attention mechanism overrides if using a non-standard architecture.
This repository leverages code from the following open-source projects:
- OpenVLA for the RLDS data loading codebase.
- DiT for the DiT policy head.
- MobileVLM-V2 for the training pipeline.
If you find this work useful, please consider citing:
@inproceedings{
im2026twinvla,
title={Twin{VLA}: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models},
author={Hokyun Im and Euijin Jeong and Andrey Kolobov and Jianlong Fu and Youngwoon Lee},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=jG9W6nAwVz}
}