HyperbolicCurve/Awesome-World-Action-Model

🤖 Awesome World Action Models

📜 A Curated List of World Action Models, Vision-Language-Action (VLA), and Embodied AI Research

Overview

This repository aims to provide a comprehensive, curated, and continuously updated list of research papers, resources, and tools related to World Action Models (WAM), Vision-Language-Action (VLA) models, and Embodied AI. The goal is to help researchers and engineers navigate the rapidly evolving field of robotics foundation models.

World Action Models are robotics policies that use world modeling (predicting future states) to inform action prediction. They shift the emphasis from reactive policies to predictive, world-aware decision-making.

Vision-Language-Action (VLA) models combine the rich language grounding and visual understanding of Vision-Language Models (VLMs) with action prediction, offering a scalable route toward general-purpose, language-conditioned robot policies.

Comparison Methods & Baselines

Quick Reference (from experimental tables)

| Category | Key Baselines |
| --- | --- |
| VLA | RT-1, RT-2, OpenVLA, Octo, π0, X-VLA, UniVLA, SmolVLA, VLANeXt |
| Policy | Diffusion Policy, ACT, BeT, PerAct, MVP, R3M, CQL, IQL |
| World Model | DreamerV1/V2/V3, I-JEPA, V-JEPA, DreamZero |

🆕 Latest Papers (Auto-updated)

Papers are automatically fetched daily from arXiv. Last updated: 2026-06-09

VLA

| Paper | Authors | Date | Code |
| --- | --- | --- | --- |
| LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination | Taishan Li, Jiwen Zhang et al. | 2026-06-09 | |
| VeriSpace: Spatially Grounded Action Verification for Vision-Language-Action Models | Guiyu Zhao, Longteng Guo et al. | 2026-06-09 | |
| Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults | Minsoo Jo, Taeju Kwon et al. | 2026-06-09 | |
| Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models | Qingzi Wang, Xiyang Wu et al. | 2026-06-09 | |
| A Practical Recipe Towards Improving Sim-and-Real Correlation for VLA Evaluation | Shuo Wang, Hanyuan Xu et al. | 2026-06-09 | |
| What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents | Jiaheng Hu, Mohit Shridhar et al. | 2026-06-09 | |
| Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs | Jonathan C. Kao, Jason Chan et al. | 2026-06-08 | |
| MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models | Hao Shi, Weiye Li et al. | 2026-06-08 | |
| Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models | Seongbin Park, Fan Zhang et al. | 2026-06-08 | |
| ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models | Fan Zhang, Seongbin Park et al. | 2026-06-08 | |

World Model

| Paper | Authors | Date | Code |
| --- | --- | --- | --- |
| HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation | Xiaoquan Sun, Ruijian Zhang et al. | 2026-06-09 | |
| MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation | Jia Zheng, Teli Ma et al. | 2026-06-08 | |
| C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache | Weisen Zhao, Lam Nguyen et al. | 2026-06-08 | |
| Dream-Tac: A Unified Tactile World Action Model for Contact-Rich Robot Manipulation | Yunfan Lou, Yifan Ye et al. | 2026-06-07 | |
| FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation | Haotian He, Zeyu Yan et al. | 2026-06-07 | |
| Light-WAM: Efficient World Action Models with State-Fusion Action Decoding | Ziang Li, Dongzhou Cheng et al. | 2026-06-06 | |
| Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning | Yinzhou Tang, Jingbo Xu et al. | 2026-06-05 | |
| Flash-WAM: Modality-Aware Distillation for World Action Models | Arman Akbari, Ci Zhang et al. | 2026-06-03 | |
| OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics | Zhuoyuan Wu, Jun Gao | 2026-06-03 | |
| GeoSem-WAM: Geometry- and Semantic-Aware World Action Models | Fulong Ma, Daojie Peng et al. | 2026-06-02 | |

Policy

| Paper | Authors | Date | Code |
| --- | --- | --- | --- |
| Efficient-WAM: A 1B-Parameter World-Action Model with Low-Cost Future Imagination | Jiajun Li, Tiecheng Guo et al. | 2026-06-08 | |
| AHA-WAM: Asynchronous Horizon-Adaptive World-Action Modeling with Observation-Guided Context Routing | Jisong Cai, Long Ling et al. | 2026-06-08 | |
| WAM-Nav: Asymmetric Latent World-Action Modeling for Unified Visual Navigation | Ning Yang, Yan Huang et al. | 2026-06-03 | |

Key Definitions

Vision-Language-Action (VLA) Models

VLA models are robotics policies that inherit the language grounding and visual understanding of pretrained VLMs, offering a scalable route toward general-purpose, language-conditioned robot policies.

Key Paper: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Brohan et al. (2023) | Project

World Action Models (WAM)

WAMs are robotics policies that leverage world modeling capability (i.e., predicting future states) for action prediction.

Key Paper: DreamZero: World Action Models are Zero-shot Policies | Chen et al. (2026) | Project
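
To make the definition above concrete, here is a minimal toy sketch of the idea "predict future states, then pick the action". It is not taken from any listed paper: the 1-D dynamics, the function names, and the candidate action set are all assumptions made for illustration, and real WAMs learn both the prediction and the action with neural networks rather than hand-written rules.

```python
def predict_next_state(state, action):
    """Toy world model (hypothetical): a 1-D point moves by the commanded action."""
    return state + action


def choose_action(state, goal, candidates):
    """Pick the candidate whose *predicted* next state lands closest to the goal."""
    return min(candidates, key=lambda a: abs(predict_next_state(state, a) - goal))


def rollout(state, goal, steps=5, candidates=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Act for a few steps, consulting the world model before every action."""
    trajectory = [state]
    for _ in range(steps):
        action = choose_action(state, goal, candidates)
        # In this toy setting the environment behaves exactly like the model.
        state = predict_next_state(state, action)
        trajectory.append(state)
    return trajectory


print(rollout(0.0, 2.0, steps=3))
```

A purely reactive policy would map the current state straight to an action. Here the action is chosen by comparing imagined outcomes, which is the distinction the definition draws.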

Note: VLA and WAM overlap. A WAM built upon a pretrained VLM is both a VLA and a WAM.


Surveys

| Title | Authors | Year | Links |
| --- | --- | --- | --- |
| Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges | Applied AI Research Lab | 2025 | arXiv |
| A Survey on Vision-Language-Action Models for Embodied AI | Ma et al. | 2024 | arXiv |
| Foundation Models for Embodied AI | Driess et al. | 2024 | arXiv |

VLA Models

General VLA

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | AC2-VLA: Action-Context-Aware Adaptive Computation in VLA | Yu et al. | 2026 | arXiv |
| 2 | APPLV: Adaptive Planner Parameter Learning from VLA | Lu et al. | 2026 | arXiv |
| 3 | Act, Think or Abstain: Complexity-Aware Adaptive Inference for VLA | Izzo et al. | 2026 | arXiv |
| 4 | AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust VLA | Heo et al. | 2026 | arXiv |
| 5 | CLARE: Continuous Learning for VLA via Adapter Routing | Römer et al. | 2026 | arXiv |
| 6 | DIAL: Decoupling Intent and Action via Latent World Modeling for VLA | Chen et al. | 2026 | arXiv |
| 7 | EAPruning: Adaptive Pruning with Interleaved Inference for VLA | Huang et al. | 2026 | arXiv |
| 8 | ETA-VLA: Efficient Token Adaptation | Wang et al. | 2026 | arXiv |
| 9 | FAVLA: Force-Adaptive Fast-Slow VLA | Li et al. | 2026 | arXiv |
| 10 | HarvestFlex: Harvesting via VLA Policy Adaptation | Zhao et al. | 2026 | arXiv |
| 11 | On-the-Fly VLA: VLA Adaptation via Test-Time RL | Liu et al. | 2026 | arXiv |
| 12 | ProbeFlow: Training-Free Adaptive Flow Matching for VLA | Fang et al. | 2026 | arXiv |
| 13 | RAFT: Adapting VLA Models via Force-aware Curriculum | Zhang et al. | 2026 | arXiv |
| 14 | ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy | Kim et al. | 2026 | arXiv |
| 15 | SAMoE-VLA: Scene Adaptive Mixture-of-Experts VLA | You et al. | 2026 | arXiv |
| 16 | SCALE: Self-Uncertainty Adaptive Looking for VLA | Choi et al. | 2026 | arXiv |
| 17 | SOMA: Memory-Augmented System for VLA Robustness | Li et al. | 2026 | arXiv |
| 18 | VGAS: Adaptive Capacity Allocation for VLA | Kim et al. | 2026 | arXiv |
| 19 | VLA-Acceleration: Accelerate VLA through Visual Token Caching | Wei et al. | 2026 | arXiv |
| 20 | VGAS: Value-Guided Action-Chunk Selection for VLA | Xu et al. | 2026 | arXiv |
| 21 | VLANeXt: Recipes for Building Strong VLA Models | Liu et al. | 2026 | arXiv, Project |
| 22 | HoloBrain-0: Technical Report | Horizon Robotics | 2026 | arXiv, Project |
| 23 | FocusVLA: Focused Visual Utilization for VLA Models | Zhang et al. | 2026 | arXiv |
| 24 | StreamingVLA: Streaming VLA with Action Flow Matching | Shi et al. | 2026 | arXiv |
| 25 | ABot-M0: VLA with Action Manifold Learning | AMAP CVLab | 2026 | arXiv, Project |
| 26 | SimVLA: A Simple VLA Baseline for Robotic Manipulation | FrontierRoBo | 2026 | arXiv, Project |
| 27 | Lingbot-VLA: A Pragmatic VLA Foundation Model | Robbyant | 2026 | arXiv, Project |
| 28 | AC-DiT: Adaptive Coordination Diffusion Transformer | Chen et al. | 2025 | arXiv |
| 29 | U-DiT: U-shaped Diffusion Transformers | Wu et al. | 2025 | arXiv |
| 30 | VLA-Adapter: Tiny-Scale VLA Paradigm | Wang et al. | 2025 | arXiv |
| 31 | Gemini Robotics: Bringing AI into the Physical World | DeepMind | 2025 | arXiv, Project |
| 32 | π*0.6: A VLA That Learns From Experience | Black et al. | 2025 | arXiv, Project |
| 33 | X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment VLA | Zheng et al. | 2025 | arXiv, Project |
| 34 | UniVLA: Unified Vision-Language-Action Model | Wu et al. | 2025 | arXiv, Project |
| 35 | SmolVLA: A VLA for Affordable and Efficient Robotics | LeRobot | 2025 | arXiv, Project |
| 36 | NORA: A Small Open-Sourced Generalist VLA | Nandan et al. | 2025 | arXiv, Project |
| 37 | VLA-0: Building State-of-the-Art VLAs with Zero Modification | VLA0 | 2025 | arXiv, Project |
| 38 | CronusVLA: Efficient Multi-Frame VLA | Li et al. | 2025 | arXiv, Project |
| 39 | OpenVLA-OFT: Fine-Tuning VLAs: Optimizing Speed and Success | Kim et al. | 2025 | arXiv, Project |
| 40 | AsyncVLA: Asynchronous Flow Matching for VLA | Jiang et al. | 2025 | arXiv, Project |
| 41 | AVA-VLA: VLA with Active Visual Attention | Li et al. | 2025 | arXiv |
| 42 | A-VL: Adaptive Attention for Large VLA | Zhang et al. | 2024 | arXiv |
| 43 | ADEM-VL: Adaptive and Embedded Fusion for VLA | Hao et al. | 2024 | arXiv |
| 44 | OpenVLA: An Open-Source Vision-Language-Action Model | Kim et al. | 2024 | arXiv, Project |
| 45 | Octo: An Open-Source Generalist Robot Policy | Ghosh et al. | 2024 | arXiv, Project |
| 46 | π0: A Vision-Language-Action Flow Model for General Robot Control | Black et al. | 2024 | arXiv, Project |
| 47 | RT-2: Vision-Language-Action Models | Brohan et al. | 2023 | arXiv, Project |
| 48 | RT-1: Robotics Transformer for Real-World Control at Scale | Brohan et al. | 2022 | arXiv, Project |
| 49 | VL-Adapter: Parameter-Efficient Transfer Learning | Sung et al. | 2021 | arXiv |

VLA with Reasoning

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | ACoT-VLA: Action Chain-of-Thought for VLA Models | AgibotTech | 2026 | arXiv, Project |
| 2 | Fast-ThinkAct: Efficient Vision-Language-Action Reasoning | Chen et al. | 2026 | arXiv |
| 3 | CoT-VLA: Visual Chain-of-Thought Reasoning for VLA | Zhao et al. | 2025 | arXiv |
| 4 | ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning | Huang et al. | 2025 | arXiv |
| 5 | UniVLA: VLA with World Model | Wu et al. | 2025 | arXiv |

VLA with 3D/4D Modeling

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Zhen et al. | 2024 | arXiv |
| 2 | VoxPoser: 3D-Aware VLA | Huang et al. | 2023 | arXiv, Project |

Efficient VLA

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | FASTER: Rethinking Real-Time Flow VLAs | Liu et al. | 2026 | arXiv |
| 2 | SmolVLA: A Vision-Language-Action Model for Affordable Robotics | LeRobot | 2025 | arXiv |
| 3 | AsyncVLA: Asynchronous Flow Matching for VLA | Jiang et al. | 2025 | arXiv, Project |
| 4 | OpenHelix: A Short Survey and Open-Source Dual-System VLA | Cui et al. | 2025 | arXiv |
| 5 | RTC: Running VLAs at Real-time Speed | Google DeepMind | 2025 | arXiv |
| 6 | AVA-VLA: VLA with Active Visual Attention | Li et al. | 2025 | arXiv |

VLA with RL Fine-tuning

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning | Li et al. | 2025 | arXiv |
| 2 | OpenVLA-OFT: Fine-Tuning VLAs: Optimizing Speed and Success | Kim et al. | 2025 | arXiv, Project |

WAM from Video Generation

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | DreamZero: World Action Models are Zero-shot Policies | Chen et al. | 2026 | arXiv, Project |
| 2 | DiT4DiT: Jointly Modeling Video Dynamics and Actions | Ma et al. | 2026 | arXiv |
| 3 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning | NVIDIA | 2026 | arXiv, Project |
| 4 | Video2Act: A Dual-System Video Diffusion Policy | Sun et al. | 2025 | arXiv |

WAM from VLMs

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | Goal-VLA: Image-Generative VLMs as Object-Centric World Models Empower VLA | Lee et al. | 2025 | arXiv |

WAM from Scratch

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | V-JEPA: Video Joint-Embedding Predictive Architecture | Bardes et al. | 2024 | arXiv |
| 2 | DreamerV3: Mastering Diverse Domains through World Models | Hafner et al. | 2023 | arXiv, Project |
| 3 | I-JEPA: Image-based Joint-Embedding Predictive Architecture | Assran et al. | 2023 | arXiv |

Action Representations

Discrete Tokenization

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | Behavior Transformers (BeT): Multimodal Action Discretization | Shafiullah et al. | 2022 | arXiv |
| 2 | Action Bins: Discretizing Continuous Actions | Brohan et al. | 2023 | arXiv |
| 3 | Action Tokens: Learning Discrete Action Spaces | Chi et al. | 2023 | arXiv |
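
As a rough illustration of what "discretizing continuous actions" can mean, the sketch below maps each action dimension to one of 256 uniform bins and back. The 256-bin choice follows the scheme described for RT-1. Everything else is an assumption made for the example: the function names, the `[-1, 1]` action range, and decoding to the bin centre. Individual papers in the table may tokenize differently.

```python
def to_bin(value, low, high, num_bins=256):
    """Map a continuous value in [low, high] to an integer bin id in [0, num_bins - 1]."""
    clipped = min(max(value, low), high)          # out-of-range values are clamped
    fraction = (clipped - low) / (high - low)
    return min(int(fraction * num_bins), num_bins - 1)


def from_bin(bin_id, low, high, num_bins=256):
    """Map a bin id back to the centre of the interval it covers."""
    width = (high - low) / num_bins
    return low + (bin_id + 0.5) * width


def tokenize(action, low=-1.0, high=1.0, num_bins=256):
    """Discretize every dimension of an action vector independently."""
    return [to_bin(a, low, high, num_bins) for a in action]


def detokenize(tokens, low=-1.0, high=1.0, num_bins=256):
    """Recover an approximate continuous action from its tokens."""
    return [from_bin(t, low, high, num_bins) for t in tokens]


action = [-1.0, 0.0, 0.3, 1.0]
tokens = tokenize(action)
print(tokens)
print(detokenize(tokens))
```

The round trip is lossy: each recovered value can differ from the original by up to half a bin width, which is `1/256` for the assumed range.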

Diffusion Policies

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | Diffusion Policy: Diffusion for Robot Control | Chi et al. | 2023 | arXiv, Project |
| 2 | ACT/ALOHA: Action Chunking Transformer | Zhao et al. | 2023 | arXiv, Project |
| 3 | Flow Matching Policy: Flow Matching for Action Generation | Zhou et al. | 2024 | arXiv, Project |
| 4 | Transfusion: AR + Diffusion in One Transformer | Zhou et al. | 2024 | arXiv |

Robotics Policies

| # | Paper | Authors | Year | Links |
| --- | --- | --- | --- | --- |
| 1 | RT-1: Robotics Transformer for Real-World Control | Brohan et al. | 2022 | arXiv, Code |
| 2 | Diffusion Policy: Diffusion for Robot Control | Chi et al. | 2023 | arXiv |
| 3 | ACT/ALOHA: Action Chunking Transformer | Zhao et al. | 2023 | arXiv |
| 4 | Behavior Transformers (BeT): Multimodal Action | Shafiullah et al. | 2022 | arXiv |
| 5 | PerAct: A Multi-Task Transformer for Robotic Manipulation | Shridhar et al. | 2022 | arXiv |

Resources

Datasets

| Name | Description | Size | Links |
| --- | --- | --- | --- |
| OXE | Open X-Embodiment Dataset | 1M+ episodes | Project |
| RT-1 Dataset | Real Robot Manipulation | 130k episodes | Project |
| BridgeData | Robot Learning Dataset | 70k episodes | Project |
| ALOHA | Bimanual Manipulation | 10k+ episodes | Project |
| AgiBot World | Large-scale Robot Dataset | 1M+ episodes | Project |
| UMI | Universal Manipulation Interface | 15k episodes | Project |
| DROID | Distributed Robot Interaction Dataset | 76k episodes | Project |

Benchmarks

| Name | Description | Links |
| --- | --- | --- |
| LIBERO | Modular Benchmark for Robot Learning | Project |
| RLBench | Simulated Robot Learning Benchmark | Project |
| ManiSkill | Generalizable Manipulation | Project |
| CALVIN | Language-conditioned Manipulation | Project |
| RoboNet | Large-scale Robot Dataset | Project |
| Meta-World | Multi-task Benchmark | Project |

Simulation Platforms

| Name | Description | Links |
| --- | --- | --- |
| Isaac Gym | NVIDIA GPU-accelerated Physics | Project |
| Isaac Lab | Robot Learning Framework | Project |
| Gazebo | Classic Robot Simulator | Project |
| MuJoCo | Physics Engine | Project |
| PyBullet | Physics Simulation | Project |
| Habitat | Embodied AI Simulation | Project |
| iGibson | Interactive Gibson Environment | Project |

Tools & Frameworks

| Name | Description | Links |
| --- | --- | --- |
| LeRobot | Hugging Face Robotics Framework | Project |
| PyRobot | Robotics Learning Framework | Project |
| Robomimic | Imitation Learning Framework | Project |
| OmniGibson | Sim2Real Platform | Project |
| ManiSkill2 | Manipulation Benchmark | Project |

Contributing

Contributions are welcome! Paper discovery is automated, but you can also help in the following ways:

  1. Add a paper: Edit the README directly or open an issue
  2. Fix errors: Submit a PR with corrections
  3. Suggest improvements: Open an issue with your ideas

To run the paper scraper locally:

pip install -r requirements.txt
python scripts/arxiv_scraper.py --max-results 100 --days-back 180
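
The source of `scripts/arxiv_scraper.py` is not shown here, so the following is only a guess at how flags like `--max-results` and `--days-back` could map onto the public arXiv API. The function names and search phrases are hypothetical. The endpoint and the `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` parameters are part of the arXiv API itself.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(keywords, max_results=100):
    """Build an arXiv API URL that matches any of the given phrases, newest first."""
    search = " OR ".join(f'all:"{k}"' for k in keywords)
    params = {
        "search_query": search,
        "start": 0,
        "max_results": max_results,      # assumed to correspond to --max-results
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def is_recent(published, days_back=180, now=None):
    """Check whether an Atom <published> timestamp lies within the last days_back days."""
    now = now or datetime.now(timezone.utc)
    # arXiv timestamps end in "Z"; older Python versions need an explicit offset.
    when = datetime.fromisoformat(published.replace("Z", "+00:00"))
    return now - when <= timedelta(days=days_back)   # assumed meaning of --days-back


url = build_query_url(["world action model", "vision-language-action"], max_results=100)
print(url)
print(is_recent("2026-06-09T17:59:59Z", days_back=180))
```

Fetching `url` returns an Atom feed. A scraper along these lines would parse each entry's `published` field and keep only the entries for which `is_recent` is true.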

License

This repository is licensed under the MIT License - see the LICENSE file for details.

If you find this repository useful, please consider giving it a ⭐
