🤖 Awesome World Action Models
📜 A Curated List of World Action Models, Vision-Language-Action (VLA), and Embodied AI Research
This repository aims to provide a comprehensive, curated, and continuously updated list of research papers, resources, and tools related to World Action Models (WAM), Vision-Language-Action (VLA) models, and Embodied AI. The goal is to help researchers and engineers navigate the rapidly evolving field of robotics foundation models.
World Action Models (WAMs) are robot policies that use world modeling (predicting future states) to inform action prediction. They move policy design from purely reactive mappings toward predictive, world-aware decision-making.
Vision-Language-Action (VLA) models combine the rich language grounding and visual understanding of Vision-Language Models (VLMs) with action prediction, offering a scalable route toward general-purpose, language-conditioned robot policies.
Comparison Methods & Baselines
Quick Reference (from experimental tables)

| Category | Key Baselines |
|---|---|
| VLA | RT-1, RT-2, OpenVLA, Octo, π0, X-VLA, UniVLA, SmolVLA, VLANeXt |
| Policy | Diffusion Policy, ACT, BeT, PerAct, MVP, R3M, CQL, IQL |
| World Model | DreamerV1/V2/V3, I-JEPA, V-JEPA, DreamZero |

🆕 Latest Papers (Auto-updated)
Papers are automatically fetched daily from arXiv. Last updated: 2026-06-09
Vision-Language-Action (VLA)

| Paper | Authors | Date |
|---|---|---|
| LIBERO-Occ: Evaluating and Improving Vision-Language-Action Models under Scene-Induced Occlusion via Viewpoint Imagination | Taishan Li, Jiwen Zhang et al. | 2026-06-09 |
| VeriSpace: Spatially Grounded Action Verification for Vision-Language-Action Models | Guiyu Zhao, Longteng Guo et al. | 2026-06-09 |
| Uncovering Vulnerability of Vision-Language-Action Models under Joint-Level Physical Faults | Minsoo Jo, Taeju Kwon et al. | 2026-06-09 |
| Act on What You See: Unlocking Safe Social Navigation in Vision-Language-Action Models | Qingzi Wang, Xiyang Wu et al. | 2026-06-09 |
| A Practical Recipe Towards Improving Sim-and-Real Correlation for VLA Evaluation | Shuo Wang, Hanyuan Xu et al. | 2026-06-09 |
| What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents | Jiaheng Hu, Mohit Shridhar et al. | 2026-06-09 |
| Flow Control: Steering Vision-Language-Action Models with Simple Real-Time Inputs | Jonathan C. Kao, Jason Chan et al. | 2026-06-08 |
| MemoryVLA++: Temporal Modeling via Memory and Imagination in Vision-Language-Action Models | Hao Shi, Weiye Li et al. | 2026-06-08 |
| Your Model Already Knows: Attention-Guided Safety Filter for Vision-Language-Action Models | Seongbin Park, Fan Zhang et al. | 2026-06-08 |
| ProbeAct: Probe-Guided Training-Free Failure Recovery in Vision-Language-Action Models | Fan Zhang, Seongbin Park et al. | 2026-06-08 |

World Action Models (WAM)

| Paper | Authors | Date |
|---|---|---|
| HiMem-WAM: Hierarchical Memory-Gated World Action Models for Robotic Manipulation | Xiaoquan Sun, Ruijian Zhang et al. | 2026-06-09 |
| MotionWAM: Towards Foundation World Action Models for Real-Time Humanoid Loco-Manipulation | Jia Zheng, Teli Ma et al. | 2026-06-08 |
| C$^3$ache: Accelerating World Action Models with Cross Inference Chunk Cache | Weisen Zhao, Lam Nguyen et al. | 2026-06-08 |
| Dream-Tac: A Unified Tactile World Action Model for Contact-Rich Robot Manipulation | Yunfan Lou, Yifan Ye et al. | 2026-06-07 |
| FAWAM: Force-Aware World Action Models for Closed-Loop Contact-Rich Manipulation | Haotian He, Zeyu Yan et al. | 2026-06-07 |
| Light-WAM: Efficient World Action Models with State-Fusion Action Decoding | Ziang Li, Dongzhou Cheng et al. | 2026-06-06 |
| Dreaming when Necessary: Advancing World Action Models with Adaptive Multi-Modal Reasoning | Yinzhou Tang, Jingbo Xu et al. | 2026-06-05 |
| Flash-WAM: Modality-Aware Distillation for World Action Models | Arman Akbari, Ci Zhang et al. | 2026-06-03 |
| OSCAR: Omni-Embodiment Action-Conditioned World Model for Robotics | Zhuoyuan Wu, Jun Gao | 2026-06-03 |
| GeoSem-WAM: Geometry- and Semantic-Aware World Action Models | Fulong Ma, Daojie Peng et al. | 2026-06-02 |

Vision-Language-Action (VLA) Models
VLA models build on pretrained VLMs, inheriting their language grounding and visual understanding, and offer a scalable route toward general-purpose, language-conditioned robot policies.
Key Paper: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Brohan et al. (2023) | Project
World Action Models (WAM)
WAMs are robot policies that use world modeling (i.e., predicting future states) for action prediction.
Key Paper: DreamZero: World Action Models are Zero-shot Policies | Chen et al. (2026) | Project
Note: VLA and WAM overlap; a WAM built on a pretrained VLM counts as both a VLA and a WAM.
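To make the WAM idea concrete, here is a minimal sketch of using a world model's predicted future states to pick an action. It is an illustration under stated assumptions, not the method of any paper in this list: `toy_world_model` is a hypothetical hand-written point-mass model standing in for a learned predictor, and `plan_action` is a simple shooting planner that scores a small fixed set of candidate actions by their imagined rollouts. Some WAMs listed below (DiT4DiT, for example) instead model video dynamics and actions jointly in one network.

```python
import itertools
from typing import Callable, Sequence, Tuple

State = Tuple[float, float]   # hypothetical toy state: a 2-D position
Action = Tuple[float, float]  # hypothetical toy action: a 2-D displacement command


def toy_world_model(state: State, action: Action) -> State:
    """Hypothetical stand-in for a learned world model: predicts the next state.

    The 'prediction' here is exact point-mass dynamics that move half of the
    commanded displacement per step.
    """
    return (state[0] + 0.5 * action[0], state[1] + 0.5 * action[1])


def plan_action(
    world_model: Callable[[State, Action], State],
    state: State,
    goal: State,
    candidates: Sequence[Action],
    horizon: int = 3,
) -> Action:
    """Return the candidate whose imagined rollout stays closest to the goal.

    Each candidate is held fixed for `horizon` imagined steps; the cost is the
    sum of squared distances between predicted states and the goal.
    """

    def rollout_cost(action: Action) -> float:
        s, cost = state, 0.0
        for _ in range(horizon):
            s = world_model(s, action)  # imagine the next state
            cost += (s[0] - goal[0]) ** 2 + (s[1] - goal[1]) ** 2
        return cost

    return min(candidates, key=rollout_cost)


# Candidate actions: every combination of {-1, 0, 1} per axis.
CANDIDATES = list(itertools.product((-1.0, 0.0, 1.0), repeat=2))
```

For example, `plan_action(toy_world_model, (0.0, 0.0), (1.0, 1.0), CANDIDATES)` returns `(1.0, 1.0)`: holding that action for three imagined steps passes through the goal, while every other candidate drifts farther away.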
Surveys

| Title | Authors | Year | Links |
|---|---|---|---|
| Vision-Language-Action (VLA) Models: Concepts, Progress, Applications and Challenges | Applied AI Research Lab | 2025 | arXiv |
| A Survey on Vision-Language-Action Models for Embodied AI | Ma et al. | 2024 | arXiv |
| Foundation Models for Embodied AI | Driess et al. | 2024 | arXiv |

VLA Models

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | AC2-VLA: Action-Context-Aware Adaptive Computation in VLA | Yu et al. | 2026 | arXiv |
| 2 | APPLV: Adaptive Planner Parameter Learning from VLA | Lu et al. | 2026 | arXiv |
| 3 | Act, Think or Abstain: Complexity-Aware Adaptive Inference for VLA | Izzo et al. | 2026 | arXiv |
| 4 | AnyCamVLA: Zero-Shot Camera Adaptation for Viewpoint Robust VLA | Heo et al. | 2026 | arXiv |
| 5 | CLARE: Continuous Learning for VLA via Adapter Routing | Römer et al. | 2026 | arXiv |
| 6 | DIAL: Decoupling Intent and Action via Latent World Modeling for VLA | Chen et al. | 2026 | arXiv |
| 7 | EAPruning: Adaptive Pruning with Interleaved Inference for VLA | Huang et al. | 2026 | arXiv |
| 8 | ETA-VLA: Efficient Token Adaptation | Wang et al. | 2026 | arXiv |
| 9 | FAVLA: Force-Adaptive Fast-Slow VLA | Li et al. | 2026 | arXiv |
| 10 | HarvestFlex: Harvesting via VLA Policy Adaptation | Zhao et al. | 2026 | arXiv |
| 11 | On-the-Fly VLA: VLA Adaptation via Test-Time RL | Liu et al. | 2026 | arXiv |
| 12 | ProbeFlow: Training-Free Adaptive Flow Matching for VLA | Fang et al. | 2026 | arXiv |
| 13 | RAFT: Adapting VLA Models via Force-aware Curriculum | Zhang et al. | 2026 | arXiv |
| 14 | ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy | Kim et al. | 2026 | arXiv |
| 15 | SAMoE-VLA: Scene Adaptive Mixture-of-Experts VLA | You et al. | 2026 | arXiv |
| 16 | SCALE: Self-Uncertainty Adaptive Looking for VLA | Choi et al. | 2026 | arXiv |
| 17 | SOMA: Memory-Augmented System for VLA Robustness | Li et al. | 2026 | arXiv |
| 18 | VGAS: Adaptive Capacity Allocation for VLA | Kim et al. | 2026 | arXiv |
| 19 | VLA-Acceleration: Accelerate VLA through Visual Token Caching | Wei et al. | 2026 | arXiv |
| 20 | VGAS: Value-Guided Action-Chunk Selection for VLA | Xu et al. | 2026 | arXiv |
| 21 | VLANeXt: Recipes for Building Strong VLA Models | Liu et al. | 2026 | arXiv · Project |
| 22 | HoloBrain-0: Technical Report | Horizon Robotics | 2026 | arXiv · Project |
| 23 | FocusVLA: Focused Visual Utilization for VLA Models | Zhang et al. | 2026 | arXiv |
| 24 | StreamingVLA: Streaming VLA with Action Flow Matching | Shi et al. | 2026 | arXiv |
| 25 | ABot-M0: VLA with Action Manifold Learning | AMAP CVLab | 2026 | arXiv · Project |
| 26 | SimVLA: A Simple VLA Baseline for Robotic Manipulation | FrontierRoBo | 2026 | arXiv · Project |
| 27 | Lingbot-VLA: A Pragmatic VLA Foundation Model | Robbyant | 2026 | arXiv · Project |
| 28 | AC-DiT: Adaptive Coordination Diffusion Transformer | Chen et al. | 2025 | arXiv |
| 29 | U-DiT: U-shaped Diffusion Transformers | Wu et al. | 2025 | arXiv |
| 30 | VLA-Adapter: Tiny-Scale VLA Paradigm | Wang et al. | 2025 | arXiv |
| 31 | Gemini Robotics: Bringing AI into the Physical World | DeepMind | 2025 | arXiv · Project |
| 32 | π\*0.6: A VLA That Learns From Experience | Black et al. | 2025 | arXiv · Project |
| 33 | X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment VLA | Zheng et al. | 2025 | arXiv · Project |
| 34 | UniVLA: Unified Vision-Language-Action Model | Wu et al. | 2025 | arXiv · Project |
| 35 | SmolVLA: A VLA for Affordable and Efficient Robotics | LeRobot | 2025 | arXiv · Project |
| 36 | NORA: A Small Open-Sourced Generalist VLA | Nandan et al. | 2025 | arXiv · Project |
| 37 | VLA-0: Building State-of-the-Art VLAs with Zero Modification | VLA0 | 2025 | arXiv · Project |
| 38 | CronusVLA: Efficient Multi-Frame VLA | Li et al. | 2025 | arXiv · Project |
| 39 | OpenVLA-OFT: Fine-Tuning VLAs: Optimizing Speed and Success | Kim et al. | 2025 | arXiv · Project |
| 40 | AsyncVLA: Asynchronous Flow Matching for VLA | Jiang et al. | 2025 | arXiv · Project |
| 41 | AVA-VLA: VLA with Active Visual Attention | Li et al. | 2025 | arXiv |
| 42 | A-VL: Adaptive Attention for Large VLA | Zhang et al. | 2024 | arXiv |
| 43 | ADEM-VL: Adaptive and Embedded Fusion for VLA | Hao et al. | 2024 | arXiv |
| 44 | OpenVLA: An Open-Source Vision-Language-Action Model | Kim et al. | 2024 | arXiv · Project |
| 45 | Octo: An Open-Source Generalist Robot Policy | Ghosh et al. | 2024 | arXiv · Project |
| 46 | π0: A Vision-Language-Action Flow Model for General Robot Control | Black et al. | 2024 | arXiv · Project |
| 47 | RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control | Brohan et al. | 2023 | arXiv · Project |
| 48 | RT-1: Robotics Transformer for Real-World Control at Scale | Brohan et al. | 2022 | arXiv · Project |
| 49 | VL-Adapter: Parameter-Efficient Transfer Learning | Sung et al. | 2021 | arXiv |

VLA with Reasoning

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | ACoT-VLA: Action Chain-of-Thought for VLA Models | AgibotTech | 2026 | arXiv · Project |
| 2 | Fast-ThinkAct: Efficient Vision-Language-Action Reasoning | Chen et al. | 2026 | arXiv |
| 3 | CoT-VLA: Visual Chain-of-Thought Reasoning for VLA | Zhao et al. | 2025 | arXiv |
| 4 | ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning | Huang et al. | 2025 | arXiv |
| 5 | UniVLA: VLA with World Model | Wu et al. | 2025 | arXiv |

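3D VLA

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Zhen et al. | 2024 | arXiv |
| 2 | VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models | Huang et al. | 2023 | arXiv · Project |
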
Efficient and Real-Time VLA

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | FASTER: Rethinking Real-Time Flow VLAs | Liu et al. | 2026 | arXiv |
| 2 | SmolVLA: A Vision-Language-Action Model for Affordable Robotics | LeRobot | 2025 | arXiv |
| 3 | AsyncVLA: Asynchronous Flow Matching for VLA | Jiang et al. | 2025 | arXiv · Project |
| 4 | OpenHelix: A Short Survey and Open-Source Dual-System VLA | Google DeepMind | 2025 | arXiv |
| 5 | RTC: Running VLAs at Real-time Speed | Google DeepMind | 2025 | arXiv |
| 6 | AVA-VLA: VLA with Active Visual Attention | Li et al. | 2025 | arXiv |

RL and Fine-Tuning

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning | Pi Team | 2025 | arXiv |
| 2 | OpenVLA-OFT: Fine-Tuning VLAs: Optimizing Speed and Success | Kim et al. | 2025 | arXiv · Project |

WAM from Video Generation

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | DreamZero: World Action Models are Zero-shot Policies | Chen et al. | 2026 | arXiv · Project |
| 2 | DiT4DiT: Jointly Modeling Video Dynamics and Actions | Ma et al. | 2026 | arXiv |
| 3 | Cosmos Policy: Fine-Tuning Video Models for Visuomotor Control and Planning | NVIDIA | 2026 | arXiv · Project |
| 4 | Video2Act: A Dual-System Video Diffusion Policy | Sun et al. | 2025 | arXiv |

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | Goal-VLA: Image-Generative VLMs as Object-Centric World Models Empower VLA | Lee et al. | 2025 | arXiv |

World Models

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | V-JEPA: Video Joint-Embedding Predictive Architecture | Bardes et al. | 2024 | arXiv |
| 2 | DreamerV3: Mastering Diverse Domains through World Models | Hafner et al. | 2023 | arXiv · Project |
| 3 | I-JEPA: Image-based Joint-Embedding Predictive Architecture | Assran et al. | 2023 | arXiv |

Action Tokenization

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | Behavior Transformers (BeT): Multimodal Action Discretization | Shafiullah et al. | 2022 | arXiv |
| 2 | Action Bins: Discretizing Continuous Actions | Brohan et al. | 2023 | arXiv |
| 3 | Action Tokens: Learning Discrete Action Spaces | Chi et al. | 2023 | arXiv |

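The tokenization entries above turn continuous actions into discrete tokens that a transformer can predict. RT-1 and RT-2 discretize each action dimension into 256 uniform bins; the sketch below illustrates that idea in plain Python. The function names and the single shared `[low, high]` range are assumptions for illustration; real pipelines may use separate bounds per dimension, derived from dataset statistics.

```python
from typing import List, Sequence


def tokenize_action(action: Sequence[float], low: float, high: float, n_bins: int = 256) -> List[int]:
    """Map each continuous action dimension to an integer bin id in [0, n_bins - 1]."""
    tokens = []
    for a in action:
        a = min(max(a, low), high)          # clip into the valid range
        frac = (a - low) / (high - low)     # normalize to [0, 1]
        # Uniform bins; the upper edge (frac == 1.0) is folded into the last bin.
        tokens.append(min(int(frac * n_bins), n_bins - 1))
    return tokens


def detokenize_action(tokens: Sequence[int], low: float, high: float, n_bins: int = 256) -> List[float]:
    """Map bin ids back to continuous values at the bin centers."""
    width = (high - low) / n_bins
    return [low + (t + 0.5) * width for t in tokens]
```

For example, `tokenize_action([-1.0, 0.0, 1.0], -1.0, 1.0)` gives `[0, 128, 255]`. Decoding to bin centers keeps the round-trip error within half a bin width (about 0.0039 for a range of 2 split into 256 bins).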
Diffusion and Flow Action Generation

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | Diffusion Policy: Diffusion for Robot Control | Chi et al. | 2023 | arXiv · Project |
| 2 | ACT/ALOHA: Action Chunking Transformer | Zhao et al. | 2023 | arXiv · Project |
| 3 | Flow Matching Policy: Flow Matching for Action Generation | Zhou et al. | 2024 | arXiv · Project |
| 4 | Transfusion: AR + Diffusion in One Transformer | Zhou et al. | 2024 | arXiv |

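Flow-matching action heads (the Flow Matching Policy entry above; π0 also uses one) learn a velocity field that carries Gaussian noise to an action, then sample by integrating that field. The minimal sketch below assumes the common linear path x_t = (1 - t)·x0 + t·x1 between noise x0 and action x1, whose regression target is x1 - x0. So that the Euler sampler can be checked, a hypothetical oracle velocity for one known action stands in for the trained network. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vec = List[float]


def flow_matching_pair(action: Sequence[float], t: float, rng: random.Random) -> Tuple[Vec, Vec]:
    """One training example for the linear (rectified-flow style) path.

    Returns the noisy point x_t and the target x1 - x0 that a network v(x_t, t)
    would be trained to regress with a squared-error loss.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in action]                # x0 ~ N(0, I)
    x_t = [(1.0 - t) * n + t * a for n, a in zip(noise, action)]  # point on the path
    target = [a - n for n, a in zip(noise, action)]               # constant path velocity
    return x_t, target


def euler_sample(velocity: Callable[[Vec, float], Vec], x0: Sequence[float], steps: int = 10) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so the oracle below never divides by zero
        v = velocity(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def oracle_velocity(target_action: Sequence[float]) -> Callable[[Vec, float], Vec]:
    """Stand-in for a trained network: the exact velocity toward one known action.

    On the linear path, (x1 - x_t) / (1 - t) equals x1 - x0, the training target.
    """
    def v(x: Vec, t: float) -> Vec:
        return [(a - xi) / (1.0 - t) for a, xi in zip(target_action, x)]
    return v
```

With the oracle, `euler_sample(oracle_velocity([0.2, -0.4]), [1.0, 1.0])` lands on `[0.2, -0.4]` up to floating-point error. A learned velocity field only approximates this, so the number of integration steps trades speed for accuracy.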
Classic Policy Baselines

| # | Paper | Authors | Year | Links |
|---|---|---|---|---|
| 1 | RT-1: Robotics Transformer for Real-World Control | Brohan et al. | 2022 | arXiv · Code |
| 2 | Diffusion Policy: Diffusion for Robot Control | Chi et al. | 2023 | arXiv |
| 3 | ACT/ALOHA: Action Chunking Transformer | Zhao et al. | 2023 | arXiv |
| 4 | Behavior Transformers (BeT): Multimodal Action Discretization | Shafiullah et al. | 2022 | arXiv |
| 5 | PerAct (Perceiver-Actor): A Multi-Task Transformer for Robotic Manipulation | Shridhar et al. | 2022 | arXiv |

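ACT (listed above) predicts a chunk of future actions at every step. With its temporal-ensembling option, the overlapping predictions for the current timestep are averaged using weights w_i = exp(-m·i), where w_0 belongs to the oldest prediction. Below is a minimal sketch of that averaging step. The dict-of-chunks layout and the function name are assumptions for illustration, not ACT's code.

```python
import math
from typing import Dict, List, Sequence


def temporal_ensemble(chunks: Dict[int, Sequence[Sequence[float]]], t: int, m: float = 0.01) -> List[float]:
    """Average every chunk's prediction for timestep t.

    chunks[s] is the action chunk predicted at step s; chunks[s][j] is its action
    for step s + j. Weights follow w_i = exp(-m * i), with i = 0 for the oldest prediction.
    """
    preds = []
    for s in sorted(chunks):  # oldest prediction first
        j = t - s
        if 0 <= j < len(chunks[s]):
            preds.append(chunks[s][j])
    if not preds:
        raise ValueError(f"no chunk covers timestep {t}")
    weights = [math.exp(-m * i) for i in range(len(preds))]
    total = sum(weights)
    dim = len(preds[0])
    return [sum(w * p[d] for w, p in zip(weights, preds)) / total for d in range(dim)]
```

With `m = 0` this is a plain average. Larger `m` concentrates weight on older predictions, so a smaller `m` lets newer observations influence the executed action sooner.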
Datasets

| Name | Description | Size | Links |
|---|---|---|---|
| OXE | Open X-Embodiment Dataset | 500k+ episodes | Project |
| RT-1 Dataset | Real Robot Manipulation | 130k episodes | Project |
| BridgeData | Robot Learning Dataset | 70k episodes | Project |
| ALOHA | Bimanual Manipulation | 10k+ episodes | Project |
| AgiBot World | Large-scale Robot Dataset | 1M+ episodes | Project |
| UMI | Universal Manipulation Interface | 15k episodes | Project |
| DROID | Distributed Robot Interaction Dataset | 76k episodes | Project |

Benchmarks

| Name | Description | Links |
|---|---|---|
| LIBERO | Benchmark for Lifelong Robot Learning | Project |
| RLBench | Simulated Robot Learning Benchmark | Project |
| ManiSkill | Generalizable Manipulation | Project |
| CALVIN | Language-conditioned Manipulation | Project |
| RoboNet | Large-scale Robot Dataset | Project |
| Meta-World | Multi-task Benchmark | Project |

Simulators

| Name | Description | Links |
|---|---|---|
| Isaac Gym | NVIDIA GPU-accelerated Physics Simulation | Project |
| Isaac Lab | Robot Learning Framework | Project |
| Gazebo | Classic Robot Simulator | Project |
| MuJoCo | Physics Engine | Project |
| PyBullet | Physics Simulation | Project |
| Habitat | Embodied AI Simulation | Project |
| iGibson | Interactive Gibson Environment | Project |

Frameworks and Tools

| Name | Description | Links |
|---|---|---|
| LeRobot | Hugging Face Robotics Framework | Project |
| PyRobot | Robotics Learning Framework | Project |
| Robomimic | Imitation Learning Framework | Project |
| OmniGibson | Sim2Real Platform | Project |
| ManiSkill2 | Manipulation Benchmark | Project |

Contributions are welcome! New papers are discovered automatically, but you can also:

- Add a paper: edit the README directly or open an issue
- Fix errors: submit a PR with corrections
- Suggest improvements: open an issue with your ideas

To run the paper scraper locally:

```bash
pip install -r requirements.txt
python scripts/arxiv_scraper.py --max-results 100 --days-back 180
```

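Below is a hedged sketch of how a scraper like this might query arXiv. It builds a request URL for the public arXiv API (`http://export.arxiv.org/api/query`, with its `search_query`, `start`, `max_results`, `sortBy` and `sortOrder` parameters) and keeps only entries newer than a `days_back` cutoff. It is not the contents of `scripts/arxiv_scraper.py`. The function names, the search phrases, and the mapping onto the `--max-results` and `--days-back` flags are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_query_url(phrases: Iterable[str], max_results: int = 100) -> str:
    """Build an arXiv API URL that ORs exact-phrase searches across all fields."""
    query = " OR ".join(f'all:"{p}"' for p in phrases)
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,  # assumed to mirror --max-results
        "sortBy": "submittedDate",   # newest submissions first
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def filter_recent(
    entries: Iterable[Tuple[str, datetime]],
    days_back: int = 180,  # assumed to mirror --days-back
    now: Optional[datetime] = None,
) -> List[str]:
    """Keep titles whose timezone-aware publication date is within the last `days_back` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days_back)
    return [title for title, published in entries if published >= cutoff]
```

Fetching the URL (for instance with `urllib.request.urlopen`) returns an Atom feed. The `<published>` timestamps of its entries, once parsed into datetimes, are what `filter_recent` would consume.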
This repository is licensed under the MIT License - see the LICENSE file for details.
If you find this repository useful, please consider giving it a ⭐