This repo contains the CtrlAct solution for the Embodied Agent Interface Challenge at NeurIPS 2025.
- Official Website: https://neurips25-eai.github.io/
- Benchmark Dataset: https://huggingface.co/datasets/Inevitablevalor/EmbodiedAgentInterface
- 100 tasks in BEHAVIOR
- 338 tasks in VirtualHome
- Leaderboard: https://eval.ai/web/challenges/challenge-page/2621/leaderboard/6818
- 10 teams outperformed the baseline model
The competition includes four main tasks:
- Goal Interpretation: Understanding objectives and grounding them in environmental states.
- Subgoal Decomposition: Breaking complex goals into actionable steps.
- Action Sequencing: Planning coherent action sequences.
- Transition Modeling: Predicting environment state changes caused by actions.
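As a rough illustration of what the four tasks produce, they can be sketched as simple Python records. The class and field names here are hypothetical and do not reflect the benchmark's actual output format:

```python
from dataclasses import dataclass

# Hypothetical schemas for the four EAI tasks; names are illustrative only
# and do not match the benchmark's real JSON format.

@dataclass
class GoalInterpretation:
    # Ground a natural-language objective into symbolic goal conditions.
    instruction: str
    goal_conditions: list[str]  # e.g. ["inside(apple, fridge)"]

@dataclass
class SubgoalDecomposition:
    # Break a complex goal into an ordered list of intermediate states.
    goal: str
    subgoals: list[str]

@dataclass
class ActionSequencing:
    # A coherent sequence of grounded actions achieving the goal.
    actions: list[str]  # e.g. ["walk(kitchen)", "grab(apple)"]

@dataclass
class TransitionModeling:
    # Preconditions and effects predicted for a single action.
    action: str
    preconditions: list[str]
    effects: list[str]

plan = ActionSequencing(actions=["walk(kitchen)", "grab(apple)", "open(fridge)"])
```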
- Technical Report: https://openreview.net/forum?id=0dt9Ho6dXA
The goal of CtrlAct is to evaluate the performance of open-source models on the Embodied Agent Interface benchmark and analyze the performance gap between these models and top-ranked systems.
The following open-source models were evaluated:
- Qwen3-235B-A22B (thinking): https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507-FP8
- Qwen3-30B-A3B (thinking): https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507-FP8
- Qwen3-Next-80B-A3B (thinking): https://huggingface.co/Qwen/Qwen3-Next-80B-A3B-Thinking
- gpt-oss-120B (high): https://huggingface.co/openai/gpt-oss-120b
- gpt-oss-20B (high): https://huggingface.co/openai/gpt-oss-20b
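For convenience, the model list above can be kept in a small registry that assembles a `vllm serve` command line. This helper is hypothetical (not part of the repo); adjust `tensor_parallel` to your GPU count:

```python
# Hypothetical helper: maps the evaluated models to their Hugging Face repo
# IDs and builds a `vllm serve` invocation. Not part of the CtrlAct repo.

MODELS = {
    "qwen3-235b-a22b-thinking": "Qwen/Qwen3-235B-A22B-Thinking-2507-FP8",
    "qwen3-30b-a3b-thinking": "Qwen/Qwen3-30B-A3B-Thinking-2507-FP8",
    "qwen3-next-80b-a3b-thinking": "Qwen/Qwen3-Next-80B-A3B-Thinking",
    "gpt-oss-120b": "openai/gpt-oss-120b",
    "gpt-oss-20b": "openai/gpt-oss-20b",
}

def serve_command(alias: str, tensor_parallel: int = 4) -> str:
    """Build a `vllm serve` command for one of the evaluated models."""
    repo = MODELS[alias]
    return f"vllm serve {repo} --tensor-parallel-size {tensor_parallel}"

print(serve_command("gpt-oss-20b", tensor_parallel=1))
```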
Environment setup:

```shell
conda create -n ctrlact python=3.12
conda activate ctrlact
pip install vllm==0.11.0 --extra-index-url https://download.pytorch.org/whl/cu128
pip install transformers==4.57.1
pip install flashinfer-python==0.4.1
pip install scikit-learn matplotlib pandas
```

Hardware:
- 4 NVIDIA H100 GPUs
- 8 NVIDIA L40S GPUs
We used Tinker for supervised fine-tuning (SFT) experiments as part of our evaluation pipeline.
- Tinker cookbook: https://github.com/thinking-machines-lab/tinker-cookbook
We thank the Tinker team for providing free credits that supported our large-scale model experiments.
- 2025-12-07: Technical report released.
- 2026-04-30: GitHub repository updated.
- 2026-05-03: vLLM inference code released.