GitHub - SCAI-JHU/AutoToM: [NeurIPS 2025 𝐒𝐩𝐨𝐭𝐥𝐢𝐠𝐡𝐭] AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling

AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling

Paper | Project Page | Video and Poster | Tweet

AutoToM is an automated agent modeling method for scalable, robust, and interpretable mental inference. It achieves state-of-the-art (SOTA) results on five benchmarks, produces human-like confidence estimates, and supports embodied decision-making.

Example Usage

To run AutoToM on MMToM-QA with the default settings (reduced hypotheses and backwards inference), run the following from the model directory:

python ProbSolver.py --automated --dataset_name "MMToM-QA"

To run AutoToM on ToMi-1st with a specified model input:

python ProbSolver.py --dataset_name "ToMi-1st" --assigned_model "['State', 'Observation', 'Belief']"
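The value passed to --assigned_model above is a quoted Python list literal. As a minimal sketch of how such a string can be turned into a list of variable names, the snippet below uses argparse with ast.literal_eval. This is an assumption for illustration only: the helper name parse_model_variables is hypothetical, and ProbSolver.py may parse the flag differently.

```python
import argparse
import ast


def parse_model_variables(text):
    """Parse a string such as "['State', 'Observation', 'Belief']" into a list of str.

    Hypothetical helper for illustration; not taken from ProbSolver.py.
    """
    try:
        # literal_eval only accepts Python literals, so it does not execute code.
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise argparse.ArgumentTypeError(f"not a valid list literal: {text!r}") from exc
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise argparse.ArgumentTypeError(f"expected a list of strings, got: {text!r}")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--dataset_name", type=str)
parser.add_argument("--assigned_model", type=parse_model_variables, default=None)

# Same arguments as the ToMi-1st command above, passed as an explicit argv list.
args = parser.parse_args(
    ["--dataset_name", "ToMi-1st", "--assigned_model", "['State', 'Observation', 'Belief']"]
)
print(args.assigned_model)
```

Quoting the whole list in the shell, as the command above does, keeps it as a single argument so it can be parsed in one step.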

Requirements

Install relevant packages: pip install -r requirements.txt
Set your OPENAI_API_KEY:
- On macOS and Linux: export OPENAI_API_KEY='your-api-key'
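- On Windows (Command Prompt): set OPENAI_API_KEY=your-api-key (no quotes, since Command Prompt would store them as part of the value)
- On Windows (PowerShell): $env:OPENAI_API_KEY='your-api-key'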
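Before starting a long evaluation, it can help to confirm that the key is actually visible to Python. The sketch below is a hypothetical pre-flight check (the function name require_api_key is not part of this repo). It also rejects values wrapped in quote characters, which can happen when the key is set with quotes in the Windows Command Prompt.

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise with a readable message.

    Hypothetical helper for illustration; not part of the AutoToM code.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the evaluation.")
    if key[0] in "'\"" or key[-1] in "'\"":
        # Literal quote characters usually mean the shell stored them as part of the value.
        raise RuntimeError(f"{name} contains literal quote characters; set it without quotes.")
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY looks usable.")
    except RuntimeError as err:
        print(f"Configuration problem: {err}")
```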

Experiment 1: Evaluation on ToM Benchmarks

To run AutoToM on MMToM-QA, with the default settings of reduced hypotheses and backwards inference:

cd model
python ProbSolver.py --automated --dataset_name "MMToM-QA"
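To run several benchmarks in a row, the command above can be wrapped in a small driver. The following is a sketch under stated assumptions: it uses only the flags and dataset names that appear in this README ("MMToM-QA", "ToMi-1st"), the helper names build_command and run_benchmarks are hypothetical, and it assumes the script is launched from the repository root so that model is a valid working directory.

```python
import shlex
import subprocess
import sys


def build_command(dataset_name, assigned_model=None):
    """Build the ProbSolver.py command line for one dataset.

    With no assigned_model, mirrors the automated invocation shown above;
    otherwise mirrors the --assigned_model invocation from Example Usage.
    """
    cmd = [sys.executable, "ProbSolver.py"]
    if assigned_model is None:
        cmd.append("--automated")
    cmd += ["--dataset_name", dataset_name]
    if assigned_model is not None:
        # repr of a list of str gives the same literal form used in Example Usage.
        cmd += ["--assigned_model", repr(list(assigned_model))]
    return cmd


def run_benchmarks(dataset_names, model_dir="model", dry_run=False):
    """Run (or, with dry_run=True, just print) one evaluation per dataset."""
    for name in dataset_names:
        cmd = build_command(name)
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops the loop if one evaluation fails.
            subprocess.run(cmd, cwd=model_dir, check=True)


if __name__ == "__main__":
    # Dry run only: prints the commands without calling the OpenAI API.
    run_benchmarks(["MMToM-QA", "ToMi-1st"], dry_run=True)
```

Switch dry_run to False once the printed commands look right.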

Experiment 2: Evaluation on Classic Cognitive Studies

To evaluate AutoToM on the cognitive experiments (food truck scenarios for desire and belief inference, or online goal inference):

cd experiment_2
cd food_truck_scenarios # or, cd online_goal_inference
python eval_AutoToM.py

The final results will be printed at the end of the evaluation.

The analysis code is in analysis.ipynb under the folder corresponding to each task.

Experiment 3: Evaluation on Embodied Assistance

To evaluate AutoToM on the embodied assistance task (Online Watch-And-Help):

git clone -b AutoToM https://github.com/ShunchiZhang/online_watch_and_help

Then follow the README in the cloned repo for setup and usage.

Testing AutoToM with customized questions

Please check out playground.ipynb. Simply replace the story and choices with your customized input to see how AutoToM discovers Bayesian models and conducts inverse planning!
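For readers new to the term, here is a toy illustration of Bayesian inverse planning: inferring an agent's goal from its observed actions. It is a self-contained sketch under simplifying assumptions (a one-dimensional world, a softmax-rational agent, a uniform prior) and is not AutoToM's implementation, which uses an LLM to propose and evaluate models. All names below are hypothetical.

```python
import math

MOVES = {"left": -1, "right": 1}


def action_likelihood(action, goal_pos, agent_pos, beta=1.0):
    """P(action | goal): softmax over the negative distance to the goal after each move."""
    utilities = {a: -abs(agent_pos + step - goal_pos) for a, step in MOVES.items()}
    z = sum(math.exp(beta * u) for u in utilities.values())
    return math.exp(beta * utilities[action]) / z


def goal_posterior(actions, goals, start_pos=0, prior=None, beta=1.0):
    """P(goal | actions) by Bayes' rule: prior times the product of action likelihoods."""
    prior = prior or {g: 1.0 / len(goals) for g in goals}
    log_post = {g: math.log(prior[g]) for g in goals}
    pos = start_pos
    for action in actions:
        for goal, goal_pos in goals.items():
            log_post[goal] += math.log(action_likelihood(action, goal_pos, pos, beta))
        pos += MOVES[action]  # the agent actually moves, whatever its goal is
    # Normalize in log space for numerical stability.
    top = max(log_post.values())
    unnorm = {g: math.exp(v - top) for g, v in log_post.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


if __name__ == "__main__":
    goals = {"kitchen": -3, "study": 3}  # goal name -> position on the line
    print(goal_posterior(["right", "right"], goals))
```

After two steps to the right, the posterior puts most of its mass on the goal that lies to the right ("study"). With no observed actions it stays at the uniform prior.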

Citation

Please cite the paper and star this repo if you find it useful, thanks!

@inproceedings{zhang2025autotom,
  title={AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling},
  author={Zhining Zhang and Chuanyang Jin and Mung Yao Jia and Shunchi Zhang and Tianmin Shu},
  booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems},
  year={2025},
  url={https://openreview.net/forum?id=oeZZusZheP}
}
