Multi-Agent System Process Reward Model
A lightweight process reward model that guides multi-agent reasoning at search time.
MASPRM training pipeline (main paper figure).
- MASPRM adds a process reward model to guide multi-agent systems.
- Plugs into Monte Carlo tree search (MCTS) and other inference-time search for better trajectory selection.
- Improves exact-match accuracy on challenging reasoning benchmarks.
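
To make the search-time idea concrete, here is a minimal sketch of how a process reward model could steer MCTS over partial trajectories. It is not taken from this repository: `Node`, `uct_select`, `backup`, `search`, `toy_expand`, and `toy_prm` are hypothetical names, the "agents" are replaced by a toy string expander, and the reward model is a stand-in function that scores a partial trajectory in [0, 1].

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One partial trajectory in the search tree (hypothetical structure)."""
    state: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value_sum: float = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def uct_select(node: Node, c: float = 1.4) -> Node:
    """Return the child with the highest UCT score; unvisited children win."""
    def score(child: Node) -> float:
        if child.visits == 0:
            return math.inf
        explore = c * math.sqrt(math.log(node.visits) / child.visits)
        return child.mean_value() + explore
    return max(node.children, key=score)


def backup(node: Optional[Node], value: float) -> None:
    """Propagate a leaf value up to the root."""
    while node is not None:
        node.visits += 1
        node.value_sum += value
        node = node.parent


def search(root_state: str,
           expand: Callable[[str], List[str]],
           prm_score: Callable[[str], float],
           simulations: int = 200,
           max_depth: int = 3) -> List[str]:
    """Run MCTS where leaves are valued by prm_score instead of rollouts."""
    root = Node(root_state)
    for _ in range(simulations):
        node, depth = root, 0
        # Selection: follow UCT down to a leaf.
        while node.children:
            node = uct_select(node)
            depth += 1
        # Expansion: add successor states, then evaluate the first one.
        if depth < max_depth:
            node.children = [Node(s, parent=node) for s in expand(node.state)]
            if node.children:
                node = node.children[0]
        # Evaluation and backup: the reward model scores the partial trajectory.
        backup(node, prm_score(node.state))
    # Read out the most-visited path from the root.
    path, node = [root.state], root
    while node.children:
        best = max(node.children, key=lambda ch: ch.visits)
        if best.visits == 0:
            break
        node = best
        path.append(node.state)
    return path


def toy_expand(state: str) -> List[str]:
    """Stand-in for agent outputs: each step appends 'a' or 'b'."""
    return [state + "a", state + "b"]


def toy_prm(state: str) -> float:
    """Stand-in reward model: fraction of 'a' steps so far."""
    return state.count("a") / len(state) if state else 0.0


if __name__ == "__main__":
    print(search("", toy_expand, toy_prm))
```

The design choice illustrated here is that the reward model replaces random rollouts: every newly expanded step gets a value immediately, so the search budget goes toward the branches the model rates highly.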
pip install -r requirements.txt
python src/run_mcts.py --dataset mmlu --split train --load_in_4bit --ray --gpus_per_actor 0.125 --actors 32

docker build -t masprm .
docker run --rm -it -v "$PWD:/app" masprm python src/run_mcts.py --help

@article{yazdani2025masprm,
title={{MASPRM}: Multi-Agent System Process Reward Model},
author={Yazdani, Milad and Mostajabdaveh, Mahdi and Zhou, Zirui and Xiong, Ying},
journal={arXiv preprint arXiv:2510.24803},
year={2025}
}