Skip to content

Monncyann/Med-U1

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

7 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning




How to Use?

Installation

We use python 3.11 and CUDA 12.4 for experiments, you can replace it with the corresponding version according to your actual situation.

git clone https://github.com/Monncyann/Med-U1.git
conda create -n medu1 python==3.11
conda activate medu1
cd Med-U1
pip install -e verl
pip install packaging
pip install ninja
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.3/flash_attn-2.7.3+cu12torch2.4cxx11abiFALSE-cp311-cp311-linux_x86_64.whl
pip install -e .
pip install sacrebleu
pip install nltk

Prepare Data

In this work, we use MedCalc-Bench, MediQ, EHRNoteQA, MedXpertQA and medical-o1-reasoning-SFT as in-distribution tasks, MMLU-Pro as out-of-distribution task.

You can use scripts in scripts/data to prepare your own dataset.

For training with length penalty, random token constraint would be added to each case, use --do_normal=False and --num_tokens=-1, else use --do_normal=True. In particular, we provide the data we used for experiments in scripts/data/processed_data, train_XXX.parquet for training, XXX.parquet for validation and XXX_test.parquet for test.

Example, generate data for traininng and testing Med-U1:

python scripts/data/deepscaler_dataset.py 

Train Models

You can skip this step if you want to use our pre-trained models.

You can run scripts in scripts/train to train your own models. Make sure to prepare the dataset first and specify the correct data path.

Evaluate Models

Use one of scripts/eval to evaluate your models. Make sure to specify the correct model path.

Acknowledgments

  • We would like to thank Qwen for releasing super-awesome Qwen-2.5 Models, and
  • thanks cmu-l3 (L1) and fzppp (MT-R1-Zero) for codebase! This codebase is built on top of their work.

Citation

If you use Med-U1 in your research, please cite:

@misc{zhang2025medu1incentivizingunifiedmedical,
      title={Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning}, 
      author={Xiaotian Zhang and Yuan Wang and Zhaopeng Feng and Ruizhe Chen and Zhijie Zhou and Yan Zhang and Hongxia Xu and Jian Wu and Zuozhu Liu},
      year={2025},
      eprint={2506.12307},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.12307}, 
}

About

This is the official repository of paper “Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning”

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors