ReasonLLM: Efficient LLM RL Fine-Tuning with Optimized Resource Utilization 🚀

A framework for efficient implementation of the GRPO algorithm with vLLM-accelerated sampling, enabling large language model fine-tuning with lower GPU memory usage.

🌟 Key Features

⚡ Ultra-Efficient Resource Usage

Lower GPU memory consumption than setups that load separate sampling and training models
Serialized sampling and training pipeline for better GPU utilization
Dynamic batch processing
Supports LoRA fine-tuning

🚀 Accelerated Performance

vLLM- or LMDeploy-powered sampling acceleration

🧩 Production-Ready Design

Simple directory structure
DeepSpeed ZeRO-2/3 integration
Seamless HuggingFace ecosystem compatibility

Challenge	Conventional Solutions	Our Approach
Slow sampling speed	Sampling with Transformers	vLLM GPU-accelerated sampling
High minimum batch size per device	Equal to the group size	1
High VRAM requirements	Dual-model loading (vLLM + training)	Single-model loading

🛠️ Getting Started

Prerequisites

NVIDIA GPU
CUDA 12+
Python 3.10+

Installation

git clone https://github.com/loxs123/reason-llm.git
cd reason-llm
pip install vllm  # to use the vLLM backend
# or: pip install lmdeploy  # to use the LMDeploy backend
pip install -e .
# export HF_ENDPOINT=https://hf-mirror.com  # optional: use a Hugging Face mirror
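
Since either vLLM or LMDeploy can serve as the sampling backend, it can help to confirm which one is importable before launching a run. The following is a minimal sketch using only the standard library; the function name `available_backends` is hypothetical and not part of this repository.

```python
import importlib.util

# Package names taken from the install commands above.
BACKENDS = ("vllm", "lmdeploy")


def available_backends(find_spec=importlib.util.find_spec):
    """Return the sampling backends that can be imported in this environment.

    `find_spec` is injectable so the check can be exercised without
    actually installing either package.
    """
    return [name for name in BACKENDS if find_spec(name) is not None]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Sampling backends found:", ", ".join(found))
    else:
        print("No sampling backend found; install vllm or lmdeploy first.")
```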

Project Structure

├── data
│   └── buffer.json      # Auto-generated training buffer
├── model                # Model directory
│   ├── config.json      # put your model here
│   ├── model.safetensors
│   └── tokenizer...
└── reason_llm            # Core framework
    ├── config.py         # Training configuration
    ├── reward_fn.py      # Reward Functions
    └── ...              # Implementation modules

Launch Training

nohup python -u scripts/train.py &

Training considerations

Config file: `reason_llm/config.py`
Available configs: `configs/*.py`
For multi-GPU training, remember to update `num_processes` in `reason_llm/deepspeed_zero3.yaml` to match the number of GPUs.
DeepSeek models: `tokenizer_config.json` needs to be modified; see https://zhuanlan.zhihu.com/p/21465667399
CUDA_VISIBLE_DEVICES=0 accelerate launch --config_file "reason_llm/ds_cfgs/deepspeed_zero2.yaml" reason_llm/trainer.py # for test
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --config_file "reason_llm/ds_cfgs/deepspeed_zero3.yaml" reason_llm/trainer.py # for test
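
A mismatch between `num_processes` and the number of visible GPUs is easy to overlook. The sketch below shows one way to check it before launching; it assumes `num_processes` appears as a top-level `key: value` line in the accelerate config file, and the helper names are hypothetical rather than part of this repository.

```python
import os
import re


def visible_gpu_count(env=None):
    """Count the GPUs listed in CUDA_VISIBLE_DEVICES (None if it is unset)."""
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    return len([d for d in value.split(",") if d.strip()])


def configured_num_processes(yaml_text):
    """Read the top-level `num_processes` value from the config file's text."""
    match = re.search(r"^num_processes:\s*(\d+)", yaml_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def check_config(yaml_text, env=None):
    """Return True when num_processes equals the number of visible GPUs."""
    gpus = visible_gpu_count(env)
    procs = configured_num_processes(yaml_text)
    return gpus is not None and procs is not None and gpus == procs
```

To use it, read the YAML file passed to `--config_file` into a string and call `check_config(text)` in the same shell environment that will run `accelerate launch`.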

Some experiences and tips.

The larger the LoRA rank, the better (≥128).
The larger the batch size, the better.
Removing samples with advantage < 0 can lead to better results.
Removing samples where `reward.std()` is too small (<0.1) can also help.
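
The two filtering tips can be illustrated with a small sketch. It assumes the usual GRPO group-relative advantage, (reward - group mean) / (group std + eps), and uses the population standard deviation; the repository may compute these differently, and the function names here are hypothetical.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampling group.

    Returns (advantages, std), where advantage = (r - mean) / (std + eps).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std, not sample std
    return [(r - mu) / (sigma + eps) for r in rewards], sigma


def filter_group(samples, rewards, min_std=0.1, drop_negative=True):
    """Apply both tips to one group; return the kept (sample, advantage) pairs."""
    advantages, sigma = group_advantages(rewards)
    if sigma < min_std:
        # Near-identical rewards give almost no learning signal: drop the group.
        return []
    pairs = list(zip(samples, advantages))
    if drop_negative:
        # Keep only samples whose advantage is not negative.
        pairs = [(s, a) for s, a in pairs if a >= 0]
    return pairs
```

For a group with rewards `[1.0, 0.0, 0.0, 1.0]`, only the first and last samples are kept; a group where every reward is `0.5` is dropped entirely.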

📊 Experimental Results

Qwen2.5-7B

Item	Detail
Train Base Model	Qwen2.5-7B-Instruct
Train Type	Full fine-tuning
Train Hardware	1×A100 (80 GB)
Train Time	12 h
Train Dataset	xiaodongguaAIGC/X-R1-7500
Test Dataset	AIME 2024 Dataset
System Setting	`A conversation between User and Assistant. The user asks a question, and the Assistant solves it. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>`
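
The system setting above asks for responses of the form `<think> ... </think><answer> ... </answer>`. A format check along these lines could feed a reward function. This is a hypothetical sketch, not the contents of `reason_llm/reward_fn.py`; the names `extract_answer` and `format_reward` and the 1.0/0.0 scores are assumptions.

```python
import re

# Matches the layout described in the system setting:
# <think> reasoning </think><answer> answer </answer>
RESPONSE_PATTERN = re.compile(
    r"^\s*<think>(?P<think>.*?)</think>\s*<answer>(?P<answer>.*?)</answer>\s*$",
    flags=re.DOTALL,
)


def extract_answer(response):
    """Return the text inside <answer>...</answer>, or None if the format is violated."""
    match = RESPONSE_PATTERN.match(response)
    return match.group("answer").strip() if match else None


def format_reward(response):
    """Hypothetical reward: 1.0 for a well-formed response, 0.0 otherwise."""
    return 1.0 if extract_answer(response) is not None else 0.0
```

An accuracy reward could then compare the output of `extract_answer` with the reference answer from the dataset.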

Training log commit id: 9de0d1fda962a42a9e6a6b4ed10ddf3f171dea3c

📚 References

@misc{reason-llm,
  author = {Xin Li},
  title = {ReasonLLM: Efficient LLM RL Fine-Tuning with Optimized Resource Utilization},
  year = {2025},
  howpublished = {\url{https://github.com/loxs123/reason-llm}}
}

Empowering efficient LLM fine-tuning for everyone 🤖
