GRPO for Countdown Math Problems

This project implements the Group Relative Policy Optimization (GRPO) algorithm to fine-tune a language model on the "Countdown" math task. The goal is to train an LLM to generate correct mathematical equations that reach a target number using a given set of integers.
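To make the task concrete, here is a minimal sketch of how a Countdown answer could be scored and how GRPO's group-relative advantages could be derived from those scores. The function names (`countdown_reward`, `group_relative_advantages`), the binary 0/1 reward, the rule that every given number is used exactly once, and the use of the population standard deviation are assumptions made for illustration; they are not taken from GRPO.py.

```python
import ast
import operator
from collections import Counter
from statistics import mean, pstdev

# Only the four basic arithmetic operators are allowed (assumption).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Evaluate an arithmetic AST node, recording each integer literal in `used`."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Return 1.0 if `expression` uses exactly `numbers` and equals `target`, else 0.0."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    # Hypothetical rule: each given number must appear exactly once.
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and std of its own sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, four sampled completions (made-up examples).
    group = ["(25 - 5) * 3 + 4", "25 * 3 - 4 - 5", "25 + 5", "not an equation"]
    rewards = [countdown_reward(e, [3, 4, 5, 25], 64) for e in group]
    print(rewards, group_relative_advantages(rewards))
```

The advantage of each completion depends only on how it compares with the other completions sampled for the same prompt, which is what lets GRPO work without a separate value model.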

Setup

pip install --upgrade uv
uv venv
source .venv/bin/activate
uv pip install vllm==0.7.2 triton==3.1.0 datasets transformers==4.51.3 tensorboard torch gpustat python-dotenv
uv pip install flash-attn==2.7.4.post1 --no-build-isolation
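After installation, a quick check that the pinned versions actually landed in the environment can save debugging time later. The sketch below is an optional helper, not part of the repository; the `check_pins` name is hypothetical, and the distribution names are assumed to match the ones passed to `uv pip install` above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the install commands above.
PINNED = {
    "vllm": "0.7.2",
    "triton": "3.1.0",
    "transformers": "4.51.3",
    "flash-attn": "2.7.4.post1",
}


def check_pins(pinned=PINNED):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            found = None  # package is not installed in this environment
        report[name] = (found, found == wanted)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={found} pinned={PINNED[name]} [{status}]")
```

Run it inside the activated `.venv` so that it inspects the same environment the training scripts will use.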

