length penalty in reward function by paulinebourigault · Pull Request #14 · project-numina/verl

paulinebourigault · 2025-04-25T17:34:18Z

The length penalty is computed as ((5 + length) / 6) ^ alpha.

This follows the length normalization formula from the Google Neural Machine Translation paper [https://arxiv.org/pdf/1609.08144].
When alpha < 0, the factor shrinks as length grows, so shorter sequences get higher rewards.
When alpha > 0, the factor grows with length, so longer sequences get higher rewards.

The length penalty is applied to the final reward scores, after the primary reward calculation.
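
As a rough illustration of the formula, here is a minimal Python sketch. The function names `length_penalty` and `apply_length_penalty` are hypothetical and are not taken from this PR's diff. The sketch also assumes the factor is multiplied into the final score, which the description does not state explicitly.

```python
def length_penalty(length: int, alpha: float) -> float:
    """GNMT-style length factor: ((5 + length) / 6) ** alpha."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return ((5.0 + length) / 6.0) ** alpha


def apply_length_penalty(score: float, length: int, alpha: float) -> float:
    # Assumption: the factor scales the final reward multiplicatively.
    # The PR only says the penalty is applied after the primary reward.
    return score * length_penalty(length, alpha)


if __name__ == "__main__":
    # With alpha < 0 the factor decreases as the response gets longer.
    for n in (1, 100, 3000, 4000):
        print(n, length_penalty(n, alpha=-0.15))
```

At length 1 the factor is exactly 1 for any alpha, since (5 + 1) / 6 = 1. Under the multiplicative assumption, a negative alpha favors shorter responses only when the underlying score is positive.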

paulinebourigault · 2025-04-25T17:38:41Z

Example use in the training script:
reward_model.length_penalty.enabled=True \
reward_model.length_penalty.alpha=-0.15 \
reward_model.length_penalty.min_length=3000 \
reward_model.length_penalty.max_length=4000 \
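
The overrides do not spell out how `min_length` and `max_length` interact with the penalty. One possible reading, shown here purely as an assumption and not as this PR's implementation, is that the response length is clamped into `[min_length, max_length]` before the factor is computed. The `LengthPenaltyConfig` class and `penalized_reward` function below are hypothetical names that mirror the four config keys.

```python
from dataclasses import dataclass


@dataclass
class LengthPenaltyConfig:
    # Hypothetical container mirroring reward_model.length_penalty.*
    enabled: bool = False
    alpha: float = 0.0
    min_length: int = 0
    max_length: int = 0


def penalized_reward(score: float, length: int, cfg: LengthPenaltyConfig) -> float:
    if not cfg.enabled:
        return score
    # Assumption: lengths outside [min_length, max_length] are clamped,
    # so the factor is constant below min_length and above max_length.
    clamped = max(cfg.min_length, min(length, cfg.max_length))
    # Assumption: the factor scales the score multiplicatively.
    return score * ((5.0 + clamped) / 6.0) ** cfg.alpha


if __name__ == "__main__":
    cfg = LengthPenaltyConfig(enabled=True, alpha=-0.15,
                              min_length=3000, max_length=4000)
    for n in (500, 3000, 3500, 4000, 8000):
        print(n, penalized_reward(1.0, n, cfg))
```

With the values from the example overrides, the factor under this reading falls from about 0.39 at 3000 tokens to about 0.38 at 4000 tokens, so the effect within that window is mild.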

length penaly in reward function

bb573bc

paulinebourigault changed the title ~~length penaly in reward function~~ length penalty in reward function Apr 25, 2025

paulinebourigault and others added 3 commits April 28, 2025 19:11

added length penalty option

833ac3e

fix conflict

4a40db1

Merge branch 'main' into pauline/length_penalty_in_reward

a74b6cc

paulinebourigault wants to merge 4 commits into main from pauline/length_penalty_in_reward