feat: add RLSD (Self-Distilled RLVR) credit assignment by morgendave · Pull Request #333 · fw-ai/cookbook

morgendave · 2026-04-14T22:48:56Z

Adds composable per-token credit weighting from self-distillation evidence ratios (arXiv 2604.03128). RLSD modulates the per-token advantage magnitude using the teacher-student log-prob ratio:

w_t = clip(exp(sign(A) · (log P_teacher - log P_student)), 1±ε_w)
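For illustration, the formula above can be sketched in pure Python. This is not the PR's `compute_rlsd_weights()`: the function name `rlsd_weights_sketch`, its arguments, and the default `eps_w=0.2` are assumptions chosen for the example, and the real code presumably operates on tensors rather than lists.

```python
import math
from typing import List, Sequence


def rlsd_weights_sketch(
    teacher_logprobs: Sequence[float],
    student_logprobs: Sequence[float],
    advantage: float,
    eps_w: float = 0.2,  # assumed default, not taken from the PR
) -> List[float]:
    """Per-token w_t = clip(exp(sign(A) * (log P_teacher - log P_student)), 1 +/- eps_w)."""
    # sign(A): +1, -1, or 0 for a zero advantage
    sign_a = float((advantage > 0) - (advantage < 0))
    lo, hi = 1.0 - eps_w, 1.0 + eps_w
    weights = []
    for lp_t, lp_s in zip(teacher_logprobs, student_logprobs):
        # Bound the exponent so math.exp cannot overflow; the result is clipped anyway.
        exponent = max(-50.0, min(50.0, sign_a * (lp_t - lp_s)))
        weights.append(min(max(math.exp(exponent), lo), hi))
    return weights
```

With a positive advantage, tokens the teacher finds more likely than the student are weighted up to `1 + eps_w`, and tokens it finds less likely are weighted down to `1 - eps_w`; a negative advantage flips the direction.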

Key design: RLSD is a weight that composes with any existing loss (GRPO, DAPO, CISPO, etc.) — just like TIS. No new loss function.
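A minimal sketch of what "composes with any existing loss" could look like: the weight is an optional per-token multiplier applied to whatever per-token loss terms the chosen objective already produces. The helper name `apply_token_weight_sketch` and the mean reduction are assumptions for illustration; the actual plumbing lives in `run_loss_loop` and `grpo.py` and may differ, including in how gradients through the weight are handled.

```python
from typing import Optional, Sequence


def apply_token_weight_sketch(
    per_token_loss: Sequence[float],
    rlsd_weight: Optional[Sequence[float]] = None,
) -> float:
    """Scale each per-token loss term by an optional weight, then average.

    With rlsd_weight=None the loss is unchanged, so the base objective
    (GRPO, DAPO, CISPO, ...) needs no RLSD-specific variant.
    """
    if not per_token_loss:
        raise ValueError("per_token_loss must be non-empty")
    if rlsd_weight is None:
        weighted = list(per_token_loss)
    else:
        if len(rlsd_weight) != len(per_token_loss):
            raise ValueError("weight and loss must have the same length")
        weighted = [w * l for w, l in zip(rlsd_weight, per_token_loss)]
    # Mean over tokens is an assumed reduction for this example.
    return sum(weighted) / len(weighted)
```

Because the weight only rescales existing terms, leaving it unset reproduces the unweighted loss exactly.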

Files:

training/utils/rl/rlsd.py: RLSDConfig + compute_rlsd_weights()
training/utils/rl/common.py: SampleContext.rlsd_weight + run_loss_loop plumbing
training/utils/rl/grpo.py: multiply rlsd_weight into surrogate loss
training/utils/rl/__init__.py: export RLSDConfig


morgendave wants to merge 1 commit into main from cursor/rlsd-credit-assignment-19da
