
[rl] Fix batch invariant logprob calculation by forcing vllm use trainer's function #3629

Open
wwwjn wants to merge 2 commits into main from fix-batch-inv


@wwwjn wwwjn commented Jun 11, 2026

Contributor

Rationale: The logprob calculation should be more transparent to us.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 11, 2026
@wwwjn wwwjn requested a review from liangel-02 June 11, 2026 03:31
Comment on lines +120 to +121
# Broadcast 2D logits over token_ids' num_logprobs columns so the
# trainer's compute_logprobs (one token per position) works unchanged.

Contributor


Say more. I'm confused about when we need to expand anything. The docstring of https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/rl/actors/trainer.py#L45 is not great, either, but at least I could see that BSV stands for bsz, seqlen, vocab size.
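For reference, here is one way to read the "broadcast" in the code comment, as a framework-free sketch rather than the PR's actual code. It assumes `logits` has one row per position (shape `[N, V]`) and `token_ids` has `K = num_logprobs` candidate columns per position (shape `[N, K]`); those shapes and the helper names `log_softmax` and `gather_logprobs` are hypothetical and not taken from the trainer. The point is only that each row of logits is reused for every one of the `K` columns, which has the same effect as expanding the logits to match `token_ids` and then looking up one token per position.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def gather_logprobs(logits, token_ids):
    """Hypothetical helper: logits is [N][V], token_ids is [N][K].

    Returns an [N][K] nested list where entry (i, k) is the log-probability
    of token_ids[i][k] under the distribution given by logits[i].
    """
    out = []
    for row, ids in zip(logits, token_ids):
        logp = log_softmax(row)              # computed once per position
        out.append([logp[t] for t in ids])   # reused for all K columns
    return out


# Assumed toy shapes: N=2 positions, V=3 vocab entries, K=2 candidate tokens.
logits = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
token_ids = [[0, 2], [1, 1]]
logprobs = gather_logprobs(logits, token_ids)
```

If this reading is right, the expansion is only needed when `K > 1`; with a single token per position the lookup reduces to the one-token-per-position case the comment describes.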


Labels

ciflow/rl ciflow/8gpu CLA Signed This label is managed by the Meta Open Source bot.


2 participants