[rl] Fix batch invariant logprob calculation by forcing vllm use trainer's function by wwwjn · Pull Request #3629 · pytorch/torchtitan

wwwjn · 2026-06-11T03:30:30Z

Rationale: The logp calculation should be more transparent to us.

tianyu-l · 2026-06-11T04:26:12Z

+        # Broadcast 2D logits over token_ids' num_logprobs columns so the
+        # trainer's compute_logprobs (one token per position) works unchanged.


Say more. I'm confused about when we need to expand anything. The docstring of https://github.com/pytorch/torchtitan/blob/main/torchtitan/experiments/rl/actors/trainer.py#L45 is not great, either. But at least I could see that BSV stands for bsz, seqlen, vocab size.
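
For readers following the thread, here is a minimal shape-level sketch of what the quoted diff comment appears to describe. It is not the PR's code. It assumes the sampler side holds 2D logits of shape `[N, V]` and token ids of shape `[N, K]` (with `K` standing in for `num_logprobs`), and that the trainer helper expects `[B, S, V]` logits with `[B, S]` token ids. The `compute_logprobs` below is a hypothetical stand-in written with plain Python lists so the shapes are explicit; the real function in `trainer.py` may have a different signature.

```python
import math


def compute_logprobs(logits, token_ids):
    """Hypothetical stand-in for a trainer-style helper (not torchtitan's).

    logits:    nested lists shaped [B][S][V]
    token_ids: nested lists shaped [B][S], one token per position
    returns:   nested lists shaped [B][S] with log p(token) per position
    """
    out = []
    for row_logits, row_ids in zip(logits, token_ids):
        row = []
        for vec, tok in zip(row_logits, row_ids):
            # Numerically stable log-sum-exp over the vocab dimension.
            m = max(vec)
            lse = m + math.log(sum(math.exp(x - m) for x in vec))
            row.append(vec[tok] - lse)
        out.append(row)
    return out


def sampler_logprobs(logits_2d, token_ids_2d):
    """Assumed sampler-side shapes: logits [N][V], token ids [N][K].

    Each position's logits are repeated across the K columns, giving
    [N][K][V], so the one-token-per-position helper applies unchanged.
    """
    k = len(token_ids_2d[0])
    expanded = [[vec] * k for vec in logits_2d]  # [N][V] -> [N][K][V]
    return compute_logprobs(expanded, token_ids_2d)


# Two positions, vocab of 2, two requested token ids per position.
logits = [[0.0, 0.0], [math.log(1.0), math.log(3.0)]]
token_ids = [[0, 1], [1, 0]]
result = sampler_logprobs(logits, token_ids)
```

Under these assumptions the "expand" only reinterprets `N` as the batch dimension and `K` as the sequence dimension, which is why the same log-softmax-and-gather path can serve both the trainer and the sampler.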

wwwjn added 2 commits June 10, 2026 20:12

patch vllm to use pytorch to compute logp

ccdc11a

share the same code path

df4b22f

pytorch-bot Bot added ciflow/8gpu ciflow/rl labels Jun 11, 2026

meta-cla Bot added the CLA Signed label Jun 11, 2026

wwwjn requested a review from liangel-02 June 11, 2026 03:31

tianyu-l reviewed Jun 11, 2026

wwwjn wants to merge 2 commits into main from fix-batch-inv
