docs(diffusion RL v0.1): add single-prompt multi-generation guide #25
MikukuOvO wants to merge 2 commits into
- Algorithmically, Miles Diffusion compares multiple sampled outputs within a
  prompt group, so each prompt should have multiple sampled outputs.
- System-wise, packing multiple samples from the same prompt into one SGLang-D
Maybe we should highlight the most critical info here. For example, where does the optimization come from? (Because the workload becomes compute-bound after batching, and the encoder embedding is reused.)
Thanks for the pointer. I checked the current SGLang-D path and updated the doc to explain the optimization more concretely: the prompt is text-encoded once, Qwen-Image conditioning is expanded in the fixed path, and denoising runs as batched DiT forwards across the timestep loop.
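The flow described in this reply can be illustrated with a toy sketch. It is not SGLang-D code: `encode_prompt`, `denoise_step`, and `generate_group` are hypothetical stand-ins, and the arithmetic inside them is a placeholder. The sketch only shows the shape of the optimization, assuming one encoder call per prompt, conditioning shared across the group, and one batched forward per timestep.

```python
import random

encoder_calls = 0  # counts how often the (stand-in) text encoder runs


def encode_prompt(prompt):
    """Hypothetical stand-in for the text encoder; returns a toy embedding."""
    global encoder_calls
    encoder_calls += 1
    return [float(len(word)) for word in prompt.split()]


def denoise_step(latents, conditioning, t):
    """Hypothetical stand-in for one batched forward over all samples."""
    assert len(latents) == len(conditioning)
    return [
        [x - 0.1 * sum(cond) / len(cond) for x in latent]
        for latent, cond in zip(latents, conditioning)
    ]


def generate_group(prompt, num_outputs, num_steps, seed=0):
    """Generate num_outputs samples for one prompt in a single batch."""
    rng = random.Random(seed)
    embedding = encode_prompt(prompt)          # text-encoded once
    conditioning = [embedding] * num_outputs   # expanded to the group size
    # Each sample starts from its own noise, so the outputs differ.
    latents = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(num_outputs)]
    forward_calls = 0
    for t in range(num_steps):
        latents = denoise_step(latents, conditioning, t)  # one batched call per step
        forward_calls += 1
    return latents, forward_calls


latents, forward_calls = generate_group("a red fox", num_outputs=4, num_steps=3)
```

In this sketch the encoder runs once and the forward runs once per timestep, regardless of how many outputs the group contains.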
same prompt in one rollout group. In Miles Diffusion, this is both an algorithmic
requirement and a performance knob:

- Algorithmically, Miles Diffusion compares multiple sampled outputs within a
More specifically, GRPO-style RL requires multiple samples from a single prompt.
Thanks, addressed this by explicitly stating that GRPO-style RL needs multiple samples from the same prompt to compute group-relative advantages.
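A minimal sketch of why a group needs more than one sample, assuming the common GRPO-style normalization (reward minus group mean, divided by group standard deviation). The function name `group_relative_advantages` and the `eps` term are illustrative choices, not taken from the Miles Diffusion code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt group (GRPO-style sketch)."""
    if len(rewards) < 2:
        # With a single sample the group mean equals the reward itself,
        # so there is nothing to compare against.
        raise ValueError("need at least two samples per prompt")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Rewards for four samples generated from the same prompt.
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and those below receive a negative one, which is why each prompt must contribute several samples to the same group.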
Summary
This draft PR adds documentation for the diffusion RL v0.1 branch:
n_samples_per_prompt, diffusion_microgroup_size, and SGLang-D num_outputs_per_prompt.
Validation
git diff --check rockdu/diffusion_RL_v0.1...rockdu/docs/diffusion_RL_v0.1_fengliny.
docs/advanced/single_prompt_multi_generation.md only.
Rockdu:docs/diffusion_RL_v0.1_fengliny into diffusion_RL_v0.1 before opening this PR.
Notes
This is intentionally opened as a draft so the same PR can continue to receive follow-up documentation updates without creating duplicate PRs.