docs(diffusion RL v0.1): add single-prompt multi-generation guide by MikukuOvO · Pull Request #25 · Rockdu/miles

MikukuOvO · 2026-05-07T00:10:07Z

Summary

This draft PR adds documentation for the diffusion RL v0.1 branch:

- Adds an advanced guide for single-prompt multi-generation in Miles Diffusion.
- Explains the relationship between prompt groups, `n_samples_per_prompt`, `diffusion_microgroup_size`, and SGLang-D `num_outputs_per_prompt`.
- Documents the current OCR and PickScore recipe settings, the SGLang-D dependency, known limitations, a validation checklist, and practical guidance.

Validation

- Ran `git diff --check rockdu/diffusion_RL_v0.1...rockdu/docs/diffusion_RL_v0.1_fengliny`.
- Confirmed the PR-style diff adds only `docs/advanced/single_prompt_multi_generation.md`.
- Confirmed there was no existing open PR from `Rockdu:docs/diffusion_RL_v0.1_fengliny` into `diffusion_RL_v0.1` before opening this PR.

Notes

This is intentionally opened as a draft so the same PR can continue to receive follow-up documentation updates without creating duplicate PRs.

Rockdu · 2026-05-09T03:45:04Z

+
+- Algorithmically, Miles Diffusion compares multiple sampled outputs within a
+  prompt group, so each prompt should have multiple sampled outputs.
+- System-wise, packing multiple samples from the same prompt into one SGLang-D


Maybe we should highlight the most critical information here. For example, where does the optimization come from? (Batching makes the workload compute-bound, and the encoder embedding is reused.)

Thanks for the pointer. I checked the current SGLang-D path and updated the doc to explain the optimization more concretely: the prompt is text-encoded once, Qwen-Image conditioning is expanded in the fixed path, and denoising runs as batched DiT forwards across the timestep loop.
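
The flow described above can be sketched with a toy example. This is a minimal illustration, not the SGLang-D code path: `generate_group`, `toy_encode`, and `toy_batched_forward` are hypothetical names, and the latents are plain floats standing in for tensors. It only shows the call pattern: one text-encoder call per prompt, conditioning expanded across the group, and one batched forward per timestep.

```python
import random


def generate_group(prompt, num_outputs, num_steps, encode, batched_forward, seed=0):
    """Toy single-prompt multi-generation loop (hypothetical helper)."""
    rng = random.Random(seed)
    cond = encode(prompt)                  # text encoder runs once per prompt
    conds = [cond] * num_outputs           # conditioning is expanded, not recomputed
    # Each sample starts from its own noise, so outputs differ within the group.
    latents = [rng.gauss(0.0, 1.0) for _ in range(num_outputs)]
    for t in range(num_steps):
        # One batched forward per timestep covers every sample in the group.
        latents = batched_forward(latents, conds, t)
    return latents


# Counters that record how often each (toy) component is invoked.
calls = {"encode": 0, "forward": 0}


def toy_encode(prompt):
    calls["encode"] += 1
    return float(len(prompt))              # stand-in for a text embedding


def toy_batched_forward(latents, conds, t):
    calls["forward"] += 1
    # Stand-in for a DiT denoising step applied to the whole batch at once.
    return [0.9 * x + 0.1 * c for x, c in zip(latents, conds)]


samples = generate_group(
    "a red cube",
    num_outputs=4,
    num_steps=10,
    encode=toy_encode,
    batched_forward=toy_batched_forward,
)
print(calls)
```

In this toy, four separate single-output requests would call the encoder four times and run forty forwards of batch size one; the grouped request calls the encoder once and runs ten forwards of batch size four.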

Rockdu · 2026-05-09T03:45:38Z

+same prompt in one rollout group. In Miles Diffusion, this is both an algorithmic
+requirement and a performance knob:
+
+- Algorithmically, Miles Diffusion compares multiple sampled outputs within a


More specifically, GRPO-style RL requires multiple samples from a single prompt.

Thanks, addressed this by explicitly stating that GRPO-style RL needs multiple samples from the same prompt to compute group-relative advantages.
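
To make the group-relative advantage concrete, here is a minimal sketch. It assumes the common GRPO formulation in which each reward is centered on the group mean and scaled by the group standard deviation. The exact normalization used in Miles Diffusion is not stated in this thread, and `group_relative_advantages` and `eps` are illustrative names.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Advantages for samples that all come from the same prompt.

    Assumed formulation: (reward - group mean) / (group std + eps).
    """
    if len(rewards) < 2:
        # With a single sample the reward equals the group mean, so the
        # advantage is always zero and carries no learning signal.
        raise ValueError("need at least two samples per prompt")
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three samples from one prompt, scored by some reward model.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
print(advantages)
```

The single-sample error case is the reason for the requirement: without several samples from the same prompt there is no group baseline to compare against.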

docs: add single-prompt multi-generation guide

023de16

Rockdu reviewed May 9, 2026


docs: clarify single-prompt multi-generation

8fe9de3

MikukuOvO wants to merge 2 commits into diffusion_RL_v0.1 from docs/diffusion_RL_v0.1_fengliny