diff --git a/docs/advanced/rollout_parallel_accuracy.md b/docs/advanced/rollout_parallel_accuracy.md
new file mode 100644
index 0000000000..e83054a6d9
--- /dev/null
+++ b/docs/advanced/rollout_parallel_accuracy.md
@@ -0,0 +1,91 @@
+# Rollout Parallel Accuracy
+
+sglang-diffusion supports several rollout-side parallel strategies. These
+strategies are important for throughput and memory, but they can also change the
+numeric path of the diffusion forward pass and rollout log-prob computation.
+For RL post-training, those differences matter: the trainer consumes rollout
+trajectories, rewards, and log-probs, so a parallel configuration should be
+chosen with a clear understanding of its accuracy behavior.
+
+This document summarizes the currently relevant rollout parallel strategies and
+their observed precision impact.
+
+## Parallel Strategies
+
+| Strategy | Meaning | Typical purpose |
+| --- | --- | --- |
+| SP / Ulysses | Sequence parallelism over latent/image tokens. Each rank handles a shard of the sequence dimension and uses all-to-all collectives inside attention. | Increase max resolution or reduce per-GPU activation memory. |
+| TP | Tensor parallelism inside the DiT / transformer layers. | Split model compute and parameters across GPUs. |
+| CFGP | Classifier-free-guidance parallelism. Conditional and unconditional branches are computed on different ranks and then combined. | Reduce wall-clock cost of CFG when both branches are required. |
+
+The main tensors to watch are:
+
+| Tensor | Why it matters |
+| --- | --- |
+| `model_output` | Direct output of the DiT denoiser. Differences here affect the denoising trajectory. |
+| `prev_sample_mean` | Scheduler mean update before adding SDE/CPS variance noise. |
+| `variance_noise` | Random noise used by SDE/CPS rollout. |
+| `noise_std_dev` | Scheduler noise scale. |
+| `rollout_log_probs` | Per-step rollout log-prob consumed by RL training. This is the most important rollout-side scalar for policy-gradient correctness. |
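+
+To show how these tensors relate, here is a minimal sketch of a per-step rollout
+log-prob. It assumes a diagonal Gaussian transition, as in Flow-GRPO-style SDE
+sampling, where the next sample is
+`prev_sample_mean + noise_std_dev * variance_noise` and the log-density is
+averaged over elements. The function name and the reduction are illustrative
+assumptions, not the sglang-diffusion implementation. Plain Python floats stand
+in for fp32 tensors.
+
+```python
+import math
+
+
+def gaussian_step_log_prob(prev_sample, prev_sample_mean, noise_std_dev):
+    """Hypothetical per-step log-prob of a diagonal Gaussian transition.
+
+    Assumption: log N(x; mu, sigma^2) averaged over elements (Flow-GRPO style);
+    this is not the sglang-diffusion code.
+    """
+    if len(prev_sample) != len(prev_sample_mean):
+        raise ValueError("prev_sample and prev_sample_mean must have the same length")
+    # Normalization term shared by every element: log(sigma) + 0.5 * log(2 * pi).
+    log_norm = math.log(noise_std_dev) + 0.5 * math.log(2.0 * math.pi)
+    total = 0.0
+    for x, mu in zip(prev_sample, prev_sample_mean):
+        total += -((x - mu) ** 2) / (2.0 * noise_std_dev**2) - log_norm
+    return total / len(prev_sample)
+
+
+# Rollout step: the next latent is the scheduler mean plus scaled variance noise.
+prev_sample_mean = [0.25, -0.5, 1.0]
+variance_noise = [0.1, -0.3, 0.7]
+noise_std_dev = 0.2
+prev_sample = [mu + noise_std_dev * z for mu, z in zip(prev_sample_mean, variance_noise)]
+print(gaussian_step_log_prob(prev_sample, prev_sample_mean, noise_std_dev))
+```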
+
+## Tested Scope
+
+The rollout-parallel accuracy checks were run on:
+
+| Model | Resolution | Steps | GPUs | Reference |
+| --- | ---: | ---: | ---: | --- |
+| `Qwen/Qwen-Image` | 1024 x 1024 | 50 | 1-2 | diffusers, single GPU, TP1 SP1, no CFGP |
+| `Tongyi-MAI/Z-Image-Turbo` | 1024 x 1024 | 9 | 1-2 | diffusers, single GPU, TP1 SP1, no CFGP |
+
+## Accuracy Summary
+
+| Parallel strategy | `rollout_log_probs` vs single-GPU reference | DiT-side tensors vs single-GPU reference | Practical interpretation |
+| --- | --- | --- | --- |
+| SP / Ulysses | Bit-exact in the tested Qwen-Image and Z-Image-Turbo runs. | Bit-exact in the tested Qwen-Image and Z-Image-Turbo runs. | Safest tested rollout parallel mode for accuracy-sensitive log-prob replay. |
+| TP | Bit-exact in the tested rollout log-prob path. | Qwen-Image was bit-exact in the tested TP2-SP1 run; Z-Image-Turbo showed DiT-side drift in `model_output` / `prev_sample_mean`. | Log-prob can remain exact even when the model forward path has small architecture-dependent reduction-order drift. |
+| CFGP | Bit-exact in the tested rollout log-prob path. | `model_output` / `prev_sample_mean` can drift from the serial CFG reference because cond/uncond branches are combined through CFG-parallel collectives. | Useful for CFG throughput, but do not assume full tensor bit-exactness vs serial CFG. |
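+
+The tables use two related notions: *bit-exact* (identical bit patterns) and
+*max abs diff* (largest element-wise absolute difference). The sketch below uses
+a hypothetical `compare_dumps` helper and assumes the tensor dumps are already
+flattened into equal-length sequences of Python floats. Upcasting bf16/fp32
+values to Python floats (fp64) is exact, so nothing is lost in the comparison.
+The two notions can disagree: `0.0` and `-0.0` differ by 0 but have different
+bit patterns.
+
+```python
+import math
+import struct
+from dataclasses import dataclass
+
+
+@dataclass
+class Comparison:
+    max_abs_diff: float
+    bit_exact: bool
+
+
+def _bits(x: float) -> bytes:
+    # Pack as IEEE-754 binary64 so signed zeros and NaN payloads are distinguished.
+    return struct.pack("<d", x)
+
+
+def compare_dumps(candidate, reference) -> Comparison:
+    """Compare a flattened tensor dump against a single-GPU reference dump."""
+    if len(candidate) != len(reference):
+        raise ValueError(f"length mismatch: {len(candidate)} vs {len(reference)}")
+    max_abs_diff = 0.0
+    bit_exact = True
+    for a, b in zip(candidate, reference):
+        if _bits(a) != _bits(b):
+            bit_exact = False
+        diff = abs(a - b)
+        if math.isnan(diff):
+            # NaN (or inf - inf) on either side: report an unbounded difference.
+            max_abs_diff = math.inf
+        else:
+            max_abs_diff = max(max_abs_diff, diff)
+    return Comparison(max_abs_diff, bit_exact)
+
+
+print(compare_dumps([1.0, 2.0], [1.0, 2.0]))   # identical dumps
+print(compare_dumps([0.0, 2.0], [-0.0, 2.0]))  # zero diff, but not bit-exact
+```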
+
+## Detailed Results
+
+### SDE Rollout
+
+| Parallel strategy | `variance_noise` | `noise_std_dev` | `rollout_log_probs` | `model_output` / `prev_sample_mean` |
+| --- | --- | --- | --- | --- |
+| SP / Ulysses | 0 max abs diff | 0 max abs diff | 0 max abs diff | 0 max abs diff in tested runs |
+| TP | 0 max abs diff | 0 max abs diff | 0 max abs diff | Model-dependent: exact for tested Qwen-Image; drift observed for tested Z-Image-Turbo |
+| CFGP | 0 max abs diff | 0 max abs diff | 0 max abs diff | Drift observed in CFG-parallel `model_output` / `prev_sample_mean` |
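+
+The TP and CFGP drift in `model_output` / `prev_sample_mean` is consistent with
+floating-point addition being non-associative: when partial results are
+computed on different ranks and combined by a collective, the summation order
+differs from the serial path. The toy sketch below uses a hypothetical two-rank
+sum in plain Python floats. Real kernels use bf16/fp32 and different reduction
+trees, so it illustrates the mechanism, not the magnitude.
+
+```python
+def serial_sum(values):
+    # Left-to-right accumulation, as a single-GPU loop would do it.
+    # (The builtin sum() is avoided: Python 3.12+ uses compensated summation for floats.)
+    total = 0.0
+    for v in values:
+        total += v
+    return total
+
+
+def sharded_sum(shards):
+    # Each rank reduces its own shard, then an all-reduce adds the partials in rank order.
+    partials = [serial_sum(shard) for shard in shards]
+    return serial_sum(partials)
+
+
+values = [0.1, 0.2, 0.3]
+serial = serial_sum(values)                  # (0.1 + 0.2) + 0.3
+parallel = sharded_sum([[0.1], [0.2, 0.3]])  # 0.1 + (0.2 + 0.3)
+print(serial, parallel, serial == parallel)  # 0.6000000000000001 0.6 False
+```
+
+Both paths see identical inputs, yet the results differ in the last bit.
+Whether such drift appears in a real run depends on the model's layer structure
+and kernels, which matches the model-dependent TP results above.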
+
+### CPS Rollout
+
+| Parallel strategy | `variance_noise` | `noise_std_dev` | `rollout_log_probs` | `model_output` / `prev_sample_mean` |
+| --- | --- | --- | --- | --- |
+| SP / Ulysses | 0 max abs diff | 0 max abs diff | 0 max abs diff | 0 max abs diff in tested runs |
+| TP | 0 max abs diff | 0 max abs diff | 0 max abs diff | Model-dependent: exact for tested Qwen-Image; drift observed for tested Z-Image-Turbo |
+| CFGP | 0 max abs diff | 0 max abs diff | 0 max abs diff | Drift observed in CFG-parallel `model_output` / `prev_sample_mean` |
+
+### ODE Rollout
+
+| Parallel strategy | `rollout_log_probs` | `model_output` / deterministic update |
+| --- | --- | --- |
+| SP / Ulysses | 0 max abs diff | 0 max abs diff in tested runs |
+| TP | 0 max abs diff | Model-dependent drift can appear in the DiT forward path |
+| CFGP | 0 max abs diff | Drift observed in CFG-parallel `model_output` / `prev_sample_mean` |
+
+ODE has a special precision contract: the rollout branch should preserve
+bit-exactness with the non-rollout deterministic scheduler step. For this
+reason, SGLang keeps the ODE branch dtype-preserving instead of applying the
+same fp32 entry cast used by SDE/CPS.
+
+## Practical Guidance
+
+- Prefer SP / Ulysses when the main goal is scaling rollout resolution while
+  preserving rollout log-prob accuracy. It is the cleanest tested path for
+  bit-exact log-prob replay.
+- Use TP when model memory or compute requires it, but validate DiT-side tensor
+  drift for the specific backbone. The tested Qwen-Image path was bit-exact;
+  the tested Z-Image-Turbo path still showed model-output drift.
+- Use CFGP when CFG throughput matters, but treat it as a numerically different
+  forward path from serial CFG for `model_output` and `prev_sample_mean`.
+  `rollout_log_probs` were still bit-exact in the tested rollout path.
+- For SDE/CPS, expect rollout log-probs to be computed in fp32. For ODE, expect
+  the rollout step to stay in the model's native dtype so that it matches the
+  deterministic scheduler step bit-for-bit.
diff --git a/docs/examples/qwen_image_ocr_demo.md b/docs/examples/qwen_image_ocr_demo.md
new file mode 100644
index 0000000000..1214cfe5ae
--- /dev/null
+++ b/docs/examples/qwen_image_ocr_demo.md
@@ -0,0 +1,221 @@
+# Qwen-Image OCR with 2 GPUs
+
+This example runs miles-diffusion with Qwen-Image, FSDP training, LoRA updates,
+the built-in diffusion rollout path, and the OCR reward.
+
+## Environment Setup
+
+First complete the base environment setup in
+[Quick Start](../get_started/quick_start.md).
+
+Then activate the environment and change to the repository root:
+
+```bash
+conda activate miles-diffusion
+cd /path/to/miles
+```
+
+Next, install the OCR dependencies by following
+[Task Dependencies: OCR Dependencies](../get_started/task_dependencies.md#ocr-dependencies).
+Then verify that the OCR reward stack imports correctly:
+
+```bash
+python -c "from paddleocr import PaddleOCR; from Levenshtein import distance; import miles.rollout.rm_hub.ocr; print('OCR deps OK')"
+```
+
+The example uses 2 NVIDIA GPUs. It downloads the training and evaluation data
+from Hugging Face during startup, so the machine must be able to access Hugging
+Face.
+
+Optionally enable Weights & Biases logging:
+
+```bash
+export WANDB_API_KEY=...
+```
+
+## Run Training
+
+Execute the 2-GPU script:
+
+```bash
+cd /path/to/miles
+conda activate miles-diffusion
+bash scripts/run-diffusion-grpo-ocr-2gpu-flowgrpo-aligned.sh
+```
+
+By default, the script uses:
+
+```bash
+CUDA_VISIBLE_DEVICES=2,3
+```
+
+Override it if your available GPUs are different:
+
+```bash
+CUDA_VISIBLE_DEVICES=0,1 bash scripts/run-diffusion-grpo-ocr-2gpu-flowgrpo-aligned.sh
+```
+
+The script writes checkpoints under:
+
+```bash
+logs/diffusion_grpo_ocr_2gpu_flowgrpo_aligned_<timestamp>/ckpt
+```
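+
+Each run creates a new timestamped directory. To locate the most recent
+checkpoint directory programmatically, a small sketch like the following works,
+assuming the script was launched from the repository root so that `logs/` is
+relative to it. The helper name is hypothetical. It picks the run by directory
+modification time rather than parsing `<timestamp>`, because the timestamp
+format is not documented here.
+
+```python
+from pathlib import Path
+from typing import Optional
+
+# Run-directory prefix used by the 2-GPU script (see the path above).
+RUN_PREFIX = "diffusion_grpo_ocr_2gpu_flowgrpo_aligned_"
+
+
+def latest_ckpt_dir(logs_root="logs", prefix=RUN_PREFIX) -> Optional[Path]:
+    """Return the newest <logs_root>/<prefix>*/ckpt directory, or None if none exists."""
+    root = Path(logs_root)
+    if not root.is_dir():
+        return None
+    candidates = [run / "ckpt" for run in root.glob(f"{prefix}*") if (run / "ckpt").is_dir()]
+    if not candidates:
+        return None
+    # Compare run directories by modification time instead of parsing <timestamp>.
+    return max(candidates, key=lambda ckpt: ckpt.parent.stat().st_mtime)
+
+
+if __name__ == "__main__":
+    print(latest_ckpt_dir())
+```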
+
+## Data and Model
+
+The script downloads the OCR dataset to:
+
+```bash
+/root/datasets/miles-diffusion-datasets
+```
+
+using:
+
+```bash
+hf download --repo-type dataset rockdu/miles-diffusion-datasets \
+  --include "flowgrpo_ocr/**" \
+  --local-dir /root/datasets/miles-diffusion-datasets
+```
+
+Training reads:
+
+```bash
+/root/datasets/miles-diffusion-datasets/flowgrpo_ocr/train.jsonl
+```
+
+Evaluation reads:
+
+```bash
+/root/datasets/miles-diffusion-datasets/flowgrpo_ocr/test.jsonl
+```
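+
+Both files are JSON Lines (one JSON object per line). A quick sanity check of
+the downloaded data can save a failed launch. The sketch below, built around a
+hypothetical `summarize_jsonl` helper, counts records and reports the keys of
+the first one. It deliberately assumes no field names, since the schema is not
+documented on this page.
+
+```python
+import json
+from pathlib import Path
+
+DATA_DIR = Path("/root/datasets/miles-diffusion-datasets/flowgrpo_ocr")
+
+
+def summarize_jsonl(path):
+    """Return (record_count, sorted keys of the first record) for a JSONL file."""
+    count = 0
+    first_keys = []
+    with open(path, encoding="utf-8") as f:
+        for line_no, line in enumerate(f, start=1):
+            if not line.strip():
+                continue  # tolerate blank lines
+            try:
+                record = json.loads(line)
+            except json.JSONDecodeError as e:
+                raise ValueError(f"{path}:{line_no}: invalid JSON") from e
+            if count == 0 and isinstance(record, dict):
+                first_keys = sorted(record)
+            count += 1
+    return count, first_keys
+
+
+if __name__ == "__main__":
+    for split in ("train", "test"):
+        path = DATA_DIR / f"{split}.jsonl"
+        if path.exists():
+            print(split, *summarize_jsonl(path))
+        else:
+            print(f"{split}: {path} not found (has the download finished?)")
+```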
+
+The model is loaded by its Hugging Face Hub ID:
+
+```bash
+Qwen/Qwen-Image
+```
+
+## Parameter Introduction
+
+Here, we briefly introduce the main parts of
+`scripts/run-diffusion-grpo-ocr-2gpu-flowgrpo-aligned.sh`.
+
+### Diffusion Rollout
+
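+The script selects the FSDP training backend and the diffusion rollout
+function, and points both `--hf-checkpoint` and `--diffusion-model` at
+Qwen-Image: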
+
+```bash
+--train-backend fsdp
+--rollout-function-path miles.rollout.sglang_diffusion_rollout.generate_rollout
+--hf-checkpoint Qwen/Qwen-Image
+--diffusion-model Qwen/Qwen-Image
+```
+
+### LoRA Training
+
+The example trains LoRA weights instead of full model weights:
+
+```bash
+--use-lora
+--lora-rank 64
+--lora-alpha 128
+--diffusion-init-lora-weight gaussian
+```
+
+### OCR Reward
+
+The reward is configured through both the diffusion reward string and the miles
+reward type:
+
+```bash
+--diffusion-reward ocr:1.0
+--rm-type ocr
+--advantage-estimator grpo
+```
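+
+For intuition about what these flags compute, here is a minimal sketch of an
+edit-distance OCR score and GRPO group-normalized advantages. It assumes a
+Flow-GRPO-style reward of `1 - distance / len(target)`, clipped to [0, 1], and
+takes the recognized text as given (PaddleOCR is not called). The exact formula
+in `miles.rollout.rm_hub.ocr` and the miles GRPO estimator may differ, for
+example in text normalization or the epsilon. All names here are hypothetical.
+
+```python
+import math
+
+
+def levenshtein(a: str, b: str) -> int:
+    # Classic dynamic-programming edit distance (insert/delete/substitute cost 1).
+    prev = list(range(len(b) + 1))
+    for i, ca in enumerate(a, start=1):
+        cur = [i]
+        for j, cb in enumerate(b, start=1):
+            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
+        prev = cur
+    return prev[-1]
+
+
+def ocr_score(recognized: str, target: str) -> float:
+    """Hypothetical OCR reward: 1 - normalized edit distance, clipped to [0, 1]."""
+    if not target:
+        return 0.0
+    dist = min(levenshtein(recognized, target), len(target))
+    return 1.0 - dist / len(target)
+
+
+def grpo_advantages(rewards, eps=1e-6):
+    """Group-normalized advantages for the samples of one prompt."""
+    mean = sum(rewards) / len(rewards)
+    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / len(rewards))
+    return [(r - mean) / (std + eps) for r in rewards]
+
+
+# One prompt's group (the recipe uses 16 samples per prompt; 4 are shown here).
+target = "OPEN 24 HOURS"
+recognized = ["OPEN 24 HOURS", "OPEN 24 HOUR", "0PEN 2A HOURS", ""]
+rewards = [ocr_score(text, target) for text in recognized]
+print(rewards)
+print(grpo_advantages(rewards))
+```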
+
+### Colocated Resources
+
+Training and rollout share the same 2 GPUs:
+
+```bash
+--actor-num-gpus-per-node 2
+--rollout-num-gpus 2
+--rollout-num-gpus-per-engine 1
+--num-gpus-per-node 2
+--colocate
+```
+
+## Batch and Step Math
+
+The 2-GPU script is scaled down from the 4-GPU FlowGRPO-aligned OCR recipe.
+Per rollout:
+
+```text
+rollout_batch_size = 16 prompts
+n_samples_per_prompt = 16 samples per prompt
+samples_per_rollout = 16 * 16 = 256 samples
+num_steps_per_rollout = 2 optimizer steps
+global_batch_size = 256 / 2 = 128 samples per optimizer step
+```
+
+With 2 training GPUs, each rank receives:
+
+```text
+128 / 2 = 64 samples per optimizer step
+```
+
+The DiT forward is tiled as:
+
+```bash
+--micro-batch-size-sample 4
+--micro-batch-size-tstep 2
+```
+
+so one forward tile covers `4 * 2 = 8` sample/timestep cells.
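+
+The arithmetic above can be written as a small helper, which is convenient when
+adapting the recipe to a different GPU count. The helper and its field names are
+hypothetical (they are not miles arguments). It only reproduces the divisions
+shown above and rejects settings where they would leave a remainder.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class BatchPlan:
+    samples_per_rollout: int
+    global_batch_size: int
+    samples_per_rank_per_step: int
+    cells_per_tile: int
+
+
+def _exact_div(num: int, den: int, what: str) -> int:
+    # The recipe's math assumes these divisions leave no remainder.
+    if den <= 0 or num % den:
+        raise ValueError(f"{what}: {num} is not divisible by {den}")
+    return num // den
+
+
+def plan_batches(rollout_batch_size, n_samples_per_prompt, num_steps_per_rollout,
+                 num_train_gpus, micro_batch_size_sample, micro_batch_size_tstep):
+    """Reproduce the batch arithmetic of the OCR recipe (hypothetical helper)."""
+    samples = rollout_batch_size * n_samples_per_prompt
+    global_bs = _exact_div(samples, num_steps_per_rollout, "global batch size")
+    per_rank = _exact_div(global_bs, num_train_gpus, "per-rank batch")
+    return BatchPlan(samples, global_bs, per_rank,
+                     micro_batch_size_sample * micro_batch_size_tstep)
+
+
+print(plan_batches(16, 16, 2, 2, 4, 2))  # 2-GPU recipe
+# BatchPlan(samples_per_rollout=256, global_batch_size=128, samples_per_rank_per_step=64, cells_per_tile=8)
+```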
+
+For a deeper explanation of these batch-shape parameters, see
+[Batch sizes in miles-diffusion](../developer_guide/batch_sizes_in_miles_d.md).
+
+## Diffusion Sampling Settings
+
+The example mirrors the Qwen-Image OCR FlowGRPO settings:
+
+```bash
+--diffusion-num-steps 10
+--diffusion-eval-num-steps 50
+--diffusion-guidance-scale 4.0
+--diffusion-true-cfg-scale 4.0
+--diffusion-noise-level 1.2
+--diffusion-step-strategy-path miles.rollout.step_strategy_hub.sde_window
+--diffusion-sde-window-size 2
+--diffusion-sde-window-range 3,5
+--diffusion-height 512
+--diffusion-width 512
+```
+
+The active SDE training window has size 2. The `3,5` range selects the same
+effective window used by the aligned FlowGRPO recipe.
+
+## 4-GPU Variant
+
+If you have 4 GPUs, use:
+
+```bash
+CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/run-diffusion-grpo-ocr-4gpu-flowgrpo-aligned.sh
+```
+
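+The 4-GPU script doubles `--rollout-batch-size` from 16 to 32. Assuming it keeps
+16 samples per prompt and 2 optimizer steps per rollout, that is
+32 * 16 / 2 = 256 samples per optimizer step, so each of the 4 ranks still
+trains on 64 samples per optimizer step.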
+
+## Expected Result
+
+A successful launch should:
+
+1. download `flowgrpo_ocr/**` if it is not already present;
+2. start the colocated FSDP actor and sglang-diffusion rollout engine;
+3. generate Qwen-Image OCR rollouts;
+4. compute OCR rewards;
+5. begin GRPO LoRA updates;
+6. save checkpoints under the run-specific `logs/` directory.
+
+If the run fails before training starts, first check GPU visibility, Hugging Face
+access, the base environment, and the OCR task dependencies.
diff --git a/docs/get_started/quick_start.md b/docs/get_started/quick_start.md
new file mode 100644
index 0000000000..14f183af16
--- /dev/null
+++ b/docs/get_started/quick_start.md
@@ -0,0 +1,201 @@
+# Quick Start
+
+This document describes the recommended base environment for
+`miles-diffusion`. It covers the common runtime needed by the diffusion training
+entrypoint, the pinned sglang-diffusion fork, and the miles package.
+
+Task-specific reward dependencies are intentionally kept out of the base setup.
+After the base environment is ready, install the dependencies required by your
+target recipe from [Task Dependencies](task_dependencies.md).
+
+## Basic Environment Setup
+
+Miles-diffusion depends on a custom sglang-diffusion fork for multimodal rollout
+and RL weight synchronization. The sglang branch can move over time, so the
+environment should pin the exact sglang commit instead of installing from a
+floating branch tip.
+
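+Save the following block as a script (for example, `install_env.sh`) and run it
+from the repository root with `bash install_env.sh`. The block uses
+`set -euo pipefail` and `exit 1`, so pasting it directly into an interactive
+shell can close that shell on the first error.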
+
+```bash
+set -euo pipefail
+
+ENV_NAME="${ENV_NAME:-miles-diffusion}"
+PY_VER="${PY_VER:-3.11}"
+CUDA_VER="${CUDA_VER:-12.9}"
+TORCH_VER="${TORCH_VER:-2.9.1}"
+SGLANG_REPO="${SGLANG_REPO:-https://github.com/Rockdu/sglang.git}"
+SGLANG_BRANCH="${SGLANG_BRANCH:-sglang-diffusion-rollout-test}"
+SGLANG_COMMIT="${SGLANG_COMMIT:-0372158dd66bc7cb0740c733bd60047db790ec7d}"
+
+PIP_VER="${PIP_VER:-26.0.1}"
+WHEEL_VER="${WHEEL_VER:-0.45.1}"
+SETUPTOOLS_VER="${SETUPTOOLS_VER:-82.0.1}"
+TORCH_MEMORY_SAVER_VER="${TORCH_MEMORY_SAVER_VER:-0.0.9}"
+
+REPO_DIR="$(pwd)"
+SGLANG_DIR="${SGLANG_DIR:-$(dirname "$REPO_DIR")/sglang}"
+
+if command -v mamba >/dev/null 2>&1; then
+  CONDA_BIN=mamba
+elif command -v conda >/dev/null 2>&1; then
+  CONDA_BIN=conda
+else
+  echo "conda/mamba not found. Install miniforge first: https://github.com/conda-forge/miniforge" >&2
+  exit 1
+fi
+
+source "$($CONDA_BIN info --base)/etc/profile.d/conda.sh"
+
+if conda env list | awk '{print $1}' | grep -qx "$ENV_NAME"; then
+  echo "[install] conda env '$ENV_NAME' exists; reusing"
+else
+  echo "[install] creating conda env '$ENV_NAME'"
+  "$CONDA_BIN" create -y -n "$ENV_NAME" "python=$PY_VER"
+fi
+conda activate "$ENV_NAME"
+
+python -m pip install "pip==$PIP_VER" "wheel==$WHEEL_VER" "setuptools==$SETUPTOOLS_VER"
+
+CU_TAG="cu$(echo "$CUDA_VER" | tr -d .)"
+if python -c "import torch" 2>/dev/null; then
+  CUR_TORCH="$(python -c 'import torch; print(torch.__version__)')"
+  if [[ "$CUR_TORCH" == "${TORCH_VER}+${CU_TAG}" || "$CUR_TORCH" == "$TORCH_VER" ]]; then
+    echo "[install] torch: $CUR_TORCH"
+  else
+    echo "[install] reinstalling torch==$TORCH_VER from $CU_TAG"
+    pip install --force-reinstall "torch==$TORCH_VER" --index-url "https://download.pytorch.org/whl/$CU_TAG"
+  fi
+else
+  echo "[install] installing torch==$TORCH_VER from $CU_TAG"
+  pip install "torch==$TORCH_VER" --index-url "https://download.pytorch.org/whl/$CU_TAG"
+fi
+
+if [[ ! -d "$SGLANG_DIR" ]]; then
+  echo "[install] cloning $SGLANG_REPO -> $SGLANG_DIR"
+  git clone --branch "$SGLANG_BRANCH" "$SGLANG_REPO" "$SGLANG_DIR"
+fi
+
+pushd "$SGLANG_DIR" >/dev/null
+if ! git remote get-url rockdu >/dev/null 2>&1; then
+  git remote add rockdu "$SGLANG_REPO"
+fi
+if ! git cat-file -e "$SGLANG_COMMIT^{commit}" 2>/dev/null; then
+  git fetch rockdu "$SGLANG_BRANCH"
+fi
+CUR_SGLANG_COMMIT="$(git rev-parse HEAD)"
+if [[ "$CUR_SGLANG_COMMIT" != "$SGLANG_COMMIT" ]]; then
+  git checkout --detach "$SGLANG_COMMIT"
+fi
+pip install -e "python[all]"
+popd >/dev/null
+
+cd "$REPO_DIR"
+pip install -r requirements.txt
+pip install -e . --no-deps
+
+pip install "torch_memory_saver==$TORCH_MEMORY_SAVER_VER" || true
+
+if command -v nvidia-smi >/dev/null 2>&1; then
+  nvidia-smi -L
+else
+  echo "[warn] nvidia-smi not found; GPU visibility was not checked"
+fi
+
+python -c "import train_diffusion; from miles.utils.arguments import parse_args; from miles.backends.fsdp_utils import FSDPTrainRayActor; import sglang.multimodal_gen; print('miles-diffusion import OK')"
+```
+
+The block is safe to re-run. It reuses an existing conda environment and sglang
+checkout, reinstalls PyTorch only when the installed version does not match
+`TORCH_VER`, and lets pip skip requirements that are already satisfied.
+
+## What the Setup Creates
+
+By default, the setup creates this layout:
+
+```bash
+/path/to/miles         # this repository
+/path/to/sglang        # Rockdu/sglang checked out at the pinned commit
+```
+
+It performs the following steps:
+
+1. creates or reuses a conda environment named `miles-diffusion`;
+2. installs pinned Python build tooling;
+3. installs pinned PyTorch from the selected CUDA wheel index;
+4. clones `Rockdu/sglang` if it is not already present and checks out the pinned sglang-diffusion commit;
+5. installs sglang in editable mode with `python[all]`;
+6. installs miles dependencies from `requirements.txt`;
+7. installs miles itself in editable mode;
+8. optionally installs `torch_memory_saver`;
+9. runs a Python import smoke test.
+
+Activate the environment after installation:
+
+```bash
+conda activate miles-diffusion
+python -c "import train_diffusion; import sglang.multimodal_gen; print('OK')"
+```
+
+If the import command succeeds, the base environment can load the miles
+diffusion training entrypoint and the sglang multimodal rollout module.
+
+## Version Pins
+
+The base setup keeps the key environment choices explicit:
+
+| Component | Default pin | Override variable |
+| --- | --- | --- |
+| Conda env | `miles-diffusion` | `ENV_NAME` |
+| Python | `3.11` | `PY_VER` |
+| pip | `26.0.1` | `PIP_VER` |
+| wheel | `0.45.1` | `WHEEL_VER` |
+| setuptools | `82.0.1` | `SETUPTOOLS_VER` |
+| PyTorch | `torch==2.9.1` | `TORCH_VER` |
+| CUDA wheel index | `cu129` | `CUDA_VER=12.9` |
+| sglang repo | `https://github.com/Rockdu/sglang.git` | `SGLANG_REPO` |
+| sglang branch | `sglang-diffusion-rollout-test` | `SGLANG_BRANCH` |
+| sglang commit | `0372158dd66bc7cb0740c733bd60047db790ec7d` | `SGLANG_COMMIT` |
+| torch_memory_saver | `0.0.9` | `TORCH_MEMORY_SAVER_VER` |
+
+Miles package dependencies are pinned in `requirements.txt`, including:
+
+```text
+accelerate==1.12.0
+datasets==4.4.2
+pillow==11.3.0
+ray[default]==2.53.0
+sglang-router==0.3.0
+transformers==5.5.4
+wandb==0.23.1
+```
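+
+To confirm that an existing environment still matches these pins (for example,
+after installing task dependencies on top of it), you can compare installed
+versions with the standard-library `importlib.metadata`. This is a minimal
+sketch with hypothetical `parse_pins` / `check_pins` helpers. It handles only
+exact `name==version` pins, strips extras such as `[default]`, and ignores other
+specifier types.
+
+```python
+import re
+from importlib import metadata
+
+PINS = """
+accelerate==1.12.0
+datasets==4.4.2
+pillow==11.3.0
+ray[default]==2.53.0
+sglang-router==0.3.0
+transformers==5.5.4
+wandb==0.23.1
+"""
+
+
+def parse_pins(text):
+    """Parse exact `name[extras]==version` lines into {name: version}."""
+    pins = {}
+    for line in text.splitlines():
+        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
+        m = re.fullmatch(r"([A-Za-z0-9_.\-]+)(\[[^\]]*\])?==(\S+)", line)
+        if m:
+            pins[m.group(1)] = m.group(3)
+    return pins
+
+
+def check_pins(pins, get_version=metadata.version):
+    """Return {name: (expected, installed_or_None)} for every mismatch."""
+    mismatches = {}
+    for name, expected in pins.items():
+        try:
+            installed = get_version(name)
+        except metadata.PackageNotFoundError:
+            installed = None
+        if installed != expected:
+            mismatches[name] = (expected, installed)
+    return mismatches
+
+
+if __name__ == "__main__":
+    problems = check_pins(parse_pins(PINS))
+    for name, (expected, installed) in sorted(problems.items()):
+        print(f"{name}: expected {expected}, found {installed}")
+    print("all pins match" if not problems else f"{len(problems)} mismatch(es)")
+```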
+
+The sglang source revision is pinned by commit SHA. This matters because
+miles-diffusion relies on the sglang-diffusion fork for multimodal rollout and
+weight synchronization, and a branch name alone is not reproducible: the branch
+tip can move between installs, which makes debugging and sharing results
+harder.
+
+## Configurable Setup
+
+You can override the defaults before running the setup block:
+
+```bash
+export ENV_NAME=miles-diffusion
+export PY_VER=3.11
+export CUDA_VER=12.9
+export TORCH_VER=2.9.1
+export SGLANG_DIR=/path/to/sglang
+export SGLANG_REPO=https://github.com/Rockdu/sglang.git
+export SGLANG_BRANCH=sglang-diffusion-rollout-test
+export SGLANG_COMMIT=0372158dd66bc7cb0740c733bd60047db790ec7d
+```
+
+Only override `SGLANG_COMMIT` when intentionally testing a new
+sglang-diffusion revision.
+
+## Task Dependencies
+
+The base setup intentionally does not install task-specific reward dependencies.
+Before running a recipe, install the dependency set required by that task:
+
+- [Task Dependencies](task_dependencies.md)
diff --git a/docs/get_started/task_dependencies.md b/docs/get_started/task_dependencies.md
new file mode 100644
index 0000000000..01794c9475
--- /dev/null
+++ b/docs/get_started/task_dependencies.md
@@ -0,0 +1,70 @@
+# Task Dependencies
+
+The quick start installs the common miles-diffusion runtime. Some tasks require
+extra reward or evaluation dependencies. This page records those task-scoped
+dependency sets only.
+
+## OCR Dependencies
+
+Use this section for any task that enables the OCR reward, for example through
+`--rm-type ocr` or `--diffusion-reward ocr:...`.
+
+These dependencies are needed only by the OCR reward implementation:
+
+```text
+miles.rollout.rm_hub.ocr
+```
+
+The OCR reward depends on PaddleOCR, PaddlePaddle, OpenCV runtime libraries, and
+string-distance packages.
+
+Start from the base environment:
+
+```bash
+conda activate miles-diffusion
+cd /path/to/miles
+```
+
+Install the system libraries required by PaddleOCR and OpenCV:
+
+```bash
+sudo apt-get update
+sudo apt-get install -y libglib2.0-0 libgl1
+```
+
+In a root container, use `apt-get` directly if `sudo` is unavailable:
+
+```bash
+apt-get update
+apt-get install -y libglib2.0-0 libgl1
+```
+
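+Install the Python dependencies by running the pinned `flow_grpo` setup script.
+The `grep` filter drops lines that start with `apt-get install ` (the system
+libraries were installed in the previous step) and pipes the rest to `bash`: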
+
+```bash
+cd /path/to/miles/flow_grpo
+grep -v '^apt-get install ' setup.sh | bash
+```
+
+The relevant pins include:
+
+```text
+diffusers==0.37.0
+peft==0.18.1
+bitsandbytes==0.48.0
+opencv-python==4.11.0.86
+opencv-python-headless==4.10.0.84
+opencv-contrib-python==4.11.0.86
+paddlepaddle-gpu==2.6.2
+paddleocr==2.9.1
+python-Levenshtein==0.27.3
+levenshtein==0.27.3
+rapidfuzz==3.14.3
+```
+
+Verify the OCR reward stack:
+
+```bash
+cd /path/to/miles
+python -c "from paddleocr import PaddleOCR; from Levenshtein import distance; import miles.rollout.rm_hub.ocr; print('OCR deps OK')"
+```
diff --git a/docs/roadmap.md b/docs/roadmap.md
new file mode 100644
index 0000000000..fb4edb0e91
--- /dev/null
+++ b/docs/roadmap.md
@@ -0,0 +1,56 @@
+# Roadmap
+
+This roadmap describes planned directions for miles-diffusion. It is intended
+for external developers and users, and does not represent a strict release
+commitment. Priorities may change as the project evolves.
+
+## Text-to-Video RL
+
+We plan to extend miles-diffusion from text-to-image RL to text-to-video RL. The
+overall training loop is similar to T2I, but video rollout has higher memory,
+compute, and IO pressure.
+
+An initial direction is to start with short videos or few-frame settings, then
+extend to longer video generation as the runtime and training recipes mature.
+This also makes it possible to study whether policies trained on short temporal
+contexts can transfer to longer video generation settings.
+
+## Image Editing and TI2I
+
+We plan to support text-guided image-to-image and image editing workflows. This
+includes edit models, condition images, negative images, and other multimodal
+conditioning inputs.
+
+Compared with pure T2I, TI2I introduces additional conditioning paths and
+model-specific preprocessing. Supporting these workflows will require extending
+the diffusion rollout interface and training recipes while keeping the user
+experience close to the existing T2I path.
+
+## More Diffusion Backbones
+
+We plan to expand support for more mainstream diffusion and DiT backbones. New
+model support should include rollout compatibility, scheduler support, and
+log-prob validation against a trusted reference path.
+
+For supported recipes, we aim to keep rollout log-prob drift within an
+acceptable tolerance when compared with the reference implementation. The exact
+tolerance may depend on the model, scheduler, dtype, and rollout mode.
+
+## Mixed Resolution Training
+
+We plan to explore mixed resolution training and rollout support. Mixed
+resolution can make data construction more flexible and may improve hardware
+utilization for workloads that naturally contain prompts or tasks at different
+image sizes.
+
+This direction may include NaViT-style batching, resolution-aware sampling, and
+batch construction strategies that work across T2I, TI2I, and T2V workloads.
+
+## More Tasks and Rewards
+
+We plan to add more task and reward coverage for diffusion RL. The current
+direction includes OCR-style rewards, preference or aesthetic rewards, and
+custom reward functions provided by users.
+
+The goal is to make it straightforward to attach new reward signals without
+rewriting the core rollout and training loop.