42 commits
99e9ab7
feat(engine): add supports_chunked_prefill opt-in flag on NodeSubmodule
rohansanda May 1, 2026
606604e
feat(engine): add ARNodeInputs token-axis slicing for chunked prefill
rohansanda May 1, 2026
4f33270
feat(engine): add pure chunk planner for chunked prefill
rohansanda May 1, 2026
ba98564
feat(engine): add execute_chunked_prefill orchestrator
rohansanda May 1, 2026
769ee36
feat(engine): wire max_prefill_chunk_size config + _should_chunk_pref…
rohansanda May 1, 2026
d910e32
feat(engine): fork execute_batch through chunked-prefill orchestrator…
rohansanda May 1, 2026
b5fb7bc
feat(qwen3_omni): opt Thinker into chunked prefill
rohansanda May 1, 2026
f2da508
test(engine): chunked prefill numerical equivalence vs non-chunked, q…
rohansanda May 1, 2026
6771bf4
test(engine): chunked prefill edge cases + multimodal-walk gating note
rohansanda May 1, 2026
38e87f7
fix(engine): plumb explicit prefill/decode mode through plan_attention
rohansanda May 1, 2026
49f30fb
test(engine): relax tolerance for chunked-prefill 1-token-last-chunk …
rohansanda May 1, 2026
3f5be12
feat(engine): NVTX annotations on chunked-prefill orchestrator
rohansanda May 1, 2026
ca23174
feat(config): enable chunked prefill in qwen3_omni config + add TTFT …
rohansanda May 1, 2026
8137a46
feat(conductor): add per-request prefill progress to CurrentForwardPa…
rohansanda May 1, 2026
7c416f2
feat(scheduler): add pure plan_chunked_step for mixed-batch packing
rohansanda May 1, 2026
afc6e11
feat(engine): add scheduler_owns_chunking flag + is_terminal_per_requ…
rohansanda May 1, 2026
63e5453
refactor(scheduler): inline chunked-prefill packing into micro_schedu…
rohansanda May 1, 2026
f8818cc
feat(qwen3_omni): add thinker_step walk for mixed prefill+decode batches
rohansanda May 2, 2026
2d86264
feat(scheduler): hook plan_chunked_step into MicroScheduler.get_next_…
rohansanda May 2, 2026
987667a
feat(scheduler): per-step prompt slicing + mixed-batch correctness test
rohansanda May 2, 2026
5d63916
perf(scheduler): Phase 2 mixed-batch experimental validation harness
rohansanda May 2, 2026
8e44542
feat(config): surface Phase 2 scheduler knobs in qwen3_omni YAML
rohansanda May 2, 2026
0c0a650
refactor(qwen3_omni): thinker_step emits __batched_logits__ for fixed…
rohansanda May 2, 2026
80ac600
feat(engine): consult is_terminal_per_request in batched-logits sampl…
rohansanda May 2, 2026
04a5fbb
feat(qwen3_omni): enable CUDA graph replay for thinker_step
rohansanda May 2, 2026
5b90ca6
test(engine): thinker_step CUDA graph replay numerical equivalence vs…
rohansanda May 2, 2026
467c05f
perf(scheduler): Phase 2.1a 3-way comparison harness (assertion FAILS…
rohansanda May 2, 2026
b50b16d
fix(engine): can_use_cuda_graphs honors replay_graph_walks; runner ga…
rohansanda May 2, 2026
f3b6ed3
feat(qwen3_omni): capture bs=8 for prefill_text/thinker_step graphs
rohansanda May 2, 2026
c9865ad
feat(qwen3_omni): thinker_step accepts audio/vision prefill rids in m…
rohansanda May 2, 2026
58d493a
review(I1,I4): ScheduledBatch uses field(default_factory=dict); delet…
rohansanda May 2, 2026
04cb217
review(I3): replace dynamic axis-detection in slicing helpers with ex…
rohansanda May 2, 2026
0c3218d
review(M3,M5): trim _prepare_text_input docstring; pop __batched_logi…
rohansanda May 2, 2026
cabee24
review(M7): add prefill_audio captured-vs-eager equivalence test
rohansanda May 2, 2026
1da09eb
fix(engine): Phase 1 chunked prefill only fires for prefill_text walk
rohansanda May 2, 2026
f8c2794
feat(scheduler): Phase 2.1b end-to-end — atomic audio/vision rids in …
rohansanda May 2, 2026
ff20fe1
review: simplify chunked-prefill PR — slim defenses, consolidate modu…
rohansanda May 2, 2026
60a1fb8
chore: fix ruff CI errors (W291 trailing whitespace, I001 import sort)
rohansanda May 2, 2026
4f6843f
chore: fix ruff errors surfaced by rebase onto main
rohansanda May 2, 2026
31379ac
fix(worker): route Phase 2 thinker_step outputs to actual worker_graph
rohansanda May 2, 2026
c7471d3
perf(scheduler): route pure-decode chunked batches to thinker_decode …
rohansanda May 2, 2026
c65f80c
chore: drop chunked-prefill test files from PR
rohansanda May 2, 2026
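Several commits above revolve around splitting one long prompt into bounded pieces. Examples are 4f33270 ("add pure chunk planner for chunked prefill") and 49f30fb, which concerns a 1-token last chunk. The sketch below shows what such a planner might look like. It assumes chunks are contiguous half-open token ranges no longer than `max_prefill_chunk_size`. `Chunk` and `plan_chunks` are hypothetical names, not the PR's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical: half-open token range [start, end) of one prefill chunk."""
    start: int
    end: int
    is_last: bool


def plan_chunks(prompt_len: int, max_chunk: Optional[int]) -> List[Chunk]:
    """Split a prompt into contiguous chunks of at most `max_chunk` tokens.

    `max_chunk=None` disables chunking, mirroring `max_prefill_chunk_size: null`.
    """
    if prompt_len <= 0:
        return []
    if max_chunk is None or prompt_len <= max_chunk:
        # Short prompt (or chunking disabled): a single chunk covers everything.
        return [Chunk(0, prompt_len, True)]
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive or None")
    chunks = []
    for start in range(0, prompt_len, max_chunk):
        end = min(start + max_chunk, prompt_len)
        chunks.append(Chunk(start, end, end == prompt_len))
    return chunks


if __name__ == "__main__":
    for c in plan_chunks(1100, 512):
        print(c)
```

With a chunk size of 512, a 1100-token prompt yields [0, 512), [512, 1024) and [1024, 1100). A 1025-token prompt ends with a single-token chunk, the edge case named in commit 49f30fb.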
configs/qwen3omni.yaml: 10 additions, 0 deletions
@@ -1,5 +1,15 @@
model: "qwen3_omni"
max_seq_len: 32768
# Engine: chunked prefill. Splits long prefills into chunks of at most 512 tokens.
# Set to null (or remove the key) to disable. Only applies to the qwen3_omni
# Thinker (the LLM submodule); other submodules opt in individually.
max_prefill_chunk_size: 512
# Phase 2: scheduler-driven chunked prefill. When true, the MicroScheduler
# packs mixed batches (decodes + prefill chunks across requests) up to
# max_step_tokens. When false (default), the engine handles single-request
# chunking internally (Phase 1).
scheduler_owns_chunking: false
max_step_tokens: 2048
node_groups:
- node_names: [audio_encoder, vision_encoder, Code2Wav]
  ranks: [0]
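The Phase 2 comment describes the MicroScheduler packing decodes and prefill chunks from several requests into one step of at most `max_step_tokens` tokens. The following is a minimal sketch of that packing. It relies on assumptions that the config does not state:

* Each decoding request contributes exactly one token and is scheduled before any prefill work.
* Prefill requests are served in arrival order.
* Each prefill contribution is additionally capped at `max_prefill_chunk_size`.

`PendingRequest`, `StepPlan`, and `pack_step` are hypothetical names and are not the PR's `plan_chunked_step`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingRequest:
    """Hypothetical per-request view the packer needs."""
    rid: str
    prefill_total: int          # prompt tokens to prefill
    prefill_consumed: int = 0   # prompt tokens already prefilled

    @property
    def in_decode(self) -> bool:
        return self.prefill_consumed >= self.prefill_total


@dataclass
class StepPlan:
    # rid -> number of tokens that request contributes to this step.
    tokens: Dict[str, int] = field(default_factory=dict)

    @property
    def total(self) -> int:
        return sum(self.tokens.values())


def pack_step(
    requests: List[PendingRequest],
    max_step_tokens: int,
    max_chunk: Optional[int] = None,
) -> StepPlan:
    """Pack decodes and prefill chunks into one step of <= max_step_tokens tokens."""
    plan = StepPlan()
    budget = max_step_tokens

    # Assumed policy: every decoding request gets one token, scheduled first.
    for r in requests:
        if budget <= 0:
            break
        if r.in_decode:
            plan.tokens[r.rid] = 1
            budget -= 1

    # Remaining budget goes to prefill chunks in arrival order.
    for r in requests:
        if budget <= 0:
            break
        if r.in_decode:
            continue
        take = min(r.prefill_total - r.prefill_consumed, budget)
        if max_chunk is not None:
            take = min(take, max_chunk)  # assumed per-request cap
        if take > 0:
            plan.tokens[r.rid] = take
            budget -= take

    return plan


if __name__ == "__main__":
    reqs = [
        PendingRequest("a", prefill_total=10, prefill_consumed=10),  # decoding
        PendingRequest("b", prefill_total=3000),                      # long prompt
        PendingRequest("c", prefill_total=300),                       # short prompt
    ]
    plan = pack_step(reqs, max_step_tokens=2048, max_chunk=512)
    print(plan.tokens, plan.total)
```

Scheduling decodes first keeps in-flight generations moving while long prompts are being admitted. The per-request cap gives up some step utilization in exchange for fairness across concurrent prefills. Both of these are design choices of this sketch.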
mminf/conductor/request_info.py: 12 additions, 0 deletions
@@ -77,6 +77,18 @@ class CurrentForwardPassInfo:
    loop_stop_times: dict[str, IterIndexTree] = field(default_factory=dict)
    dynamic_loop_iter_counts: dict[str, int] = field(default_factory=dict)

    # Chunked-prefill progress. Set at request admission and advanced by the
    # MicroScheduler each step as chunks complete. The derived
    # `is_prefill_complete` property gates the prefill→decode transition.
    # The defaults (0, 0) mark a request not in chunked-prefill mode, for
    # which `is_prefill_complete` is trivially True (0 >= 0).
    prefill_tokens_total: int = 0
    prefill_tokens_consumed: int = 0

    @property
    def is_prefill_complete(self) -> bool:
        return self.prefill_tokens_consumed >= self.prefill_tokens_total

    def register_loop_stop(self, loop_name: str):
        self.dynamic_loop_stop_signals.add(loop_name)
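The two new fields and `is_prefill_complete` can be exercised in isolation. The sketch below uses a hypothetical stand-in dataclass with the same field names and property. Its `advance` method is an assumption about how the MicroScheduler might update `prefill_tokens_consumed` after each chunk, clamped so that consumption never exceeds the total.

```python
from dataclasses import dataclass


@dataclass
class PrefillProgress:
    """Hypothetical stand-in mirroring the two new CurrentForwardPassInfo fields."""
    prefill_tokens_total: int = 0
    prefill_tokens_consumed: int = 0

    @property
    def is_prefill_complete(self) -> bool:
        # Same predicate as the diff: defaults (0, 0) count as complete.
        return self.prefill_tokens_consumed >= self.prefill_tokens_total

    def advance(self, n_tokens: int) -> None:
        """Hypothetical scheduler hook: record that a chunk of n_tokens finished."""
        if n_tokens < 0:
            raise ValueError("n_tokens must be non-negative")
        # Clamp so consumed never overshoots total (a choice of this sketch).
        self.prefill_tokens_consumed = min(
            self.prefill_tokens_total, self.prefill_tokens_consumed + n_tokens
        )


if __name__ == "__main__":
    p = PrefillProgress(prefill_tokens_total=1100)
    for chunk in (512, 512, 76):
        p.advance(chunk)
        print(p.prefill_tokens_consumed, p.is_prefill_complete)
```

A default-constructed instance reports `is_prefill_complete` as True. As a result, this check treats requests that never enter chunked-prefill mode as already past prefill.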
