Bug Description
When running the benchmark on Qwen3.6-35B-A3B-4bit, the token generation (tg) phase
is skipped entirely for the pp1024/tg128 test: tg TPS=0.0 and TPOT=0.00ms.
The pp4096/tg128 test works correctly.
Environment
- Hardware: MacBook Air M5 32GB
- Model: Qwen3.6-35B-A3B-4bit
- Config:
  - ctx_window: 65536
  - enable_thinking: false
  - TurboQuant KV Cache 4bit
  - SpecPrefill: Qwen3.5-2B-OptiQ-4bit
  - DFlash: Qwen3.5-9B-DFlash
Benchmark Results
Run 1:
Test          TTFT(ms)  TPOT(ms)  pp TPS       tg TPS  E2E(s)
pp1024/tg128  6905.7    0.00      148.3 tok/s  0.0     6.906   ← tg skipped
pp4096/tg128  24207.7   38.39     169.2 tok/s  26.3    29.083  ← normal
Run 2 (reproduced):
Test          TTFT(ms)  TPOT(ms)  pp TPS       tg TPS  E2E(s)
pp1024/tg128  8439.9    0.00      121.3 tok/s  0.0     8.440   ← tg skipped
pp4096/tg128  23722.6   37.00     172.7 tok/s  27.2    28.422  ← normal
Analysis
- E2E ≈ TTFT in the pp1024 case, confirming that the tg phase never executes
- Replacing the SpecPrefill draft model has no effect, so this is not a draft model issue
- pp4096 works normally, suggesting a short-context boundary condition
  specific to this MoE model (Qwen3.6-35B-A3B)
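
The E2E ≈ TTFT observation can be checked directly from the reported numbers. The sketch below is only a sanity check on the table values, not the benchmark's own code. It assumes that TPOT is derived as (E2E - TTFT) / (tg - 1), which is inferred from the pp4096 rows rather than confirmed from the source. The function names and the 1 ms threshold are hypothetical.

```python
# Assumption: TPOT = (E2E - TTFT) / (tg - 1). This formula is inferred from the
# reported pp4096 rows; it is not taken from the benchmark's implementation.

def decode_window_ms(ttft_ms: float, e2e_s: float) -> float:
    """Wall-clock time remaining for token generation after the first token."""
    return e2e_s * 1000.0 - ttft_ms


def implied_tpot_ms(ttft_ms: float, e2e_s: float, tg_tokens: int = 128) -> float:
    """TPOT implied by TTFT and E2E if all tg_tokens had been generated."""
    return decode_window_ms(ttft_ms, e2e_s) / (tg_tokens - 1)


def tg_skipped(ttft_ms: float, e2e_s: float, min_decode_ms: float = 1.0) -> bool:
    """Flag a run whose decode window is too short to hold any generation.

    min_decode_ms is a hypothetical threshold chosen for illustration.
    """
    return decode_window_ms(ttft_ms, e2e_s) < min_decode_ms


# (TTFT in ms, E2E in s), copied from the benchmark tables in this report.
runs = {
    "run1 pp1024/tg128": (6905.7, 6.906),
    "run1 pp4096/tg128": (24207.7, 29.083),
    "run2 pp1024/tg128": (8439.9, 8.440),
    "run2 pp4096/tg128": (23722.6, 28.422),
}

for name, (ttft_ms, e2e_s) in runs.items():
    window = decode_window_ms(ttft_ms, e2e_s)
    status = "tg skipped" if tg_skipped(ttft_ms, e2e_s) else "ok"
    print(f"{name}: decode window {window:.1f} ms, "
          f"implied TPOT {implied_tpot_ms(ttft_ms, e2e_s):.2f} ms ({status})")
```

Under that assumption, the pp4096 rows reproduce the reported TPOT values (38.39 ms and 37.00 ms), while both pp1024 rows leave a decode window of well under a millisecond, which is consistent with no tokens being generated after prefill.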
Expected Behavior
pp1024/tg128 should generate 128 tokens after prefill, same as pp4096/tg128.