
Bug: Qwen3.6-35B-A3B-4bit tg TPS=0 with pp1024/tg128 benchmark (MoE short-context issue) #979

@jasen215

Bug Description

When running the benchmark on Qwen3.6-35B-A3B-4bit, the token generation (tg)
phase is skipped entirely in the pp1024/tg128 test: tg TPS=0.0 and TPOT=0.00ms.
The pp4096/tg128 test in the same run works correctly.

Environment

  • Hardware: MacBook Air M5 32GB
  • Model: Qwen3.6-35B-A3B-4bit
  • Config:
    • ctx_window: 65536
    • enable_thinking: false
    • TurboQuant KV Cache 4bit
    • SpecPrefill: Qwen3.5-2B-OptiQ-4bit
    • DFlash: Qwen3.5-9B-DFlash

Benchmark Results

Run 1:
Test          TTFT(ms)  TPOT(ms)  pp TPS (tok/s)  tg TPS (tok/s)  E2E(s)
pp1024/tg128    6905.7      0.00           148.3             0.0   6.906  ← tg skipped
pp4096/tg128   24207.7     38.39           169.2            26.3  29.083  ← normal

Run 2 (reproduced):
Test          TTFT(ms)  TPOT(ms)  pp TPS (tok/s)  tg TPS (tok/s)  E2E(s)
pp1024/tg128    8439.9      0.00           121.3             0.0   8.440  ← tg skipped
pp4096/tg128   23722.6     37.00           172.7            27.2  28.422  ← normal

Analysis

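  • E2E ≈ TTFT in the pp1024 case (Run 1: 6.906 s vs 6905.7 ms; Run 2: 8.440 s
    vs 8439.9 ms), so no measurable time is spent after the first token and
    the tg phase never executes
  • Replacing the SpecPrefill draft model has no effect, so the draft model is
    not the cause
  • pp4096 works normally, suggesting a short-context boundary condition
    specific to this MoE model (Qwen3.6-35B-A3B)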
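
To make the E2E ≈ TTFT argument checkable, here is a small sketch that recomputes the derived columns from TTFT and E2E alone. The formulas are my assumption, inferred from the reported numbers rather than read from the benchmark source: pp TPS = prompt tokens / TTFT, tg TPS = generated tokens / (E2E - TTFT), and TPOT = (E2E - TTFT) / (generated tokens - 1). `derive_metrics` and the 1 ms cutoff are hypothetical names and values used only for illustration.

```python
# Sanity check of the reported benchmark numbers.
# ASSUMPTION: the formulas below are inferred from the table in this issue,
# not taken from the benchmark's actual implementation.

def derive_metrics(pp_tokens, tg_tokens, ttft_ms, e2e_s):
    """Recompute pp TPS, tg TPS and TPOT from TTFT and E2E only."""
    ttft_s = ttft_ms / 1000.0
    decode_s = max(e2e_s - ttft_s, 0.0)  # wall time after the first token
    pp_tps = pp_tokens / ttft_s
    # Hypothetical cutoff: under 1 ms of decode time means no tg phase ran.
    if decode_s < 1e-3:
        return {"pp_tps": pp_tps, "tg_tps": 0.0, "tpot_ms": 0.0,
                "decode_s": decode_s}
    return {
        "pp_tps": pp_tps,
        "tg_tps": tg_tokens / decode_s,
        "tpot_ms": decode_s * 1000.0 / (tg_tokens - 1),
        "decode_s": decode_s,
    }

# (prompt tokens, generated tokens, TTFT in ms, E2E in s) from both runs.
RUNS = {
    "run1 pp1024/tg128": (1024, 128, 6905.7, 6.906),
    "run1 pp4096/tg128": (4096, 128, 24207.7, 29.083),
    "run2 pp1024/tg128": (1024, 128, 8439.9, 8.440),
    "run2 pp4096/tg128": (4096, 128, 23722.6, 28.422),
}

if __name__ == "__main__":
    for name, args in RUNS.items():
        m = derive_metrics(*args)
        print(f"{name}: decode={m['decode_s'] * 1000:.1f} ms  "
              f"pp={m['pp_tps']:.1f} tok/s  tg={m['tg_tps']:.1f} tok/s  "
              f"TPOT={m['tpot_ms']:.2f} ms")
```

Under these assumed formulas the pp4096 rows match the reported values to rounding, while both pp1024 rows leave well under 1 ms between the first token and the end of the request. That fits generation stopping at or right after the first token, rather than a slow decode being measured incorrectly.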

Expected Behavior

pp1024/tg128 should generate 128 tokens after prefill, same as pp4096/tg128.
