qwen3.6-27b-mtp no gains when MTP enabled #581

@gggiiia

MTP Appears Non-Functional on Qwen3.6-27B-MTP (Performance Drops Instead of Improving)

Environment

  • Model: Qwen3.6-27B-MTP-GGUF
  • OS: Fedora 44
  • Runtime: Vulkan (llama.cpp v2.20.1)
  • GPU: AMD Radeon RX 7900 XTX

Issue

Enabling MTP for Qwen3.6-27B-MTP-GGUF does not appear to provide any multi-token prediction benefit.

Instead of increasing throughput, generation speed drops significantly, suggesting that MTP may not actually be functioning while still incurring additional overhead.

Results

| Configuration | Generation Speed |
| --- | --- |
| MTP Disabled | ~35 tokens/s |
| MTP Enabled | ~13 tokens/s |

Expected Behavior

When MTP is working correctly, generation speed should increase due to successful multi-token predictions.
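To make that expectation concrete, here is a rough back-of-the-envelope model rather than a description of how llama.cpp implements MTP. It assumes each drafted token is accepted independently with the same probability, which is a simplification; `accept_prob` and `draft_len` are illustrative names, not runtime settings.

```python
def expected_tokens_per_step(accept_prob: float, draft_len: int) -> float:
    """Expected tokens emitted per verification pass.

    Simplified model (an assumption, not llama.cpp's implementation):
    each of `draft_len` drafted tokens is accepted independently with
    probability `accept_prob`, and the first rejection ends the run.
    The pass always yields at least one token from the main model.
    """
    return sum(accept_prob ** i for i in range(draft_len + 1))


# With no accepted drafts, each pass still yields exactly one token,
# so any extra MTP work is pure overhead.
no_gain = expected_tokens_per_step(0.0, 1)

# With most drafts accepted, each pass yields well over one token.
good_case = expected_tokens_per_step(0.8, 1)

print(no_gain, good_case)
```

Under this model, an acceptance rate near zero gives one token per pass, which would match a slowdown if the drafting cost is still being paid.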

For comparison, on the same system, Qwen3.6-35B-A3B-MTP-GGUF behaves as expected:

| Model | MTP Disabled | MTP Enabled |
| --- | --- | --- |
| Qwen3.6-35B-A3B-MTP-GGUF | ~50 tokens/s | ~110 tokens/s |

Why This Looks Like an MTP Issue

The runtime and hardware are clearly capable of benefiting from MTP, as demonstrated by the 35B A3B model.

With Qwen3.6-27B-MTP-GGUF, enabling MTP appears to:

  • Provide no observable multi-token prediction speedup.
  • Reduce throughput by roughly 60%.
  • Behave as though MTP overhead is present, but the MTP predictions themselves are not contributing to generation.
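For reference, the percentages follow directly from the approximate figures in this issue (~35 t/s to ~14 t/s for the 27B model, ~50 t/s to ~110 t/s for the 35B A3B model). A small sketch of the arithmetic; the helper name is just for illustration:

```python
def relative_change(before_tps: float, after_tps: float) -> float:
    """Fractional change in throughput; negative means slower."""
    return (after_tps - before_tps) / before_tps


# Approximate tokens/s figures reported in this issue.
dense_27b = relative_change(35.0, 14.0)   # MTP off -> MTP on
moe_35b = relative_change(50.0, 110.0)    # MTP off -> MTP on

print(f"27B with MTP:     {dense_27b:+.0%}")
print(f"35B-A3B with MTP: {moe_35b:+.0%}")
```

So the same setting gives roughly a 60% slowdown on one model and roughly a 2.2x speedup on the other.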

Reproduction

  1. Load Qwen3.6-27B-MTP-GGUF.
  2. Enable MTP in Advanced Settings.
  3. Generate text and measure throughput.
  4. Disable MTP and repeat.
  5. Observe that throughput decreases from ~35 t/s to ~14 t/s when MTP is enabled.
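If it helps anyone reproduce this with comparable numbers, the measurement in steps 3 and 4 can be sketched as below. `generate` is a hypothetical callable standing in for whatever client is used to run one generation and report the token count; it is not an API of the app or of llama.cpp. `fake_generate` exists only so the snippet runs on its own.

```python
import time
from statistics import median
from typing import Callable


def measure_tps(generate: Callable[[], int], runs: int = 3) -> float:
    """Median tokens/s over several runs.

    `generate` is a hypothetical callable: it performs one generation
    and returns the number of tokens produced.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        n_tokens = generate()
        elapsed = time.perf_counter() - start
        rates.append(n_tokens / elapsed)
    return median(rates)


def fake_generate() -> int:
    # Stand-in for a real request: pretends to emit 20 tokens in ~0.05 s.
    time.sleep(0.05)
    return 20


if __name__ == "__main__":
    print(f"{measure_tps(fake_generate):.1f} tokens/s")
```

Using the same prompt, the same output length, and the median of a few runs for both the MTP-enabled and MTP-disabled cases keeps warm-up effects from skewing the comparison.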

Question

Is MTP currently expected to work with Qwen3.6-27B-MTP-GGUF under Vulkan and llama.cpp v2.20.1, or are there known limitations/issues affecting this model?
