Replies: 1 comment
And I just saw 28fab9f 😂 well done!
A rather neat approach that uses a diffusion model as a draft to dramatically speed up inference is doing the rounds.
Blog: https://z-lab.ai/projects/dflash/
Paper: https://arxiv.org/html/2602.06036v1
@bstnxbt made an MLX server for it: https://github.com/bstnxbt/dflash-mlx
Qwen 3.5 DFlash (draft) models:
Testing out Qwen 3.5 27B 4-bit with a DFlash draft on an M5 Max:
1904 tokens | 40.0 tok/s | 82.6% acceptance
Looks significantly more promising than #500
Wondering how viable it would be to implement diffusion draft models in oMLX?
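For anyone unfamiliar with the idea: the acceptance rate above comes from the usual draft-and-verify loop of speculative decoding, where the draft model proposes a block of tokens and the target model verifies them, keeping the longest matching prefix. Below is a minimal toy sketch of that loop (greedy-verification variant) with stand-in functions for both models — this is not DFlash's or dflash-mlx's actual API, and `draft_propose` / `target_next_token` are hypothetical names for illustration only:

```python
import random

random.seed(0)
VOCAB_SIZE = 100

def draft_propose(prefix, k):
    # Hypothetical stand-in for the drafter: propose k tokens at once.
    # (A real diffusion drafter would denoise a whole block in parallel,
    # which is what makes the draft step fast.)
    return [random.randrange(VOCAB_SIZE) for _ in range(k)]

def target_next_token(prefix):
    # Hypothetical stand-in for the target model's greedy next token.
    return (sum(prefix) * 31 + 7) % VOCAB_SIZE

def speculative_step(prefix, k=8):
    """Verify k drafted tokens; return (accepted_tokens, draft_tokens).

    Accept drafted tokens while they match the target's greedy choice,
    then append the target's own token at the first mismatch, so each
    step always yields at least one new token.
    """
    draft = draft_propose(prefix, k)
    accepted = []
    for tok in draft:
        want = target_next_token(prefix + accepted)
        if tok == want:
            accepted.append(tok)       # draft token verified
        else:
            accepted.append(want)      # target's correction; stop here
            break
    else:
        # All k drafted tokens accepted: take one bonus target token.
        accepted.append(target_next_token(prefix + accepted))
    return accepted, draft

accepted, draft = speculative_step([1, 2, 3], k=8)
matches = sum(a == d for a, d in zip(accepted, draft))
print(f"{len(accepted)} tokens this step, {matches}/{len(draft)} draft tokens accepted")
```

The speedup comes entirely from the acceptance rate: a step emits between 1 and k+1 tokens for roughly one target-model forward pass, so the 82.6% acceptance reported above means most drafted blocks survive verification largely intact.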