128× in four months.
Diffusion LLMs went from ~64 denoising steps to 1 step between Nov 2024 and Mar 2025. The compression curve is not a marketing chart — it's four papers, each replacing the previous bottleneck. Drag along the timeline to feel how fast the floor moved.
Each milestone attacks denoising from a different angle. CDLM (Together AI, Jan 2025) keeps the architecture and post-trains a student with a consistency-distillation loss: quality stays, latency drops 14.5× on code. IDLM (Li et al., Feb 2025) replaces iterative denoising with a learned implicit prior that lets a single forward pass land near the data manifold. FS-DFM (Chen et al., Mar 2025) straightens the discrete flow with optimal transport, so one Euler step produces samples that previously needed 64.
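To make the cost model concrete, here is a minimal sketch of why the step count is the bottleneck in a masked-diffusion sampler: every denoising step costs one full forward pass. This is a toy, not the CDLM, IDLM, or FS-DFM method. The names `toy_denoiser` and `sample` are hypothetical, and the "model" just emits random tokens with random confidences.

```python
import random

MASK = None  # placeholder for a still-masked position


def toy_denoiser(tokens, rng):
    """Hypothetical stand-in for one model forward pass.

    Proposes a (token_id, confidence) pair for every position.
    A real model would condition on the unmasked tokens.
    """
    return [(rng.randrange(100), rng.random()) for _ in tokens]


def sample(length, num_steps, seed=0):
    """Unmask `length` tokens in at most `num_steps` forward passes."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    forward_passes = 0
    for step in range(num_steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        proposals = toy_denoiser(tokens, rng)
        forward_passes += 1
        # Spread the remaining masked positions evenly over the remaining
        # steps (ceil division), so the last step unmasks whatever is left.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)
        # Commit the k most confident proposals among masked positions.
        chosen = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in chosen:
            tokens[i] = proposals[i][0]
    return tokens, forward_passes


if __name__ == "__main__":
    for steps in (64, 8, 1):
        _, passes = sample(length=64, num_steps=steps)
        print(f"num_steps={steps:>2} -> forward passes: {passes}")
```

In this toy, dropping `num_steps` from 64 to 1 cuts the forward passes by the same factor. The hard part, which the toy ignores entirely, is keeping sample quality when so many tokens are committed at once; that is the problem each of the papers above addresses in its own way.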
These aren't variations of the same trick. Distillation, implicit priors, and flow-straightening are independent levers, so stacking them is plausible. That independence is why the curve reads as a trend rather than a coincidence.
The other axis is planning. Dream 7B (HKU NLP) hits 81% on 9×9 Sudoku where an autoregressive 7B baseline scores 21%. That is not because diffusion is "smarter"; it is because bidirectional attention lets the model reason about constraints across the full grid simultaneously. AR models commit cell-by-cell; diffusion can revise. On ARC-AGI and Countdown, the same effect appears at a smaller scale.
Hover the planning stats panel and the gap between AR and diffusion stops looking like noise. Constraint-satisfaction tasks reward the architecture that can re-mask its own draft.
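The "re-mask its own draft" idea can be sketched on Sudoku itself. The snippet below is an illustration only, not Dream 7B's decoding procedure: it takes a filled-in draft grid, finds every cell that violates a row, column, or box constraint, and resets the non-clue ones to 0 (standing in for a mask token) so a later denoising pass could refill them. The function names `conflicting_cells` and `remask` are hypothetical.

```python
def conflicting_cells(grid):
    """Return the (row, col) cells whose value repeats in a row, column, or 3x3 box.

    `grid` is a 9x9 list of lists; 0 means "masked / empty".
    """
    units = []
    for i in range(9):
        units.append([(i, c) for c in range(9)])  # row i
        units.append([(r, i) for r in range(9)])  # column i
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            units.append([(br + dr, bc + dc) for dr in range(3) for dc in range(3)])

    conflicts = set()
    for unit in units:
        seen = {}  # value -> cells in this unit holding that value
        for r, c in unit:
            value = grid[r][c]
            if value != 0:
                seen.setdefault(value, []).append((r, c))
        for cells in seen.values():
            if len(cells) > 1:
                conflicts.update(cells)
    return conflicts


def remask(grid, givens):
    """Return a copy of `grid` with conflicting non-clue cells reset to 0.

    `givens` is the set of (row, col) clue positions that must never change.
    """
    out = [row[:] for row in grid]
    for r, c in conflicting_cells(grid):
        if (r, c) not in givens:
            out[r][c] = 0
    return out
```

A left-to-right decoder has no equivalent of `remask`: once a cell is emitted it stays in the context. A model that can revise may return to exactly the cells that broke a constraint, which is the behaviour constraint-satisfaction tasks reward.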
Sahoo et al., "Simple and Effective Masked Diffusion Language Models" (NeurIPS 2024) — the MDLM baseline.
Together AI, "Consistency Diffusion Language Models" (Jan 2025) — CDLM, 14.5×.
Li et al., "Implicit Discrete Language Models" (Feb 2025) — IDLM, 64×.
Chen et al., "Flow-Straight Discrete Flow Matching" (Mar 2025) — FS-DFM, 128×.
Ye et al., "Dream 7B" (HKU NLP, 2025) — Sudoku 81% vs AR baseline 21%.