DMax: Aggressive Parallel Decoding for dLLMs
llama.cpp fork with TurboQuant WHT-rotated KV cache & weight compression + Gemma 4 MTP speculative decoding for ~30-50% throughput gains
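The description packs several techniques into one line; the core TurboQuant-style move it names is rotating K/V activations with a Walsh-Hadamard transform (WHT) before 4-bit quantization, so outlier channels get spread across all dimensions. A minimal sketch of that idea, assuming a per-token symmetric int4 scheme (function names and scaling details are illustrative, not the fork's code):

```python
import torch

def hadamard(n: int, device=None) -> torch.Tensor:
    """Normalized Walsh-Hadamard matrix for power-of-two n (orthonormal)."""
    H = torch.ones(1, 1, device=device)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / n ** 0.5

def quantize_kv(x: torch.Tensor, H: torch.Tensor):
    """x: (tokens, head_dim). Rotate, then 4-bit symmetric per-token quant."""
    xr = x @ H                                        # orthogonal rotation
    scale = xr.abs().amax(dim=-1, keepdim=True).clamp_min(1e-8) / 7.0
    q = (xr / scale).round().clamp(-8, 7).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor, H: torch.Tensor):
    return (q.float() * scale) @ H.T                  # orthogonal: H^-1 = H^T
```

Because the rotation is orthogonal it is exactly invertible, and a fast WHT runs in O(d log d); the dense matmul above is only for clarity.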
Curated collection of research on the limitations of next-token prediction and methods that go beyond it.
Fused TBQ4 Flash Attention + MTP + Shared Tensors for llama.cpp — 82+ tok/s with lossless 4.25 bpv KV cache at 200K context on RTX 4090
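The fractional "4.25 bpv" is consistent with a standard block-quantization layout, 4-bit payloads plus a shared scale per block; one plausible accounting (an assumption about the TBQ4 format, not a documented spec):

```python
# 4-bit values plus one fp16 scale shared by each 64-value block:
bits_per_value = 4 + 16 / 64
print(bits_per_value)  # 4.25
```

For comparison, llama.cpp's Q4_0 stores an fp16 scale per 32 values, which works out to 4.5 bpv.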
ChemMiniQ3-SAbRLo: a lightweight experimental generative model for chemistry, built on a mini Qwen2-like architecture with a multi-token horizon loss and biologically-aware RL fine-tuning on SELFIES molecular representations. It is HuggingFace AutoModel/AutoTokenizer compatible and designed for rapid prototyping and fast iteration of Multi-Token Prediction (MTP) objectives and RL fine-tuning algorithms and rewards.
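The "horizon loss" is the MTP ingredient here. A minimal sketch of one common formulation, k auxiliary heads with geometrically decayed per-horizon cross-entropy (the head layout and decay are assumptions for illustration, not this repo's implementation):

```python
import torch
import torch.nn.functional as F

def horizon_loss(hidden, heads, tokens, k=4, decay=0.5):
    """hidden: (B, T, D) backbone states; heads: k x nn.Linear(D, vocab);
    tokens: (B, T). Head i at position t predicts token t+1+i, so each
    position is trained against a short window of future tokens."""
    T = hidden.shape[1]
    loss = 0.0
    for i, head in enumerate(heads[:k]):
        logits = head(hidden[:, : T - 1 - i])          # (B, T-1-i, V)
        target = tokens[:, 1 + i :]                    # (B, T-1-i)
        loss = loss + decay**i * F.cross_entropy(
            logits.reshape(-1, logits.shape[-1]), target.reshape(-1))
    return loss
```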
Multi-Token Prediction benchmarks for Gemma 4 on Apple Silicon — LiteRT-LM, transformers, and llama.cpp at batch=1 on a MacBook M4 Pro. ~2× speedup reproducible in one specific runtime.
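For context on how such batch=1 numbers are typically measured in the transformers runtime, a rough throughput harness (the model id is a placeholder, not one of the benchmarked Gemma checkpoints):

```python
import time
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some/causal-lm"                     # placeholder, not the real id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tok("The quick brown fox", return_tensors="pt")
model.generate(**inputs, max_new_tokens=8)      # warm-up pass

t0 = time.perf_counter()
out = model.generate(**inputs, max_new_tokens=256, do_sample=False)
dt = time.perf_counter() - t0
new_tokens = out.shape[1] - inputs["input_ids"].shape[1]
print(f"{new_tokens / dt:.1f} tok/s")
```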
Research code for ProbeRoute, a probe-initialized sparse routing method for frozen-backbone multi-token prediction
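Going only by the name, one plausible reading of "probe-initialized sparse routing" is: fit a linear probe on the frozen backbone's features, then copy its weights into a top-k router whose selections feed the MTP heads. This is entirely a guess about ProbeRoute, not its published method:

```python
import torch
import torch.nn as nn

class SparseRouter(nn.Module):
    """Top-k router whose gate is initialized from a pretrained linear probe."""
    def __init__(self, probe: nn.Linear, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(probe.in_features, probe.out_features)
        self.gate.load_state_dict(probe.state_dict())   # probe initialization
        self.k = k

    def forward(self, h):                       # h: (B, T, D) frozen features
        logits = self.gate(h)                   # (B, T, n_experts)
        val, idx = logits.topk(self.k, dim=-1)  # keep k experts per token
        w = torch.softmax(val, dim=-1)          # renormalize over the chosen k
        return idx, w                           # consumed by downstream heads
```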
Reverse-engineering how DeepSeek achieved frontier LLM performance at a fraction of the cost — through hands-on PyTorch implementations of MLA, MoE, MTP, RoPE, and quantization.
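Of the listed components, RoPE is the most self-contained; a generic PyTorch version of the rotation such re-implementations walk through (a textbook sketch, not this repo's code):

```python
import torch

def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """x: (..., seq, dim) with even dim. Rotates channel pairs by position-
    dependent angles so attention scores depend on relative offsets."""
    seq, dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / base ** (torch.arange(0, dim, 2, dtype=x.dtype) / dim)
    ang = torch.arange(seq, dtype=x.dtype)[:, None] * inv_freq  # (seq, dim/2)
    cos, sin = ang.cos(), ang.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```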