Fused TBQ4 Flash Attention + MTP + Shared Tensors for llama.cpp — 82+ tok/s with lossless 4.25 bpv KV cache at 200K context on RTX 4090
cuda quantization mtp kv-cache fwht llama-cpp flash-attention qwen speculative-decoding rtx-4090 multi-token-prediction turboquant tbq4 tensor-sharing
Updated May 9, 2026 - C++
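One plausible accounting for the headline 4.25 bpv figure (an assumption for illustration; the actual TBQ4 block layout is not spelled out here): 4-bit quantized codes plus one fp16 scale shared per 64-value block gives 4 + 16/64 = 4.25 bits per value. Other splits, such as an 8-bit scale per 32-value block, land on the same total.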