Support for these drafters landed in llama.cpp on 2026-06-07
(ggml-org/llama.cpp#23398, plus E2B/E4B variants in ggml-org/llama.cpp#24282
a day later), but the llama.cpp submodule in v2026.6.9538 and current main
is pinned at 5343f4502a (2026-06-06) — just before that merge.
Could you bump the llama.cpp submodule and cut a new release? Happy to test
a pre-release wheel (CUDA 12.8, RTX 5090, Windows).