Commits
385 commits
ae2d348
metal: Implement ROLL op (#21946)
kushagharahi Apr 16, 2026
3f7c29d
ggml: add graph_reused (#21764)
am17an Apr 16, 2026
03b3d07
Convert: Fix NemotronH Config Parsing (#21664)
anavp-nvidia Apr 16, 2026
b572d1e
codeowners: add team member comments (#21714)
0cc4m Apr 16, 2026
f772f6e
model : support NVFP4 tensors for Gemma4 (#21971)
CISC Apr 16, 2026
9db77a0
model : refactor QKV into common build_qkv and create_tensor_qkv help…
JoursBleu Apr 16, 2026
4adac43
server: tests: fetch random media marker via /apply-template (#21962)…
ServeurpersoCom Apr 16, 2026
e45dbde
opencl: add q5_K gemm and gemv kernels for Adreno (#21595)
shaofeiqi Apr 16, 2026
4fbdabd
model: using single llm_build per arch (#21970)
ngxson Apr 16, 2026
85dde8d
hexagon: optimize HMX matmul operations (#21071)
chraac Apr 16, 2026
089dd41
cmake: use glob to collect src/models sources (#22005)
ngxson Apr 16, 2026
30dce2c
cli : use get_media_marker (#22017)
CISC Apr 16, 2026
5e6c0e1
opencl: refactor q8_0 set_tensor and mul_mat host side dispatch for A…
lhez Apr 17, 2026
fcc7508
model : Gemma4 model type detection (#22027)
EZForever Apr 17, 2026
6990e2f
libs : rename libcommon -> libllama-common (#21936)
ggerganov Apr 17, 2026
268d61e
mtmd: add missing struct tag (#22023)
65a Apr 17, 2026
a279d0f
ci : add android arm64 build and release (#21647)
ykhrustalev Apr 17, 2026
b94050e
CUDA: use LRU based eviction for cuda graphs (#21611)
am17an Apr 17, 2026
45cac7c
ggml-webgpu: fix compiler warnings and refactor FlashAttention encodi…
reeselevine Apr 17, 2026
fd1c0ec
llama: fit ctx size for CPU only (#21568)
JohannesGaessler Apr 18, 2026
89a5474
convert : fix (ignore for now) typings errors (#22002)
CISC Apr 18, 2026
83d58e0
ci : free disk space for rocm release (#22012)
CISC Apr 18, 2026
59accc8
ggml-backend-meta: add multi-segment read support in get_tensor (#22063)
ssam18 Apr 18, 2026
23b8cc4
android : libcommon -> libllama-common (#22076)
CISC Apr 18, 2026
4f02d47
model : refactor bias tensor variable names (#22079)
CISC Apr 18, 2026
9e5647a
server: Expose `media_tag` on /props endpoint. (#22028)
cetarthoriphros Apr 18, 2026
91fef95
rpc : refactor the RPC transport (#21998)
rgerganov Apr 19, 2026
455d8e4
server : speculative checkpointing (#19493)
srogmann Apr 19, 2026
09b4efa
cmake: remove CMP0194 policy to restore MSVC builds (#21934)
texasich Apr 19, 2026
8685e7b
convert : support sentence-transformer 5.4 config files (#22087)
Bing-su Apr 19, 2026
037bfe3
ci : install spirv-headers for vulkan-cross (#22109)
CISC Apr 19, 2026
bcdcc10
ggml : reduce CPU overhead in meta backend (#22041)
gaugarg-nv Apr 19, 2026
1912407
mtmd: add pos_0 to mtmd_image_tokens_get_decoder_pos (breaking change…
ngxson Apr 19, 2026
471540a
HIP: Remove unesscary NCCL_CHECK (#21914)
IMbackK Apr 19, 2026
d5b780a
common/autoparser : allow space after tool call (#22073)
aldehir Apr 19, 2026
4eac5b4
CUDA: refactor mma data loading for AMD (#22051)
JohannesGaessler Apr 19, 2026
e365e65
vendor : update cpp-httplib to 0.42.0 (#21781)
cabelo Apr 19, 2026
9d49acb
server: rename --clear-idle to --cache-idle-slots (#21741)
yychyo Apr 20, 2026
788fcbc
[SYCL] Fix reorder MMVQ assert on unaligned vocab sizes (#22035)
PMZFX Apr 20, 2026
de71b5f
server : refactor "use checkpoint" logic (#22114)
ggerganov Apr 20, 2026
81df3f7
fix: GLM-DSA crash in llama-tokenize when using vocab_only (#22102)
ssam18 Apr 20, 2026
a678916
mtmd: refactor mtmd_decode_use_mrope (#22161)
ngxson Apr 20, 2026
a6cc43c
ggml-webgpu: updated matrix-vector multiplication (#21738)
neha-ha Apr 20, 2026
7f251fd
ggml-cpu: Optimized x86 and generic cpu q1_0 dot (follow up) (#21636)
pl752 Apr 20, 2026
fb19f94
TP: fix 0-sized tensor slices, AllReduce fallback (#21808)
JohannesGaessler Apr 20, 2026
fd6ae4c
Tensor-parallel: Fix delayed AllReduce on Gemma-4 MoE (#22129)
gaugarg-nv Apr 20, 2026
cf8b0db
server : remove /api endpoints (#22165)
ggerganov Apr 20, 2026
86f8daa
mtmd: correct get_n_pos / get_decoder_pos (#22175)
ngxson Apr 20, 2026
9789512
ggml-cuda: flush legacy pool on OOM and retry (#22155)
leonardHONG Apr 20, 2026
ff6b106
server : fix hardcoded proxy connection timeout in router mode (#1876…
xris99 Apr 21, 2026
cfe9838
fit-params : refactor + add option to output estimated memory per dev…
ggerganov Apr 21, 2026
041fe83
ggml : bump version to 0.10.0 (ggml/1463)
ggerganov Apr 21, 2026
4889afb
sync : ggml
ggerganov Apr 21, 2026
cd03ec7
llama-ext : fix exports (#22202)
ggerganov Apr 21, 2026
9998d88
mtmd: correct mtmd_decode_use_mrope() (#22188)
ngxson Apr 21, 2026
82209ef
vulkan: Support F16 OP_FILL (#22177)
jeffbolznv Apr 21, 2026
7fc1c4e
metal : workaround macOS GPU interactivity watchdog (#22216)
ggerganov Apr 21, 2026
606fa42
vendor : update cpp-httplib to 0.43.1 (#22143)
cabelo Apr 21, 2026
52f1096
openvino: driver setup, CI split, thread safety, and NPU optimization…
wine99 Apr 21, 2026
84652b8
arg : add --spec-default (#22223)
ggerganov Apr 21, 2026
98d2d28
mtmd: Add support for Reka Edge 2603 (#21616)
kwajiehao Apr 21, 2026
72d693e
spec : reset i_last when low acceptance streak occurs (#22168)
treo Apr 21, 2026
2248799
hexagon: fix missing v79 entry in libggml-htp.inf (#22194)
mengshengwu Apr 21, 2026
5a4cd67
Hexagon: DAIG op (#22195)
shreyajn Apr 21, 2026
04fe84b
server: allow cancel loading model (#21814)
ngxson Apr 21, 2026
2799d93
ggml-webgpu: reset CPU/GPU profiling time when freeing context (#22050)
yomaytk Apr 21, 2026
0dedb9e
hexagon: add support for FILL op (#22198)
aparmp-quic Apr 21, 2026
ca7f7b7
ggml-webgpu(shader): support conv2d kernels. (#21964)
Constannnnnt Apr 22, 2026
134d6e5
common/chat, server: refactor, move all conversion functions to commo…
pwilkin Apr 22, 2026
750579f
common: Refactoring sampler parameters (#20429) (#22233)
ezturner Apr 22, 2026
7bfe60f
mtmd, llama : Update HunyuanVL vision-language model support (#22037)
ManaEstras Apr 22, 2026
17f6245
server: ignore reasoning content from transcription api (#21905)
ngxson Apr 22, 2026
82d3f4d
mtmd: also support LLAMA_ROPE_TYPE_NONE (#22242)
ngxson Apr 22, 2026
225088e
sycl: Improve mul_mat_id memory efficiency and add BF16 fast path (#2…
qnixsynapse Apr 22, 2026
bcb5eeb
speculative-simple : add checkpoint support (#22227)
ggerganov Apr 22, 2026
8bccdbb
chat: fix parallel_tool_calls default setting based on model capabili…
pwilkin Apr 22, 2026
6da7168
ggml-webgpu: Add fused RMS_NORM + MUL (#21983)
yomaytk Apr 22, 2026
0d0764d
[WebGPU] Implement async tensor api and event api (#22099)
nikhilJain17 Apr 22, 2026
6217b49
HIP: flip GGML_HIP_GRAPHS to default on (#22254)
IMbackK Apr 23, 2026
86db42e
CUDA: fuse relu + sqr (#22249)
anavp-nvidia Apr 23, 2026
b76429a
ggml-webgpu: add support for im2col (#22259)
Constannnnnt Apr 23, 2026
60b68a6
sycl : fused MoE mul_mat_vec_q for TG (#21920)
abotsis Apr 23, 2026
5eaee65
convert : Handle ModelOpt produced mixed precision model during conve…
ynankani Apr 23, 2026
4ead6fd
[SYCL] Update oneapi 2025.3.3, Seperate SYCL build, release Ubuntu 24…
NeoZhangJianyu Apr 23, 2026
96c1db2
ggml-base: use MATH_LIBRARY variable instead of hardcoded 'm' (#22239)
ggerganov Apr 23, 2026
930e021
gitignore: add AGENTS.local.md (#22246)
ggerganov Apr 23, 2026
8635e22
metal : fix event synchronization (#22260)
ggerganov Apr 23, 2026
550d684
server: Enable transcriptions API for LFM2-Audio (#22000)
tdakhran Apr 23, 2026
0dd7f91
cli : cleanup auto-completion code (#21745)
matthiasstraka Apr 23, 2026
9012c50
model-conversion : fix mmproj output file name [no ci] (#22274)
danbev Apr 23, 2026
0949beb
fix build number for sycl release (#22283)
CISC Apr 23, 2026
c807c6e
server: (anthropic API) fix prefix caching (#21793)
kvc0 Apr 23, 2026
12568ca
vendor : update LibreSSL to 4.3.1 (#22285)
angt Apr 23, 2026
c78fb90
server: fix heap-buffer-overflow from negative n_discard (CVE-2026-21…
SongTonyLi Apr 23, 2026
185cbff
server : convert_anthropic_to_oai: also copy chat_template_kwargs (#2…
Soreepeong Apr 23, 2026
187a456
Enable testing on Snapdragon devices (#21051)
shreyajn Apr 23, 2026
5d2b52d
hexagon: add support for basic and extended Op profiling (#22269)
max-krasnyansky Apr 23, 2026
fa0b8a7
cli: Remove redundant local sampling variables (#20429) (#22264)
ezturner Apr 23, 2026
e5f070a
fix(shader): handle the buffer aliasing for rms fuse (#22266)
Constannnnnt Apr 23, 2026
8bc492e
hexagon: add SOLVE_TRI op (#21974)
mengshengwu Apr 24, 2026
793d0a7
server: rename debug tags to match --cache-idle-slots naming (#22292)
yychyo Apr 24, 2026
ffdd983
server : fix swa-full logic (#22288)
ggerganov Apr 24, 2026
017f090
jinja : remove unused header (#22310)
ggerganov Apr 24, 2026
e583f3b
ggml : minor coding style (#22308)
ggerganov Apr 24, 2026
dc80c52
common : fix jinja warnings with clang 21 (#22313)
angt Apr 24, 2026
15fa3c4
metal : print GPU description (#22318)
ggerganov Apr 24, 2026
f65bc34
hexagon: use DIRID 13 in libggml-htp.inf for modern InfVerif (#22306)
mengshengwu Apr 24, 2026
13d36cf
ggml-webgpu: enable FLASH_ATTN_EXT on browser without subgroup matrix…
ArberSephirotheca Apr 24, 2026
a702f39
CI Snapdragon: Switch ubuntu-latest to ubuntu-slim runner (#22303)
shreyajn Apr 24, 2026
361fe72
Hexagon: Bump HMX Frequency to Max Corner (#22334)
trivikram-reddy1 Apr 24, 2026
0adede8
parser: fix structured output bug (#22302)
pwilkin Apr 24, 2026
dd2914d
ggml-webgpu: support for SSM_SCAN and disable set_rows error checking…
reeselevine Apr 25, 2026
eddd7a1
[SYCL] Optimize Q4_0 mul_mat for Arc770, add scripts (#22291)
arthw Apr 25, 2026
8ea8fee
gitignore : add .pi + personal SYSTEM.md (#22316)
ggerganov Apr 25, 2026
9d34231
llama-quant : default ftype param `Q5_1` --> `Q8_0` (#20828)
ddh0 Apr 25, 2026
d164904
metal : optimize Metal Tensor API usage for GGML_OP_MUL_MAT (#20962)
Developer-Ecosystem-Engineering Apr 25, 2026
9725a31
CUDA: reduce MMQ stream-k overhead (#22298)
JohannesGaessler Apr 25, 2026
98dc141
spec : fix vocab compat checks (#22358)
ggerganov Apr 25, 2026
dcad77c
chat: fix handling of space in reasoning markers (#22353)
pwilkin Apr 25, 2026
b760272
hexagon: guard HMX clock request for v75+ platforms (#22377)
trivikram-reddy1 Apr 26, 2026
f454bd7
opencl: add iq4_nl support (#22272)
lhez Apr 26, 2026
2dd8416
ggml-cpu: optimize avx2 q6_k (#22345)
netrunnereve Apr 26, 2026
0c6ee1c
ggml-cpu : re-enable fast gelu_quick_f16 (#22339)
CISC Apr 26, 2026
b1a5bd4
CUDA: better coalesce data-access for contiguous concat (#22330)
ORippler Apr 26, 2026
7ec36aa
Github: set meta backend code owner (#22388)
JohannesGaessler Apr 26, 2026
78433f6
Fix recurrent state serialization for partial reads and writes (#22362)
gaugarg-nv Apr 26, 2026
06a811d
add performance-portable tuning for register-tile and subgroup matmul…
SharmaRithik Apr 26, 2026
f535774
pr2wt : symlink .pi (#22386)
ggerganov Apr 26, 2026
5594d13
common: fix missing exports in llama-common (#22340)
max-krasnyansky Apr 27, 2026
f84270e
ggml : use 64 bytes aligned tile buffers (#21058)
angt Apr 27, 2026
d13540b
convert : remove input_scale for dequantized fp8 modelopt (#22356)
CISC Apr 27, 2026
0f1bb60
model : remove duplicate wo_s scale after build_attn (Qwen3, LLaMA) (…
ynankani Apr 27, 2026
e940b3d
download : prefer q8_0 when q4_k not available (#22428)
ggerganov Apr 27, 2026
42401c7
Fix type casting for unaccounted memory calculation (#22424)
rankaiyx Apr 27, 2026
ceaf47c
fix: rpc-server cache may not work in Windows environments (#22394)
unraido Apr 27, 2026
4414c04
Additional test for common/gemma4 : handle parsing edge cases (#22420)
hextriclosan Apr 27, 2026
665abc6
add fast mat-vec kernels for i-quants (#22344)
SharmaRithik Apr 27, 2026
983ca89
server: (router) Forward form-data to model server (Fixes #22044) (#2…
tha80 Apr 27, 2026
434b2a1
ggml-webgpu: add Q1_0 support (#22374)
SharmaRithik Apr 27, 2026
516e8d7
server: use pos_next instead of n_tokens for m-rope (#22439)
am17an Apr 28, 2026
14e733e
spec : refactor params (#22397)
ggerganov Apr 28, 2026
c3e08f4
CANN: add new ops, optimize existing ops (#21204)
hipudding Apr 28, 2026
d530d6e
ggml : revert to -lm linking instead of find_library (#22355)
angt Apr 28, 2026
50494a2
ggml : skip already registered backends and devices (#22296)
angt Apr 28, 2026
698d19b
ggml: improve SPIR-V headers detection with __has_include (#21918)
EmilAskerov Apr 28, 2026
1982117
vulkan: add barrier after writetimestamp (#21865)
jeffbolznv Apr 28, 2026
f42e29f
webui: Server tools (#21237)
allozaur Apr 28, 2026
98bb579
ggml-webgpu: fix buffer aliasing for ssm_scan and refactor aliasing l…
reeselevine Apr 28, 2026
f9f3365
vulkan: Coalesce Q4_K/Q5_K scale loads (#21751)
TheBlueMatt Apr 28, 2026
52e5f0a
common : re-arm reasoning budget after DONE on new <think> (#22323)
BruceJillis Apr 28, 2026
5d56eff
convert : add support for Nemotron Nano 3 Omni (#22481)
danbev Apr 28, 2026
7b8443a
ggml-cuda: add flash-attn support for DKQ=320/DV=256 with ncols2=32 (…
lnigam Apr 28, 2026
fc2b005
ggml-cuda: Repost of 21896: Blackwell native NVFP4 support (#22196)
michaelw9999 Apr 28, 2026
739393b
TP: fix delayed AllReduce + zero-sized slices (#22489)
JohannesGaessler Apr 29, 2026
bdc9c74
ggml : add sve tuned code for gemm_q8_0_4x8_q8_0() kernel (#21916)
hrushitfujitsu Apr 29, 2026
7b95ea5
common: Intentionally leak logger instance to fix hanging on Windows …
rillomas Apr 29, 2026
d6a5094
ggml-webgpu: Fix bug in FlashAttention support check (#22492)
reeselevine Apr 29, 2026
b5c4227
ggml-cpu: cmake: append xsmtvdotii march for SpacemiT IME (#22317)
qiurui144 Apr 29, 2026
3142f1d
ggml-cuda: refactor fusion code (#22468)
am17an Apr 29, 2026
1cbc846
ggml-cpu : disable tiled matmul on AIX to fix page boundary segfault …
shalinib-ibm Apr 29, 2026
59237bf
webui: fix slow mic stop and WAV encode (#22480)
ServeurpersoCom Apr 29, 2026
4b221b7
ggml : bump version to 0.10.1 (ggml/1469)
ggerganov Apr 29, 2026
b1d5f5b
sync : ggml
ggerganov Apr 29, 2026
683c5ac
spec : disacard last drafted token with low prob (#22506)
ggerganov Apr 29, 2026
098705a
CUDA: fuse SSM_CONV + ADD(bias) + SILU (#22478)
anavp-nvidia Apr 29, 2026
41a63be
hexagon: make vmem and buffer-size configurable (#22487)
max-krasnyansky Apr 29, 2026
d775992
common : do not pass prompt tokens to reasoning budget sampler (#22488)
aldehir Apr 29, 2026
b42c7fa
spec : fix vocab compat checks in spec example (#22426)
petersid2022 Apr 30, 2026
80afa33
spec : fix draft model checkpoints (#22521)
ggerganov Apr 30, 2026
4515559
add fast matmul iquants (#22504)
SharmaRithik Apr 30, 2026
27aef3d
scripts : add wc2wt.sh - create worktree from current HEAD (#22513)
ggerganov Apr 30, 2026
e82aaf2
CUDA: fix tile FA kernel on Pascal (#22541)
JohannesGaessler Apr 30, 2026
5f0ab72
vendor : update cpp-httplib to 0.43.2 (#22548)
angt Apr 30, 2026
6118c04
ci : bump ty to 0.0.33 (#22535)
CISC Apr 30, 2026
c20c445
spec: fix argument typo (#22552)
barnjamin Apr 30, 2026
660b1b4
vulkan: add get/set tensor 2d functions (#22514)
0cc4m Apr 30, 2026
beb42ff
common : check for null getpwuid in hf-cache (#22550)
angt Apr 30, 2026
5cbfb18
Update llama-mmap to use ftello/fseeko (#22497)
reeselevine Apr 30, 2026
a95a11e
ggml-webgpu: Improve performance of mat-vec and mat-mat for MUL_MAT_I…
yomaytk Apr 30, 2026
aab6821
ggml-webgpu: add the upscale shader (#22419)
Constannnnnt May 1, 2026
05e141a
vulkan: Support asymmetric FA in coopmat2 path (#21753)
jeffbolznv May 1, 2026
c3c1505
ggml-webgpu: Fix vectorized handling in mul-mat and mul-mat-id (#22578)
yomaytk May 1, 2026
ab6120c
webui: Spring Cleaning Refactor v1 (#22505)
allozaur May 1, 2026
2098fd6
hexagon: enable non-contiguous row tensor support for unary ops (#22574)
aparmp-quic May 1, 2026
b97ebdc
llama-quant : fix `--tensor-type` when default `qtype` is overriden (…
ddh0 May 1, 2026
1a03cf4
hexagon: hmx flash attention (#22347)
njsyw1997 May 2, 2026
e8ec7ab
ggml : try fix win32 build (whisper/0)
ggerganov May 1, 2026
457e228
sync : ggml
ggerganov May 1, 2026
ed23489
ggml : bump version to 0.10.2 (ggml/1474)
ggerganov May 2, 2026
228e836
sync : ggml
ggerganov May 2, 2026
9dbb372
Github: update issue templates (#22594)
JohannesGaessler May 2, 2026
c5a3bc3
opencl: Adreno optimization for MoE - MxFP4 (#22301)
shawngu-quic May 2, 2026
63d93d1
convert : disable uint types (#18908)
csabakecskemeti May 2, 2026
0929436
ggml-virtgpu: fix circular dependency in headers (#22557)
Juste-Leo2 May 2, 2026
0754b7b
server : avoid checkpoint data host copies (#22558)
ggerganov May 2, 2026
d05fe1d
fix: CUDA device PCI bus ID de-dupe OOMing (ignoring other 3 gpus ent…
lucyknada May 2, 2026
db44417
convert : apply Q/K RoPE permutation in NVFP4 repack path (#22611)
jmrobles May 3, 2026
048a490
convert : Mistral format yarn apply_scale support (#22612)
juliendenize May 3, 2026
e48034d
common : determine generation prompt using longest common prefix (#22…
aldehir May 3, 2026
d4b0c22
ggml-webgpu: add layer norm ops (#22406)
Constannnnnt May 4, 2026
6dcd824
vulkan: delete dead GGML_VK_MAX_NODES def (#22621)
Atomic-Germ May 4, 2026
846262d
docs : update speculative decoding parameters after refactor (#22397)…
ggerganov May 4, 2026
fa8feae
webui: restore missing settings (#22666)
ntowle May 4, 2026
c84e6d6
server: Add a simple get_datetime server tool (#22649)
eapache May 4, 2026
994118a
model: move `load_hparams` and `load_tensors` to per-model definition…
ngxson May 4, 2026
a4701c9
common/autoparser: fixes for newline handling / forced tool calls (#2…
pwilkin May 4, 2026
36a694c
webui : fix circular dependency between chat.service.ts and models.sv…
Juste-Leo2 May 4, 2026
d8794ee
examples: refactor diffusion generation (#22590)
Sailaukan May 4, 2026
935a340
server: implement /models?reload=1 (#21848)
ngxson May 4, 2026
e77056f
CUDA: use fastdiv for batch index split in get_rows (#22650)
leonardHONG May 4, 2026
eff0670
kleidiai : update to v1.24.0 and use release archive (#22549)
chaxu01 May 4, 2026
a817a22
ggml : implement fast walsh-hadamard transform for kv rotation (#2135…
AlrIsmail May 5, 2026
fa59546
graph : handle non-contiguous Q/K/V in mul_mat_aux (#22630)
CISC May 5, 2026
d6e7b03
llama : add option to save memory in device buffers (#22679)
ggerganov May 5, 2026
2bacb1e
server : validate --tools CLI argument against known tool names (#22538)
ggerganov May 5, 2026
a09a00e
vendor : update cpp-httplib to 0.43.3 (#22686)
cabelo May 5, 2026
bf76ac7
common : only load backends when required (#22290)
angt May 5, 2026
c91faf9
ggml : bump version to 0.11.0 (ggml/1478)
ggerganov May 5, 2026
70a8309
sync : ggml
ggerganov May 5, 2026
2635ac7
common : fix missing-noreturn warnings when compiling with clang 21 (…
angt May 5, 2026
d5003b6
rpc : use graph uid instead of graph cache (#22701)
rgerganov May 5, 2026
ff806a1
opencl: refactor Adreno q4_0 (#22335)
lhez May 5, 2026
bbeb89d
Hexagon: Process M-tail rows on HMX instead of HVX (#22724)
trivikram-reddy1 May 5, 2026
2ca1161
ggml : use `CL_DEVICE_GLOBAL_MEM_SIZE` as memory estimate for OpenCL …
fl0rianr May 6, 2026
74d6248
convert : add filter_tensors method to pre-filter tensors (#22597)
CISC May 6, 2026
07eaf91
add tabindex and aria-hidden (#22699)
vignesh191 May 6, 2026
f08f20a
ggml-cpu: fuse RMS_NORM + MUL on CPU backend (#22423)
zzzzwc May 6, 2026
e3e3f8e
webui: Remove Google Favicons & Improve MCP Information logic & UI (#…
allozaur May 6, 2026
a736e6c
convert : ignore non-language tensors for Gemma4Model (#22753)
danbev May 6, 2026
7501419
feat: migrate to PEP 621 and add uv support (#21907)
dhdaines May 6, 2026
a00e47e
mtmd: add granite-speech support (ibm-granite/granite-4.0-1b-speech) …
ReinforcedKnowledge May 6, 2026
a290ce6
gguf-py : bump version to 0.19.0 (#22664)
ggerganov May 6, 2026
a010122
common: do not fit to unknown device memory (#22614)
fl0rianr May 6, 2026
5207d12
model : don't crash on unsupported architecture (#22742)
giladgd May 6, 2026
2496f9c
mtmd : support MiniCPM-V 4.6 (#22529)
tc-mb May 6, 2026
3980e04
llama : add missing call to ggml_backend_load_all() (#22752)
angt May 7, 2026
cfff1fc
sycl : fix test script (#22737)
dogunbound May 7, 2026
e358d75
webui: fix flicker issue on dismiss animation on overlay primitives (…
vignesh191 May 7, 2026
97f06e9
codeowners : add ZenDNN backend codeowner (#22772)
z-vishal May 7, 2026
f4b5a2e
webui: fix ?model= URL param race in router mode (#22771)
ServeurpersoCom May 7, 2026
8e52631
model: Add Mimo v2.5 model support (#22493)
AesSedai May 7, 2026
cc97e45
mtmd: fix whisper audio tail truncation by exposing padded buffer to …
ServeurpersoCom May 7, 2026
4815e0f
feat: TurboQuant KV cache compression for HIP/ROCm (gfx1100)
Apr 6, 2026
8707ccf
prep: add --hugepages flag for anonymous HugeTLB-backed weight loadin…
doctorjei Apr 12, 2026
38df43e
implementation: add --hugepages flag for anonymous HugeTLB-backed wei…
Apr 12, 2026
7962fbe
cuda : enable buffer_from_host_ptr for integrated GPUs (HIP only)
doctorjei Apr 17, 2026
cfaeae1
Merge branch 're-main'
May 7, 2026
0423b03
Updated GGML_API signature (no extern)
May 7, 2026
51246dc
Adjust test array size to work with FP32
May 9, 2026
fa3e592
Added error ranges for TQ variants
May 9, 2026
2 changes: 1 addition & 1 deletion .devops/intel.Dockerfile
@@ -1,4 +1,4 @@
-ARG ONEAPI_VERSION=2025.3.2-0-devel-ubuntu24.04
+ARG ONEAPI_VERSION=2025.3.3-0-devel-ubuntu24.04

## Build Image

2 changes: 2 additions & 0 deletions .devops/nix/package.nix
@@ -18,6 +18,7 @@
vulkan-loader,
openssl,
shaderc,
spirv-headers,
useBlas ?
builtins.all (x: !x) [
useCuda
@@ -145,6 +146,7 @@ effectiveStdenv.mkDerivation (finalAttrs: {
ninja
pkg-config
git
spirv-headers
]
++ optionals useCuda [
cudaPackages.cuda_nvcc
50 changes: 48 additions & 2 deletions .devops/openvino.Dockerfile
@@ -2,7 +2,19 @@ ARG OPENVINO_VERSION_MAJOR=2026.0
ARG OPENVINO_VERSION_FULL=2026.0.0.20965.c6d6a13a886
ARG UBUNTU_VERSION=24.04

-# Optional proxy build arguments - empty by default
# Intel GPU driver versions. https://github.com/intel/compute-runtime/releases
ARG IGC_VERSION=v2.30.1
ARG IGC_VERSION_FULL=2_2.30.1+20950
ARG COMPUTE_RUNTIME_VERSION=26.09.37435.1
ARG COMPUTE_RUNTIME_VERSION_FULL=26.09.37435.1-0
ARG IGDGMM_VERSION=22.9.0

# Intel NPU driver versions. https://github.com/intel/linux-npu-driver/releases
ARG NPU_DRIVER_VERSION=v1.32.0
ARG NPU_DRIVER_FULL=v1.32.0.20260402-23905121947
ARG LIBZE1_VERSION=1.27.0-1~24.04~ppa2

+# Optional proxy build arguments
ARG http_proxy=
ARG https_proxy=

@@ -78,13 +90,47 @@ ARG http_proxy
ARG https_proxy

RUN apt-get update \
-&& apt-get install -y libgomp1 libtbb12 curl \
+&& apt-get install -y libgomp1 libtbb12 curl wget ocl-icd-libopencl1 \
&& apt autoremove -y \
&& apt clean -y \
&& rm -rf /tmp/* /var/tmp/* \
&& find /var/cache/apt/archives /var/lib/apt/lists -not -name lock -type f -delete \
&& find /var/cache -type f -delete

# Install GPU drivers
ARG IGC_VERSION
ARG IGC_VERSION_FULL
ARG COMPUTE_RUNTIME_VERSION
ARG COMPUTE_RUNTIME_VERSION_FULL
ARG IGDGMM_VERSION
RUN mkdir /tmp/neo/ && cd /tmp/neo/ \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-opencl-${IGC_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-ocloc_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/intel-opencl-icd_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libigdgmm12_${IGDGMM_VERSION}_amd64.deb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1-dbgsym_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.ddeb \
&& wget https://github.com/intel/compute-runtime/releases/download/${COMPUTE_RUNTIME_VERSION}/libze-intel-gpu1_${COMPUTE_RUNTIME_VERSION_FULL}_amd64.deb \
&& dpkg --install *.deb \
&& rm -rf /tmp/neo/

# Install NPU drivers
ARG NPU_DRIVER_VERSION
ARG NPU_DRIVER_FULL
ARG LIBZE1_VERSION
RUN mkdir /tmp/npu/ && cd /tmp/npu/ \
&& wget https://github.com/intel/linux-npu-driver/releases/download/${NPU_DRIVER_VERSION}/linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& tar -xf linux-npu-driver-${NPU_DRIVER_FULL}-ubuntu2404.tar.gz \
&& dpkg --install *.deb \
&& rm -rf /tmp/npu/

RUN cd /tmp \
&& wget https://snapshot.ppa.launchpadcontent.net/kobuk-team/intel-graphics/ubuntu/20260324T100000Z/pool/main/l/level-zero-loader/libze1_${LIBZE1_VERSION}_amd64.deb \
&& dpkg --install libze1_${LIBZE1_VERSION}_amd64.deb \
&& rm libze1_${LIBZE1_VERSION}_amd64.deb

COPY --from=build /app/lib/ /app/

### Full (all binaries)
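Review note: the GPU driver step above builds every download URL from a release tag plus a full package version, both pinned as `ARG`s at the top of the file. A minimal shell sketch of that expansion, assuming the default values from this diff; the `url` variable is only illustrative and nothing is downloaded:

```shell
#!/bin/sh
# Defaults copied from the ARG lines added in this diff.
IGC_VERSION=v2.30.1
IGC_VERSION_FULL=2_2.30.1+20950

# Same URL template as the first wget in the "Install GPU drivers" step.
url="https://github.com/intel/intel-graphics-compiler/releases/download/${IGC_VERSION}/intel-igc-core-${IGC_VERSION_FULL}_amd64.deb"

echo "$url"
```

Overriding either value with `--build-arg` at build time should change the release directory or the package file name independently, which is presumably why the tag and the full version are kept as separate arguments.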
2 changes: 1 addition & 1 deletion .devops/vulkan.Dockerfile
@@ -7,7 +7,7 @@ RUN apt update && apt install -y git build-essential cmake wget xz-utils

# Install SSL and Vulkan SDK dependencies
RUN apt install -y libssl-dev curl \
-libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libvulkan-dev glslc
+libxcb-xinput0 libxcb-xinerama0 libxcb-cursor-dev libvulkan-dev glslc spirv-headers

# Build it
WORKDIR /app
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/010-bug-compilation.yml
@@ -12,6 +12,8 @@ body:
after recreating the CMake build directory and with `-DGGML_CCACHE=OFF`.
If the compilation succeeds with ccache disabled you should be able to permanently fix the issue
by clearing `~/.cache/ccache` (on Linux).

Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: commit
attributes:
4 changes: 3 additions & 1 deletion .github/ISSUE_TEMPLATE/011-bug-results.yml
@@ -1,5 +1,5 @@
name: Bug (model use)
-description: Something goes wrong when using a model (in general, not specific to a single llama.cpp module).
+description: Something goes wrong when running a model (crashes, garbled outputs, etc.).
title: "Eval bug: "
labels: ["bug-unconfirmed", "model evaluation"]
body:
@@ -12,6 +12,8 @@ body:
If you encountered the issue while using an external UI (e.g. ollama),
please reproduce your issue using one of the examples/binaries in this repository.
The `llama-completion` binary can be used for simple and reproducible model inference.

Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: version
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/019-bug-misc.yml
@@ -10,6 +10,8 @@ body:
This issue template is intended for miscellaneous bugs that don't fit into any other category.
If you encountered the issue while using an external UI (e.g. ollama),
please reproduce your issue using one of the examples/binaries in this repository.
Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: version
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/020-enhancement.yml
@@ -8,6 +8,8 @@ body:
value: |
[Please post your idea first in Discussion if there is not yet a consensus for this enhancement request. This will help to keep this issue tracker focused on enhancements that the community has agreed needs to be implemented.](https://github.com/ggml-org/llama.cpp/discussions/categories/ideas)
Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: checkboxes
id: prerequisites
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/030-research.yml
@@ -8,6 +8,8 @@ body:
value: |
Don't forget to check for any [duplicate research issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3A%22research+%F0%9F%94%AC%22)
Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: checkboxes
id: research-stage
attributes:
2 changes: 2 additions & 0 deletions .github/ISSUE_TEMPLATE/040-refactor.yml
@@ -9,6 +9,8 @@ body:
Don't forget to [check for existing refactor issue tickets](https://github.com/ggml-org/llama.cpp/issues?q=is%3Aopen+is%3Aissue+label%3Arefactoring) in case it's already covered.
Also you may want to check [Pull request refactor label as well](https://github.com/ggml-org/llama.cpp/pulls?q=is%3Aopen+is%3Apr+label%3Arefactoring) for duplicates too.
Please fill out this template yourself, copypasting language model outputs is [strictly prohibited](https://github.com/ggml-org/llama.cpp/blob/master/CONTRIBUTING.md#ai-usage-policy).
- type: textarea
id: background-description
attributes:
8 changes: 8 additions & 0 deletions .github/labeler.yml
@@ -73,10 +73,18 @@ android:
- changed-files:
- any-glob-to-any-file:
- examples/llama.android/**
server/webui:
- changed-files:
- any-glob-to-any-file:
- tools/server/webui/**
- tools/server/public/**
server:
- changed-files:
- any-glob-to-any-file:
- tools/server/**



ggml:
- changed-files:
- any-glob-to-any-file:
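Review note: the new `server/webui` globs are a subset of the existing `server` glob, so a web UI change will likely carry both labels. A rough sketch of that overlap, assuming the labeler applies every label whose globs match; Python's `fnmatchcase` only approximates the action's glob matcher, and `labels_for` and the sample paths are hypothetical:

```python
from fnmatch import fnmatchcase

# Glob lists copied from the labeler.yml hunk above (other labels omitted).
LABEL_GLOBS = {
    "server/webui": ["tools/server/webui/**", "tools/server/public/**"],
    "server": ["tools/server/**"],
}


def labels_for(changed_files):
    """Return every label with at least one glob matching at least one changed file."""
    return [
        label
        for label, globs in LABEL_GLOBS.items()
        if any(fnmatchcase(path, glob) for path in changed_files for glob in globs)
    ]


# Hypothetical changed-file lists, for illustration only.
print(labels_for(["tools/server/webui/src/App.svelte"]))
print(labels_for(["tools/server/server.cpp"]))
```

If the intent is for web UI changes to carry only `server/webui`, the `server` entry would need an exclusion rule; as written, the two labels stack.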
2 changes: 1 addition & 1 deletion .github/pull_request_template.md
@@ -6,7 +6,7 @@

<!-- You can provide more details and link related discussions here. Delete this section if not applicable -->

-# Requirements
+## Requirements

<!-- IMPORTANT: Please do NOT delete this section, otherwise your PR may be rejected -->

116 changes: 116 additions & 0 deletions .github/workflows/build-and-test-snapdragon.yml
@@ -0,0 +1,116 @@
name: CI (snapdragon)

on:
  workflow_dispatch:
  push:
    branches:
      - master
    paths:
      - '.github/workflows/build-and-test-snapdragon.yml'
      - 'ggml/include/ggml-hexagon.h'
      - 'ggml/src/ggml-hexagon/**'
      - 'docs/backend/snapdragon/**'
      - 'scripts/snapdragon/**'
      - 'CMakePresets.json'

  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - '.github/workflows/build-and-test-snapdragon.yml'
      - 'ggml/include/ggml-hexagon.h'
      - 'ggml/src/ggml-hexagon/**'
      - 'docs/backend/snapdragon/**'
      - 'scripts/snapdragon/**'
      - 'CMakePresets.json'

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
  cancel-in-progress: true

jobs:
android-ndk-snapdragon:
runs-on: ubuntu-latest
container:
image: 'ghcr.io/snapdragon-toolchain/arm64-android:v0.3'
defaults:
run:
shell: bash

steps:
- name: Clone
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Snapdragon Android
id: build_llama_cpp_snapdragon_android
run: |
cp docs/backend/snapdragon/CMakeUserPresets.json .
cmake --preset arm64-android-snapdragon-release -B build
cmake --build build
cmake --install build --prefix pkg-snapdragon/llama.cpp

- name: Upload Llama.CPP Snapdragon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_snapdragon_android.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/llama.cpp

test-snapdragon-qdc:
name: Test on QDC Android Device (${{ matrix.device }})
needs: [android-ndk-snapdragon]
runs-on: ubuntu-slim
strategy:
fail-fast: false
matrix:
device: [SM8750, SM8650, SM8850]

steps:
- name: Checkout
uses: actions/checkout@v6

- name: Download build artifact
uses: actions/download-artifact@v7
with:
name: llama-cpp-android-arm64-snapdragon
path: pkg-snapdragon/llama.cpp

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.x'
cache: pip

- name: Install system dependencies
run: |
sudo apt-get update
sudo apt-get install -y curl unzip

- name: Install QDC SDK wheel
run: |
curl -fSL -o qdc_sdk.zip https://softwarecenter.qualcomm.com/api/download/software/tools/Qualcomm_Device_Cloud_SDK/All/0.2.3/qualcomm_device_cloud_sdk-0.2.3.zip
unzip qdc_sdk.zip -d qdc_sdk
pip install qdc_sdk/qualcomm_device_cloud_sdk-0.2.3-py3-none-any.whl

- name: Check QDC API key
id: check_secret
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}
run: echo "has-qdc-key=${{ env.QDC_API_KEY != '' }}" >> "$GITHUB_OUTPUT"

- name: Run QDC tests (${{ matrix.device }})
if: steps.check_secret.outputs.has-qdc-key == 'true'
run: |
python scripts/snapdragon/qdc/run_qdc_jobs.py \
--test all \
--pkg-dir pkg-snapdragon/llama.cpp \
--model-url "https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_0.gguf" \
--device ${{ matrix.device }}
env:
QDC_API_KEY: ${{ secrets.QDC_API_KEY }}

- name: Cleanup
if: always()
run: rm -rf pkg-snapdragon qdc_sdk qdc_sdk.zip
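Review note: the `concurrency.group` expression in this workflow, `github.head_ref && github.ref || github.run_id`, is easy to misread. `github.head_ref` is only set for pull request events, so pull request runs are grouped by ref and, with `cancel-in-progress: true`, a newer run replaces an older one. Pushes to `master` fall through to the unique `run_id` and are never cancelled. A minimal Python sketch of that evaluation follows. The `concurrency_group` helper, the branch name, the PR number and the run IDs are hypothetical values chosen for illustration.

```python
def concurrency_group(workflow: str, head_ref: str, ref: str, run_id: str) -> str:
    """Model `${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}`.

    Python's `and`/`or` return one of their operands, like `&&`/`||` in
    GitHub Actions expressions, and an empty string is falsy in both.
    """
    return f"{workflow}-{head_ref and ref or run_id}"


# Two runs for the same pull request (hypothetical PR 123) share one group,
# so the newer run cancels the older one.
pr_first = concurrency_group("CI (snapdragon)", "my-branch", "refs/pull/123/merge", "1001")
pr_second = concurrency_group("CI (snapdragon)", "my-branch", "refs/pull/123/merge", "1002")

# Pushes to master have an empty head_ref, so every run gets its own group.
push_first = concurrency_group("CI (snapdragon)", "", "refs/heads/master", "2001")
push_second = concurrency_group("CI (snapdragon)", "", "refs/heads/master", "2002")

print(pr_first == pr_second)
print(push_first == push_second)
```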
51 changes: 19 additions & 32 deletions .github/workflows/build-android.yml
@@ -1,26 +1,24 @@
name: CI (android)

on:
workflow_dispatch: # allows manual triggering
workflow_dispatch:
push:
branches:
- master
paths: [
'.github/workflows/build-android.yml',
'**/CMakeLists.txt',
'**/.cmake',
'**/*.h',
'**/*.hpp',
'**/*.c',
'**/*.cpp'
]
paths:
- '.github/workflows/build-android.yml'
- '**/CMakeLists.txt'
- '**/.cmake'
- '**/*.h'
- '**/*.hpp'
- '**/*.c'
- '**/*.cpp'

pull_request:
types: [opened, synchronize, reopened]
paths: [
'.github/workflows/build-android.yml',
'examples/llama.android/**'
]
paths:
- '.github/workflows/build-android.yml'
- 'examples/llama.android/**'

concurrency:
group: ${{ github.workflow }}-${{ github.head_ref && github.ref || github.run_id }}
@@ -51,7 +49,7 @@ jobs:
distribution: zulu

- name: Setup Android SDK
uses: android-actions/setup-android@9fc6c4e9069bf8d3d10b2204b1fb8f6ef7065407 # v3
uses: android-actions/setup-android@40fd30fb8d7440372e1316f5d1809ec01dcd3699 # v4.0.1
with:
log-accepted-android-sdk-licenses: false

@@ -67,35 +65,24 @@ jobs:
defaults:
run:
shell: bash
strategy:
matrix:
include:
- build: 'arm64-cpu'
defines: '-D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF'
- build: 'arm64-snapdragon'
defines: '--preset arm64-android-snapdragon-release'

steps:
- name: Clone
id: checkout
uses: actions/checkout@v6
with:
fetch-depth: 0
lfs: false

- name: Build Llama.CPP for Hexagon Android
id: build_llama_cpp_hexagon_android
- name: Build
id: ndk_build
run: |
if [[ "${{ matrix.build }}" == "arm64-snapdragon" ]]; then
cp docs/backend/snapdragon/CMakeUserPresets.json .
fi
cmake ${{ matrix.defines }} -B build
cmake -D ANDROID_ABI=arm64-v8a -D ANDROID_PLATFORM=android-31 -D CMAKE_TOOLCHAIN_FILE=${ANDROID_NDK_ROOT}/build/cmake/android.toolchain.cmake -D GGML_NATIVE=OFF -DGGML_CPU_ARM_ARCH=armv8.5-a+fp16+i8mm -G Ninja -D LLAMA_OPENSSL=OFF -D GGML_OPENMP=OFF -B build
cmake --build build
cmake --install build --prefix pkg-adb/llama.cpp

- name: Upload Llama.CPP Hexagon Android Build Artifact
if: ${{ always() && steps.build_llama_cpp_hexagon_android.outcome == 'success' }}
- name: Upload Android Build Artifact
if: ${{ always() && steps.ndk_build.outcome == 'success' }}
uses: actions/upload-artifact@v6
with:
name: llama-cpp-android-${{ matrix.build }}
name: llama-cpp-android-arm64-cpu
path: pkg-adb/llama.cpp