Glm45 by Nexesenex · Pull Request #336 · Nexesenex/croco.cpp

Nexesenex · 2025-08-02T17:03:24Z

No description provided.

* vulkan: optimizations for direct convolution
  - Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill the GPU. The new size should be amenable to using coopmat, too.
  - Fix shmem bank conflicts. 16B padding should work with coopmat.
  - Some explicit loop unrolling.
  - Skip math/stores work for parts of the tile that are OOB.
  - Apply fastdiv opt.
  - Disable shuffles for NV.
* Three tiles sizes for CONV_2D, and a heuristic to choose
* reallow collectives for pre-Turing
* make SHMEM_PAD a spec constant
* fixes for intel perf - no shmem padding, placeholder shader core count
* shader variants with/without unrolling
* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <picard12@live.de>
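The "fastdiv" item above refers to replacing integer division by a fixed divisor with a multiply and a shift. The commit's shader code is not shown on this page, so the following is only a minimal host-side sketch of one common formulation (a precomputed magic multiplier plus a shift); the names `FastDiv`, `make_fastdiv`, and `fastdiv` are hypothetical and not taken from the PR.

```cpp
#include <cstdint>

// Hypothetical names; illustrative sketch only, not the PR's code.
// Precomputed constants for dividing by a fixed divisor d (requires d >= 1).
struct FastDiv {
    uint32_t mp; // magic multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Setup, done once per divisor (for example on the host before a dispatch).
inline FastDiv make_fastdiv(uint32_t d) {
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L;
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1, which always fits in 32 bits.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), L };
}

// n / d using a multiply-high, an add, and a shift instead of a divide.
// The sum is kept in 64 bits here so it cannot overflow for any 32-bit n.
inline uint32_t fastdiv(uint32_t n, FastDiv fd) {
    const uint64_t hi = (uint64_t{n} * fd.mp) >> 32;
    return static_cast<uint32_t>((hi + n) >> fd.L);
}
```

The point of the trick is that index decompositions in a convolution kernel divide by the same few dimensions many times, so the setup cost is paid once while every per-element division becomes cheap.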


- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it interacts with split_k
- Allow larger/non-power-of-two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
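The two split_k rules in the last items can be sketched as follows. This is an illustration under stated assumptions, not the commit's code: `tiles` is assumed to mean the number of workgroups the un-split multiply would launch, `sm_count` the number of SMs, and the helper names are hypothetical. Only the split_k==3 condition and the multiple-of-256 rounding come from the commit message; the choice of any other split_k value is not described there and is left out.

```cpp
#include <cstdint>

// Hypothetical helpers; illustrative sketch only.

// Round x up to the next multiple of `multiple` (multiple > 0).
inline uint32_t round_up(uint32_t x, uint32_t multiple) {
    return (x + multiple - 1) / multiple * multiple;
}

// True when the un-split launch would occupy more than 1/2 and at most 2/3
// of the SMs, the case for which the commit message says split_k == 3 is used.
inline bool wants_split_k_3(uint32_t tiles, uint32_t sm_count) {
    return uint64_t{tiles} * 2 > sm_count &&
           uint64_t{tiles} * 3 <= uint64_t{sm_count} * 2;
}

// Length of each K slice: an even split of k, rounded up to a multiple of 256.
// With non-power-of-two split_k the last slice may be shorter than the rest.
inline uint32_t k_per_split(uint32_t k, uint32_t split_k) {
    const uint32_t even = (k + split_k - 1) / split_k;
    return round_up(even, 256);
}
```

For example, with 100 SMs and 60 tiles the condition holds, and a K dimension of 4096 split three ways gives slices of 1536, 1536, and 1024 elements.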

ddh0 and others added 17 commits August 1, 2025 23:48

initial PR commit

11648d5

add GGUF constants

69d1c58

initial GLM-4.5 integration

2586ae5

fix typo LLM_ATCH_GLM4_MOE --> LLM_ARCH_GLM4_MOE

2c6e198

add glm4_moe tensor mapping

dbe9f10

add attn_k_norm and attn_q_norm tensors for GLM-4.5

5f9e4e1

server: enable token array inputs for OAI API (ggml-org#15001)

f906275

model : support Qwen3-Embedding (ggml-org#15023)

339bd02

vulkan: Support ne[3]>1 in noncontig matrix-vector multiply (ggml-org#15015)

ec0b188

more consistent organization

41169a8

more consistent organization (cont.)

3cf2e4a

Merge branch 'ggml-org:master' into glm45

2232baa

llama-bench: rename DB table name from test to llama_bench (ggml-org#15003)

3025b62

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

chat : fix multiple tool_calls on hermes-2-pro (ggml-org#14962)

f738989

Merge branch 'ggml-org:master' into glm45

da39c79

github-actions Bot added the testing label Aug 2, 2025

Nexesenex merged commit 86aa78c into Nexesenex:glm45 Aug 2, 2025
47 of 49 checks passed

github-actions Bot added examples python server ggml Vulkan script labels Aug 2, 2025

Nexesenex merged 17 commits into Nexesenex:glm45 from ddh0:glm45

7 participants
