Glm45 #336

Merged
Nexesenex merged 17 commits into Nexesenex:glm45 from ddh0:glm45 on Aug 2, 2025

@Nexesenex

Owner

No description provided.

ddh0 and others added 17 commits August 1, 2025 23:48
* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.
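The "fastdiv opt" above is not spelled out in the commit message. As a rough illustration only, the following sketch shows the usual multiply-and-shift trick for dividing by a divisor that is fixed ahead of time: the host precomputes a magic multiplier and a shift, and the hot loop replaces each integer division with a high multiply, an add, and a shift. The function names `init_fastdiv_values` and `fastdiv` and the 32-bit width are assumptions made for this example, not code taken from the PR.

```python
# Sketch of division by an invariant divisor via multiply + shift.
# Names and the 32-bit width are illustrative assumptions.

def init_fastdiv_values(d):
    """Precompute (mp, L) for a fixed divisor d with 1 <= d < 2**32."""
    assert 1 <= d < 2**32
    L = 0
    while (1 << L) < d:          # L = ceil(log2(d))
        L += 1
    # Magic multiplier; for the divisors allowed here it is at most 2**32 - 1.
    mp = ((1 << 32) * ((1 << L) - d)) // d + 1
    return mp, L


def fastdiv(n, mp, L):
    """Return n // d for 0 <= n < 2**32, using the values precomputed for d."""
    hi = (n * mp) >> 32          # high 32 bits of the 64-bit product
    # Python integers do not overflow; fixed-width code must make sure
    # hi + n cannot wrap (for example by restricting the range of n).
    return (hi + n) >> L


mp, L = init_fastdiv_values(3)
print(fastdiv(7, mp, L))
```

In a shader the precomputed pair would typically be passed in as push constants or similar, so the per-element index math avoids a hardware integer divide.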

* Three tile sizes for CONV_2D, and a heuristic to choose

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <picard12@live.de>

---------

Co-authored-by: 0cc4m <picard12@live.de>
…15003)

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
- Increase tile size for k-quants, to match non-k-quants
- Choose more carefully between large and medium tiles, considering how it
  interacts with split_k
- Allow larger/non-power of two split_k, and make the splits a multiple of 256
- Use split_k==3 when >1/2 and <=2/3 of the SMs would have been used
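To make the last two bullets concrete, here is a minimal sketch of how such a rule could look. It is one possible reading of the commit message, not the code from the PR: the helper names (`round_split_k`, `guess_split_k`), the fallback of `split_k = 1` outside the described band, and the parameter names are all assumptions.

```python
# Hypothetical sketch of the split_k rules described above.

def ceil_div(a, b):
    return -(-a // b)


def round_split_k(k, split_k, align=256):
    """Make each split cover a multiple of `align` along K.

    Returns (k_per_split, effective_split_k). The effective split count can
    be smaller than requested once the chunk size is rounded up.
    """
    k_per_split = ceil_div(ceil_div(k, split_k), align) * align
    return k_per_split, ceil_div(k, k_per_split)


def guess_split_k(num_tiles, sm_count):
    """Pick split_k == 3 when the tiles would occupy more than 1/2 and at
    most 2/3 of the SMs. Every other case returns 1 here as a placeholder;
    the real heuristic has more branches that the message does not describe.
    """
    if 2 * num_tiles > sm_count and 3 * num_tiles <= 2 * sm_count:
        return 3
    return 1


print(guess_split_k(60, 100))    # 60 tiles on 100 SMs falls in the (1/2, 2/3] band
print(round_split_k(4096, 3))    # chunk size rounded up to a multiple of 256
```

One way to read the choice of 3: with an occupancy fraction f in (1/2, 2/3], splitting three ways yields between 1.5 and 2 times the SM count in workgroups, so two waves are close to full, whereas leaving the work unsplit keeps a third or more of the SMs idle.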
@Nexesenex Nexesenex merged commit 86aa78c into Nexesenex:glm45 Aug 2, 2025
47 of 49 checks passed

7 participants