[TritonCPU] Prepare L1 matmul and elementwise kernels for benchmarking by jopperm · Pull Request #78 · libxsmm/AI-bench

jopperm · 2026-06-01T14:39:33Z

Derived bench-cpu specs from the bench-gpu specs (where those existed) or from the original KB config.
Set block sizes in matmul kernels to AMX-friendly 32x32x32.
Set assume_in_bounds kernel parameter to override tensor descriptor bounds checking where possible.
Explore more block sizes for elementwise kernels.
The persistent ReLU and GeLU kernels didn't perform well in initial testing, so I manually rewrote them as normal (non-persistent) kernels.
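
For readers unfamiliar with the persistent vs. normal distinction mentioned above, here is a rough pure-Python sketch of the two launch schemes for an elementwise ReLU. It is not the Triton code from this PR: `cdiv`, `relu_normal`, `relu_persistent`, and `num_programs` are hypothetical names chosen for illustration, and the serial `for pid` loops only stand in for programs that would run in parallel on real hardware.

```python
def cdiv(a, b):
    """Ceiling division, commonly used to size a launch grid."""
    return (a + b - 1) // b


def relu_normal(x, block_size):
    """Normal scheme: one 'program' per block, each handling exactly one block."""
    out = [0.0] * len(x)
    num_blocks = cdiv(len(x), block_size)
    for pid in range(num_blocks):  # grid size == number of blocks
        start = pid * block_size
        # min(...) plays the role of the tail mask for a partial last block
        for i in range(start, min(start + block_size, len(x))):
            out[i] = max(x[i], 0.0)
    return out


def relu_persistent(x, block_size, num_programs):
    """Persistent scheme: a fixed number of 'programs', each striding over blocks."""
    out = [0.0] * len(x)
    num_blocks = cdiv(len(x), block_size)
    for pid in range(num_programs):  # grid size is fixed, independent of input size
        for block in range(pid, num_blocks, num_programs):
            start = block * block_size
            for i in range(start, min(start + block_size, len(x))):
                out[i] = max(x[i], 0.0)
    return out


# Both schemes should produce the same result for any block size;
# only the mapping of blocks to programs differs.
x = [float(i - 50) for i in range(100)]
for block_size in (16, 32, 64):
    assert relu_normal(x, block_size) == relu_persistent(x, block_size, num_programs=4)
```

The block sizes in the loop are arbitrary illustrative values, not the ones explored in this PR. Which scheme is faster depends on the backend and would need to be measured, as was done here.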

sandlbn

LGTM!

jopperm added 4 commits June 1, 2026 07:26

Prep L1 matmuls for benchmarking

86a937a

Prep L1 elementwise kernels for benchmarking

ab9dceb

Fix dtype

851c4f9

Add mem_bytes tags, more blocksizes for eltwise, don't use persistent…

432d23c

… kernels

jopperm requested review from adam-smnk and sandlbn June 1, 2026 14:40

adam-smnk approved these changes Jun 1, 2026

sandlbn approved these changes Jun 1, 2026

sandlbn merged commit 32a16c4 into main Jun 1, 2026
6 checks passed