
[TritonCPU] Prepare L1 matmul and elementwise kernels for benchmarking #78

Merged
sandlbn merged 4 commits into main from triton-cpu-level1-bench on Jun 1, 2026

Contributor

@jopperm jopperm commented Jun 1, 2026

  • Derived bench-cpu specs from the bench-gpu specs where those existed, otherwise from the original KB config.
  • Set block sizes in matmul kernels to AMX-friendly 32x32x32.
  • Set the assume_in_bounds kernel parameter to override tensor descriptor bounds checking where possible.
  • Explore more block sizes for elementwise kernels.
  • Persistent ReLU and GeLU kernels didn't perform well in initial testing, so they were manually rewritten as normal (non-persistent) kernels.
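
For readers unfamiliar with the terms in the list above, here is a minimal pure-Python sketch of the ideas; it is not Triton code and not the kernels changed in this PR. It assumes that assume_in_bounds means "every tile is full, so edge clamping can be skipped", and that a persistent kernel is a fixed number of programs that each loop over several tiles, while a normal kernel runs one program per tile. The function names (can_assume_in_bounds, blocked_matmul, tiles_per_program) are hypothetical and exist only for illustration.

```python
# Illustrative model only; names and semantics are assumptions, not the PR's code.
BLOCK_M = BLOCK_N = BLOCK_K = 32  # tile shape named in the PR description


def can_assume_in_bounds(m, n, k):
    """True when every tile is full, so per-tile edge clamping is redundant."""
    return m % BLOCK_M == 0 and n % BLOCK_N == 0 and k % BLOCK_K == 0


def blocked_matmul(a, b, assume_in_bounds=False):
    """Multiply a (m x k) by b (k x n), both lists of lists, tile by tile."""
    m, k, n = len(a), len(b), len(b[0])
    if assume_in_bounds and not can_assume_in_bounds(m, n, k):
        raise ValueError("shapes are not multiples of the block size")
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, BLOCK_M):
        for j0 in range(0, n, BLOCK_N):
            for k0 in range(0, k, BLOCK_K):
                if assume_in_bounds:
                    # Full tiles: no clamping against the matrix edges.
                    i1, j1, k1 = i0 + BLOCK_M, j0 + BLOCK_N, k0 + BLOCK_K
                else:
                    # Ragged edges: clamp each tile extent to the matrix size.
                    i1 = min(i0 + BLOCK_M, m)
                    j1 = min(j0 + BLOCK_N, n)
                    k1 = min(k0 + BLOCK_K, k)
                for i in range(i0, i1):
                    row_a, row_c = a[i], c[i]
                    for kk in range(k0, k1):
                        a_ik, row_b = row_a[kk], b[kk]
                        for j in range(j0, j1):
                            row_c[j] += a_ik * row_b[j]
    return c


def tiles_per_program(num_tiles, num_programs=None):
    """Tile ids handled by each program.

    num_programs=None models a normal launch (one program per tile);
    a fixed num_programs models a persistent launch with a strided loop.
    """
    if num_programs is None:
        return [[t] for t in range(num_tiles)]
    return [list(range(p, num_tiles, num_programs)) for p in range(num_programs)]
```

In this model, tiles_per_program(5) assigns one tile to each of five programs, while tiles_per_program(5, 2) gives two programs the tiles [0, 2, 4] and [1, 3]. Whether the persistent form pays off depends on the target, which is consistent with the PR's observation that it did not help for ReLU and GeLU in initial testing.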

@jopperm jopperm requested review from adam-smnk and sandlbn June 1, 2026 14:40
Collaborator

@sandlbn sandlbn left a comment

LGTM!

@sandlbn sandlbn merged commit 32a16c4 into main Jun 1, 2026
6 checks passed
3 participants