Add SFC-based GEMM kernel #72

Open
jopperm wants to merge 6 commits into main from triton-cpu-sfc-gemm

Contributor

jopperm commented May 25, 2026

Candidate implementation for a high-performance CPU kernel using space-filling curves.
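
For readers new to the idea, here is a minimal sketch of how a space-filling curve can order the output blocks of a GEMM so that consecutive program IDs touch nearby tiles. It assumes a Z-order (Morton) curve purely for illustration; this description does not say which curve the kernel uses, and `morton_decode` and `sfc_order` are hypothetical helpers, not names from the PR.

```python
def morton_decode(pid):
    """Map a linear index to (block_m, block_n) by de-interleaving its bits.

    Assumption: Z-order (Morton) curve; even bits give the N block index,
    odd bits give the M block index.
    """
    block_m = 0
    block_n = 0
    bit = 0
    while pid >> (2 * bit):
        block_n |= ((pid >> (2 * bit)) & 1) << bit
        block_m |= ((pid >> (2 * bit + 1)) & 1) << bit
        bit += 1
    return block_m, block_n


def sfc_order(blocks_m, blocks_n):
    """List the (block_m, block_n) tiles of a grid in Z-order.

    The curve is defined on a power-of-two square, so indices that fall
    outside the real grid are skipped.
    """
    side = 1
    while side < max(blocks_m, blocks_n):
        side *= 2
    order = []
    for pid in range(side * side):
        m, n = morton_decode(pid)
        if m < blocks_m and n < blocks_n:
            order.append((m, n))
    return order
```

For a 2 x 3 grid of blocks, `sfc_order(2, 3)` visits the left 2 x 2 square first and then the remaining column, so tiles that are adjacent in launch order tend to share rows of A or columns of B.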

jopperm marked this pull request as ready for review May 27, 2026 17:47
Collaborator

nit: I'd put it in a separate subdir. Otherwise, it'll get a bit lost among the other 100 kernels.
I'm not sure yet how we'd want to organize such helpers, so maybe backends/triton/cpu/utils for now?

Collaborator

nit: This could also go in a subdir for better visibility, but it could also stay as is since it's a "helper" kernel. Up to you.

import torch
import torch.nn as nn

sys.path.insert(0, str(Path(__file__).parent))

Collaborator

Why is this needed?

# Data is blocked into contiguous chunks of memory. Neighboring blocks in the K
# dimension will also be neighboring in memory.
@triton.jit
def _block_transpose_kernel(

Collaborator

nit: I'd call it ...block_pack... to be consistent with the block-packing naming we use elsewhere
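
For context, a minimal sketch of the kind of block packing the code comment above describes: the matrix is cut into blocks, each block is stored contiguously, and blocks that are neighbors along K end up next to each other in memory. This is plain Python on nested lists, not the Triton kernel from the PR; `block_pack` and its parameters are hypothetical, and the loop order (N blocks outermost, K blocks inside) is an assumption chosen to match that comment.

```python
def block_pack(b, block_k, block_n):
    """Pack a K x N matrix (list of rows) into a flat, block-contiguous list.

    Assumed layout: for each block column (N), walk the blocks along K;
    within a block, elements are stored row-major. Blocks adjacent in K
    are therefore adjacent in the packed buffer.
    """
    k_size = len(b)
    n_size = len(b[0])
    # Sketch only: require dimensions to be multiples of the block sizes.
    assert k_size % block_k == 0 and n_size % block_n == 0

    packed = []
    for bn in range(n_size // block_n):
        for bk in range(k_size // block_k):
            for k in range(block_k):
                row = b[bk * block_k + k]
                packed.extend(row[bn * block_n:(bn + 1) * block_n])
    return packed


# Small example: a 4 x 4 matrix holding 0..15 in row-major order.
example = [[4 * r + c for c in range(4)] for r in range(4)]
packed_example = block_pack(example, 2, 2)
```

In `packed_example`, the first eight entries are the two 2 x 2 blocks of the left block column, stacked along K, followed by the two blocks of the right block column.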


for ik in range(blocking_factor_k):
    _sfc_matmul_kernel[(BLOCKS_M * BLOCKS_N,)](
        a,

Collaborator

Shouldn't A be block packed too?
