[xegpu] Add transpose A/B support in mlp schedule by tkarna · Pull Request #184 · llvm/lighthouse

tkarna · 2026-06-03T19:05:25Z

Updates the mlp schedule to support the transpose A and/or transpose B cases.
inspect_payload returns metadata in a generic nested "layers" dict. It currently returns metadata for matmul, batch_matmul and elemwise layers.
The matmul cost model and parameter selector are updated to handle the transpose cases.
The cost model's generate_configs has a verbose flag that prints tile selection info.
get_tileable_consumers returns only the matmul epilogue, i.e., it excludes the next linalg.matmul op.
The matmul and mlp examples are updated with --transpose-a/b options.
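To make the "layers" metadata concrete, here is a minimal sketch of what such a nested dict could look like. The keys (`kind`, `shape`, `transpose_a`, `transpose_b`, `op`) and the `add_matmul_layer` / `add_elemwise_layer` helpers are hypothetical illustrations, not the actual inspect_payload output; the shapes match the transposed-A example from the review discussion (M=2048, N=4096, K=8192).

```python
from typing import Any

# Hypothetical layout of the nested "layers" metadata dict; the real keys
# returned by inspect_payload may differ.
Layers = dict[str, dict[str, Any]]


def add_matmul_layer(layers: Layers, m: int, n: int, k: int,
                     transpose_a: bool = False, transpose_b: bool = False) -> str:
    """Record a matmul layer (C[m, n] += A[m, k] * B[k, n]) and return its key."""
    key = f"layer_{len(layers)}"
    layers[key] = {
        "kind": "matmul",
        "shape": {"m": m, "n": n, "k": k},
        # Whether the A/B operand is produced by an explicit linalg.transpose.
        "transpose_a": transpose_a,
        "transpose_b": transpose_b,
    }
    return key


def add_elemwise_layer(layers: Layers, op_name: str, shape: tuple[int, ...]) -> str:
    """Record an elementwise layer (e.g. a bias add or ReLU) and return its key."""
    key = f"layer_{len(layers)}"
    layers[key] = {"kind": "elemwise", "op": op_name, "shape": shape}
    return key


layers: Layers = {}
add_matmul_layer(layers, m=2048, n=4096, k=8192, transpose_a=True)
add_elemwise_layer(layers, "relu", (2048, 4096))
for key, props in layers.items():
    print(key, props["kind"])  # layer_0 matmul, then layer_1 elemwise
```

Keying layers by insertion order keeps the dict generic: a consumer such as a parameter selector can iterate over it and dispatch on `kind` without knowing the payload in advance.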

We need to split kernels between matmul ops. This could be added as an option.
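Where that split falls can be illustrated with a toy user graph: starting from one matmul, a consumer walk collects its epilogue and stops at the next matmul, which is a natural kernel boundary. This is a sketch only; the `Op` class and `epilogue_consumers` helper are hypothetical stand-ins for get_tileable_consumers and do not use the MLIR Python bindings.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity hashing so ops can go in a set
class Op:
    """Toy stand-in for an IR operation; not MLIR's Python API."""
    name: str
    users: list["Op"] = field(default_factory=list)


def epilogue_consumers(root: Op) -> list[Op]:
    """Breadth-first walk over the users of `root` that stops at matmul ops.

    The next linalg.matmul and everything after it are excluded, so only the
    epilogue (bias add, activation, ...) is returned.
    """
    result: list[Op] = []
    seen: set[Op] = set()
    worklist = list(root.users)
    while worklist:
        op = worklist.pop(0)
        if op in seen or op.name == "linalg.matmul":
            continue
        seen.add(op)
        result.append(op)
        worklist.extend(op.users)
    return result


# matmul -> bias add -> ReLU (elementwise max) -> matmul -> add
add2 = Op("linalg.add")
mm2 = Op("linalg.matmul", [add2])
relu = Op("linalg.max", [mm2])
bias = Op("linalg.add", [relu])
mm1 = Op("linalg.matmul", [bias])
print([op.name for op in epilogue_consumers(mm1)])  # ['linalg.add', 'linalg.max']
```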

adam-smnk · 2026-06-04T09:09:48Z

-                            _, n = inputs[1].type.shape
-                            matmuls.append((m, n, k))
+                            input_is_transpose = [
+                                has_producer(o, linalg.TransposeOp) for o in inputs


Wouldn't it be more generic to check the matmul's indexing maps?

Hmm, yeah, it depends on how the IR is formulated. In KernelBench we get IR with explicit linalg.transpose ops:

%transposed = linalg.transpose ins(%0 : ...) outs(%2 : tensor<2048x8192xf16>) permutation = [1, 0]
%5 = linalg.matmul ins(%transposed, %1 : tensor<2048x8192xf16>, ...) -> tensor<2048x4096xf32>

which this approach detects. A matmul with a transposed indexing map would be another variant; that pattern is currently not supported.
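For comparison, detecting the indexing-map variant could look roughly like the following. This pure-Python sketch represents each map as a tuple of loop-dimension positions instead of an MLIR AffineMap; the names `transposed_operands` and `Map` and the d0/d1/d2 = M/N/K convention are assumptions for illustration.

```python
# Loop dimensions of a matmul: d0 = M, d1 = N, d2 = K.
M, N, K = 0, 1, 2

Map = tuple[int, int]  # loop dims indexing an operand's (row, col), e.g. (d0, d2)


def transposed_operands(a_map: Map, b_map: Map, c_map: Map) -> tuple[bool, bool]:
    """Return (transpose_a, transpose_b) for a matmul-like contraction.

    Only plain 2-D projected permutations are handled; anything else raises.
    """
    if c_map != (M, N):
        raise ValueError("sketch assumes C is indexed by (d0, d1)")
    if sorted(a_map) != [M, K] or sorted(b_map) != [N, K]:
        raise ValueError("not a matmul-like contraction")
    # A is normally (d0, d2) = M x K; (d2, d0) means A is stored K x M.
    # B is normally (d2, d1) = K x N; (d1, d2) means B is stored N x K.
    return a_map == (K, M), b_map == (N, K)


print(transposed_operands((M, K), (K, N), (M, N)))  # (False, False)
print(transposed_operands((K, M), (K, N), (M, N)))  # (True, False)
```

The second call uses the same maps as a contraction whose A operand is read through (d2, d0), i.e. a transposed A.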

I see, so the imported IR always has an explicit transpose op before the matmul.

Then maybe I'd rename the metadata field to be more specific: it should indicate that A/B has a transpose producer, not an implicit transpose through indexing maps.
Knowing that these are separate operations, as opposed to performing C += A^T * B directly, can be a meaningful difference.

Yeah, at the vector level, however, the explicit transpose op is gone (with the current xegpu lowering): there's a single vector.contract with a transposed indexing map:

#map = affine_map<(d0, d1, d2) -> (d2, d0)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
%8 = vector.transfer_read %1[%arg6, %arg3], %0 {in_bounds = [true, true]} : tensor<8192x2048xf16>, vector<32x256xf16>
%9 = vector.transfer_read %2[%arg6, %arg4], %0 {in_bounds = [true, true]} : tensor<8192x4096xf16>, vector<32x256xf16>
%10 = vector.contract {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %8, %9, %arg7 : vector<32x256xf16>, vector<32x256xf16> into vector<256x256xf32>

As such, the two alternatives, either an explicit linalg.transpose producer or a linalg.matmul with a transposed indexing map, would AFAIK be identical here.
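Why the two forms coincide can be checked on a tiny example: an explicit transpose followed by a plain matmul computes the same result as a single contraction that reads the stored K x M operand through the (d2, d0) map. A pure-Python sketch with small nested lists; the helper names are illustrative and not part of the codebase.

```python
def transpose(x: list[list[int]]) -> list[list[int]]:
    return [list(row) for row in zip(*x)]


def matmul(a: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """Plain matmul: C[m][n] = sum_k a[m][k] * b[k][n]."""
    return [[sum(a[m][k] * b[k][n] for k in range(len(b))) for n in range(len(b[0]))]
            for m in range(len(a))]


def contract_a_transposed(a_km: list[list[int]], b: list[list[int]]) -> list[list[int]]:
    """A read through the (d2, d0) map: C[m][n] = sum_k a_km[k][m] * b[k][n]."""
    return [[sum(a_km[k][m] * b[k][n] for k in range(len(a_km))) for n in range(len(b[0]))]
            for m in range(len(a_km[0]))]


a_km = [[1, 2, 3], [4, 5, 6]]  # A stored as K x M (K=2, M=3)
b = [[7, 8], [9, 10]]          # B is K x N (K=2, N=2)
explicit = matmul(transpose(a_km), b)      # linalg.transpose + linalg.matmul
implicit = contract_a_transposed(a_km, b)  # single contraction, transposed map
print(explicit == implicit)  # True
```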

For the xegpu lowering, we essentially need to know whether there's a transpose op in the A/B tile producer chain. So I'm not sure we need to differentiate the two variants in the metadata. If such a differentiation becomes necessary at some point, we can add it then.
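The check needed for the xegpu lowering, finding a transpose anywhere in the A/B tile producer chain, can be sketched over a toy op graph. `ToyOp` and `transpose_in_producer_chain` are hypothetical; they are neither the has_producer helper from the diff nor MLIR's Python API, and following only the first input at each step is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class ToyOp:
    """Hypothetical stand-in for an IR op; `inputs` holds the ops producing its inputs.

    None marks a value with no defining op (e.g. a function argument).
    """
    name: str
    inputs: list[Optional["ToyOp"]] = field(default_factory=list)


def transpose_in_producer_chain(op: Optional[ToyOp], max_depth: int = 8) -> bool:
    """Return True if a linalg.transpose appears in the producer chain of `op`.

    Assumption: the chain is followed through each op's first input only
    (e.g. the source of a tensor.extract_slice created by tiling).
    """
    for _ in range(max_depth):
        if op is None:
            return False
        if op.name == "linalg.transpose":
            return True
        op = op.inputs[0] if op.inputs else None
    return False


# A tile: extract_slice <- linalg.transpose <- function argument
a_tile = ToyOp("tensor.extract_slice", [ToyOp("linalg.transpose", [None, ToyOp("tensor.empty")])])
# B tile: extract_slice <- function argument
b_tile = ToyOp("tensor.extract_slice", [None])
print(transpose_in_producer_chain(a_tile), transpose_in_producer_chain(b_tile))  # True False
```

The `max_depth` bound keeps the walk cheap and guarantees termination; a real implementation would more likely stop at specific op kinds than at a fixed depth.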

I'm approaching it from the perspective of the payload inspector as a standalone tool.

I agree that in this case it makes no difference for the current use case.
So I'm not pushing to support the indexing map check too. Just removing the ambiguity from the returned metadata should be enough to avoid future confusion when somebody uses the inspector without the Xe pipeline assumptions.

Yes, I understand. I would put my "test-driven development" hat on and keep things simple until we need to differentiate. We don't know if there will be use cases where differentiating the transpose variants matters. We also don't know if the payload inspector can be generalized: the xegpu case might need a different kind of analysis than other use cases.

Then I'd consider moving this whole logic into some Xe-specific utils.
Anyway, it's not a blocker right now.

Yes, the layer matching could be moved to xegpu land. payload_inspector has one other user, examples/mpi/feed-forward-mpi.py, but it only uses the function argument shapes.

tkarna added 3 commits June 3, 2026 21:32

inspect_payload: return generic layer properties dict

6630ee7

matmul costmodel: add transpose arg and update verbosity

224a70c

xegpu parameter selector: use dict input and support transposes

3d4f355

tkarna requested review from adam-smnk and rengolin June 3, 2026 19:05

tkarna force-pushed the xegpu-matmul-transpose branch 2 times, most recently from 87f27d8 to adc7607, June 3, 2026 19:37

tkarna added 5 commits June 4, 2026 11:10

mlp_schedule: add support for a/b transpose

2ea4a66

matmul costmodel: use fixed [16, 16] load tile for transpose case

3827238

mlp_schedule: convert memref.alloc ops to gpu.alloc after bufferization

ed3ad2a

get_tileable_consumers: do not trace through linalg.Matmul ops

7e5e66f

xegpu matmul example: add transpose a/b option

977d156
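A minimal sketch of how a fixed load tile for the transpose case could plug into tile selection. Only the [16, 16] value comes from the commit message; the `pick_load_tile` helper, its parameters, the divisibility constraint and the (8, 32) default are hypothetical, and the verbose print merely mimics the kind of tile-selection info generate_configs is said to report.

```python
from typing import Optional

Tile = tuple[int, int]
# Fixed load tile for transposed operands (per "use fixed [16, 16] load tile for transpose case").
TRANSPOSE_LOAD_TILE: Tile = (16, 16)


def pick_load_tile(sg_tile: Tile, default: Tile, transposed: bool,
                   verbose: bool = False) -> Optional[Tile]:
    """Choose a load tile for one operand's subgroup tile.

    Assumption: a config is only valid if the load tile evenly divides the
    subgroup tile; otherwise None is returned and the config is rejected.
    """
    load = TRANSPOSE_LOAD_TILE if transposed else default
    ok = sg_tile[0] % load[0] == 0 and sg_tile[1] % load[1] == 0
    if verbose:
        print(f"sg_tile={sg_tile} transposed={transposed} load={load} -> "
              f"{'ok' if ok else 'rejected'}")
    return load if ok else None


print(pick_load_tile((32, 256), (8, 32), transposed=False))  # (8, 32)
print(pick_load_tile((32, 256), (8, 32), transposed=True))   # (16, 16)
print(pick_load_tile((8, 256), (8, 32), transposed=True))    # None
```

Pinning the transposed case to one tile shape shrinks the search space for that case, at the cost of rejecting subgroup tiles that 16x16 does not divide.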

tkarna force-pushed the xegpu-matmul-transpose branch from adc7607 to b8aadcb, June 4, 2026 08:10

tkarna added 2 commits June 4, 2026 12:04

xegpu mlp example: add transpose a/b option

9027c81

fix enumerate_matmul_schedules

1ebf952

tkarna force-pushed the xegpu-matmul-transpose branch from b8aadcb to 1ebf952

adam-smnk reviewed Jun 4, 2026

inspect_payload: simplify parallel iter check

deccf6e

adam-smnk approved these changes Jun 4, 2026

tkarna wants to merge 11 commits into llvm:main from tkarna:xegpu-matmul-transpose
