[xegpu] Add transpose A/B support in mlp schedule #184

Open
tkarna wants to merge 11 commits into llvm:main from tkarna:xegpu-matmul-transpose
tkarna (Contributor) commented Jun 3, 2026

  • Updates the mlp schedule to support transposed A and/or transposed B operands.
  • inspect_payload now returns metadata in a generic nested "layers" dict. It currently covers matmul, batch_matmul and elemwise layers.
  • The matmul cost model and parameter selector now handle the transpose cases.
  • The cost model's generate_configs gains a verbose flag that prints tile selection info.
  • get_tileable_consumers returns only the matmul epilog, i.e., it excludes the next linalg.matmul op.
  • The matmul and mlp examples gain --transpose-a/b options.
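
For readers unfamiliar with the transpose cases, here is a minimal sketch of how the --transpose-a/b flags could map to stored operand shapes. Only the flag names come from this PR; `operand_shapes`, `build_parser` and the `--sizes` option are hypothetical and are not taken from the lighthouse examples.

```python
import argparse


def operand_shapes(m, n, k, transpose_a=False, transpose_b=False):
    """Return the stored shapes of A and B for C[m, n] = op(A) @ op(B).

    A transposed operand is stored with its two dimensions swapped, so an
    explicit transpose is needed to recover the (m, k) or (k, n) layout.
    """
    a_shape = (k, m) if transpose_a else (m, k)
    b_shape = (n, k) if transpose_b else (k, n)
    return a_shape, b_shape


def build_parser():
    # Hypothetical driver; only the two transpose flags mirror the PR text.
    parser = argparse.ArgumentParser(description="matmul example sketch")
    parser.add_argument("--sizes", type=int, nargs=3,
                        default=[2048, 4096, 8192], metavar=("M", "N", "K"))
    parser.add_argument("--transpose-a", action="store_true")
    parser.add_argument("--transpose-b", action="store_true")
    return parser


def shapes_from_argv(argv):
    """Parse an explicit argument list and return the stored A/B shapes."""
    args = build_parser().parse_args(argv)
    m, n, k = args.sizes
    return operand_shapes(m, n, k, args.transpose_a, args.transpose_b)
```

With the default sizes, `shapes_from_argv(["--transpose-a"])` yields a K x M shape for A and a K x N shape for B, matching the tensor shapes quoted later in this thread.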

@tkarna tkarna requested review from adam-smnk and rengolin June 3, 2026 19:05
@tkarna tkarna force-pushed the xegpu-matmul-transpose branch 2 times, most recently from 87f27d8 to adc7607 Compare June 3, 2026 19:37
@tkarna tkarna force-pushed the xegpu-matmul-transpose branch from adc7607 to b8aadcb Compare June 4, 2026 08:10
@tkarna tkarna force-pushed the xegpu-matmul-transpose branch from b8aadcb to 1ebf952 Compare June 4, 2026 09:04
Comment thread lighthouse/utils/mlir.py Outdated
Comment thread lighthouse/utils/mlir.py
_, n = inputs[1].type.shape
matmuls.append((m, n, k))
input_is_transpose = [
has_producer(o, linalg.TransposeOp) for o in inputs
Member

Wouldn't it be more generic to check matmul's indexing maps?

Contributor Author

Hmm, yeah, it depends on how the IR is formulated. In KernelBench we get IR with explicit linalg.transpose ops:

    %transposed = linalg.transpose ins(%0 : ...) outs(%2 : tensor<2048x8192xf16>) permutation = [1, 0] 
    %5 = linalg.matmul ins(%transposed, %1 : tensor<2048x8192xf16>, ...) -> tensor<2048x4096xf32>

which this approach detects. A matmul with a transposed indexing map would be another variant; that pattern is currently not supported.
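
To illustrate the indexing-map variant discussed here, the following is a rough sketch in plain Python, not using the MLIR bindings. Indexing maps are modelled as tuples of loop-dimension indices, assuming the usual matmul loop order (m, n, k) = (d0, d1, d2). The names `classify_operand` and `matmul_transpose_flags` are hypothetical and do not exist in the PR.

```python
# Canonical matmul maps, assuming loop order (m, n, k) = (d0, d1, d2).
CANONICAL_A = (0, 2)  # A[m, k]
CANONICAL_B = (2, 1)  # B[k, n]


def classify_operand(indexing_map, canonical):
    """Classify a 2-D operand map relative to its canonical matmul map."""
    indexing_map = tuple(indexing_map)
    if indexing_map == canonical:
        return "plain"
    if indexing_map == canonical[::-1]:
        return "transposed"
    return "other"  # e.g. a broadcast or otherwise permuted access


def matmul_transpose_flags(map_a, map_b):
    """Return the per-operand classification for a matmul-like contraction."""
    return {
        "a": classify_operand(map_a, CANONICAL_A),
        "b": classify_operand(map_b, CANONICAL_B),
    }
```

For instance, an A map of (d2, d0) with a B map of (d2, d1) classifies A as transposed and B as plain.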

Member

I see, so the imported IR always has an explicit transpose op before the matmul.

Then maybe I'd rename the metadata field to be more specific, indicating that A/B has a transpose producer rather than an implicit transpose through indexing maps.
Knowing whether these are separate operations or a single C += A^T * B can be a meaningful difference.

Contributor Author

Yeah, at the vector level, however, the explicit transpose op will be gone (using the current xegpu lowering): there is a single vector.contract with transposed indexing maps:

    #map = affine_map<(d0, d1, d2) -> (d2, d0)>
    #map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
    #map2 = affine_map<(d0, d1, d2) -> (d0, d1)>

        %8 = vector.transfer_read %1[%arg6, %arg3], %0 {in_bounds = [true, true]} : tensor<8192x2048xf16>, vector<32x256xf16>
        %9 = vector.transfer_read %2[%arg6, %arg4], %0 {in_bounds = [true, true]} : tensor<8192x4096xf16>, vector<32x256xf16>
        %10 = vector.contract {indexing_maps = [#map, #map1, #map2], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %8, %9, %arg7 : vector<32x256xf16>, vector<32x256xf16> into vector<256x256xf32>

As such, the two alternatives (an explicit linalg.transpose producer or a linalg.matmul with a transposed indexing map) would AFAIK be identical here.
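
The equivalence claimed above can be checked numerically on small matrices. The following is a minimal sketch in plain Python that evaluates a contraction from its indexing maps, written as tuples of loop indices with (d0, d1, d2) = (m, n, k). The `contract` and `transpose` helpers are illustrative only and are unrelated to the actual lowering.

```python
from itertools import product


def contract(a, b, maps, sizes):
    """Evaluate C[map_c] += A[map_a] * B[map_b] over the full loop nest."""
    map_a, map_b, map_c = maps
    c = [[0] * sizes[map_c[1]] for _ in range(sizes[map_c[0]])]
    for idx in product(*(range(s) for s in sizes)):
        a_val = a[idx[map_a[0]]][idx[map_a[1]]]
        b_val = b[idx[map_b[0]]][idx[map_b[1]]]
        c[idx[map_c[0]]][idx[map_c[1]]] += a_val * b_val
    return c


def transpose(x):
    """Explicit 2-D transpose, standing in for linalg.transpose."""
    return [list(row) for row in zip(*x)]


M, N, K = 2, 3, 4
a_stored = [[k * M + m + 1 for m in range(M)] for k in range(K)]  # K x M
b = [[k * N + n for n in range(N)] for k in range(K)]             # K x N

# Variant 1: explicit transpose, then a canonical matmul (A[m,k], B[k,n]).
explicit = contract(transpose(a_stored), b, ((0, 2), (2, 1), (0, 1)), (M, N, K))
# Variant 2: no transpose op; A is read through the map (d2, d0).
implicit = contract(a_stored, b, ((2, 0), (2, 1), (0, 1)), (M, N, K))

assert explicit == implicit
```

Both variants read the same elements of the stored K x M buffer, which is why they collapse to the same contraction.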

For xegpu lowering we essentially need to know if there's a transpose op in the A/B tile producer chain. As such, I'm not sure if we need to differentiate the two variants in the metadata. If such differentiation becomes necessary at some point we can add it then.
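
As a rough illustration of a producer-chain check, here is a toy sketch on a stand-in IR. The `Op` class and `has_producer_in_chain` are hypothetical; the `has_producer` helper in this PR operates on real MLIR values and may behave differently, for example by inspecting only the direct producer.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """Toy operation: a name plus the ops that produce its operands."""
    name: str
    operands: list = field(default_factory=list)


def has_producer_in_chain(op, target_name, max_depth=8):
    """Return True if `op` or any transitive producer is named `target_name`.

    The depth limit is an arbitrary safeguard for this sketch.
    """
    if op.name == target_name:
        return True
    if max_depth == 0:
        return False
    return any(
        has_producer_in_chain(producer, target_name, max_depth - 1)
        for producer in op.operands
    )


# A tile of A produced as: function argument -> transpose -> slice.
arg = Op("func.arg")
transposed = Op("linalg.transpose", [arg])
a_tile = Op("tensor.extract_slice", [transposed])
b_tile = Op("tensor.extract_slice", [Op("func.arg")])
```

Here `has_producer_in_chain(a_tile, "linalg.transpose")` is true and the same query on `b_tile` is false.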

Member

I'm approaching it from a perspective of payload inspector as a standalone tool.

I agree that in this case it makes no difference for the current use case.
So, I'm not pushing to support the indexing map check too. Just removing ambiguity from the returned metadata should be enough to avoid future confusion when somebody uses the inspector without Xe pipeline assumptions.

Contributor Author

> I'm approaching it from a perspective of payload inspector as a standalone tool.
> I agree that in this case it makes no difference for the current use case.

Yes, understood. I would put my "test-driven development" hat on and keep things simple until we need to differentiate. We don't know whether there will be use cases where differentiating the transpose variants matters. We also don't know whether the payload inspector can be generalized: the xegpu case might need a different kind of analysis than some other use case.

Member

Then I'd consider moving this whole logic into some Xe-specific utils.
Anyway, not a blocker right now.

Contributor Author

Yes, the layer matching could be moved to xegpu land. payload_inspector has one other use case, examples/mpi/feed-forward-mpi.py, but it only uses the function argument shapes.
