[LLVM integrate] Failing load_to_lds tests after llvm-project@3e1e86ef #23401
Closed
Muzammiluddin-Syed-ECE
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
Commit: llvm/llvm-project@3e1e86e
This is a part of the codebase I am unfamiliar with. I have tried to edit an AI generated summary of the issues we are seeing to make it more readable.
AMDGPU Multi-Memory-Operand Assertion Fix
Summary
An IREE e2e test fails when compiling with an LLVM commit that gives certain AMDGPU intrinsics two memory operands (MMOs) instead of one. Code throughout the compiler assumed a single MMO and called
getMemOperand(), which asserts when multiple MMOs are present. This document describes the failing test, the triggering commit, the fixes applied, and recommended next steps.**see potential fix at: ** llvm/llvm-project#180027
Failing Test
e2e_matmul_cdna4_mxfp4_dt_tensor_ukernel_medium_rocm_hip_matmultests/e2e/matmul/e2e_matmul_cdna4_mxfp4_dt_tensor_ukernel_medium_rocm_hip_matmul.vmfbninja iree-test-deps(or building that specific.vmfb)iree-compileaborts with an assertion.Assertion:
Stack (abbreviated):
SelectionDAG::Combine→SelectionDAGISel::CodeGenAndEmitDAG→ … →AMDGPU DAG->DAG Pattern Instruction SelectionSo the failure happens during AMDGPU SelectionDAG instruction selection, inside the DAG Combine phase, when some code calls
getMemOperand()on a node that has two MMOs.Intrinsics That Cause the Failure
The failure is tied to load-to-LDS and store-from-LDS style intrinsics that perform both a load and a store in one instruction. After the LLVM commit below, these are represented with two
MachineMemOperands (one for the load, one for the store).Relevant intrinsics (from the commit and
getTgtMemIntrinsicin AMDGPU):amdgcn_load_to_lds/amdgcn_global_load_lds— load from global/flat, store to LDSamdgcn_struct_buffer_load_lds/amdgcn_struct_ptr_buffer_load_lds— buffer load to LDSamdgcn_cluster_load_async_to_lds_*— async load to LDSamdgcn_global_store_async_from_lds_*— async store from LDS to globalThe mxfp4 matmul pipeline (data-tiling, tensor ukernels, ROCm) ends up emitting at least one of these. The resulting MemIntrinsicSDNode in the SelectionDAG has two MMOs; any code that calls
getMemOperand()on it hits the assertion.Triggering LLVM Commit
3e1e86ef1fb8973f90cde376d5ad2d79ec7f52d9llvm/lib/Target/AMDGPU/SIISelLowering.cpp(+ test/expectation updates)What the commit does:
getTgtMemIntrinsicFor the intrinsics listed above, it now pushes two entries into the
IntrinsicInfolist (two MMOs) instead of one:MOLoad, appropriatememVT, pointer, offset.MOStore, possibly differentmemVT(e.g. wider to model per-lane offset), and LDS pointer.Lowering
In
LowerINTRINSIC_VOID, the code that creates the machine node for these intrinsics is updated to useM->memoperands()andDAG.setNodeMemRefs(Load, M->memoperands())instead of manually building two MMOs from a singlegetMemOperand().So: the intrinsic SDNode (and, after lowering, the MachineSDNode) can now carry two MMOs. The rest of the backend and the DAG combiner were written when only one MMO existed; many call sites still call
getMemOperand()and assert.Fixes Recommended
This failure was resolved upon applying these fixes in the IREE-vendored
third_party/llvm-project(same structure as upstream LLVM).1. Guard call sites that assume a single MMO
llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cppIn
shouldReduceLoadWidth, if!MN->hasUniqueMemOperand(), return early (using(OldSize < 32)) and never callgetMemOperand()orgetAlign().llvm/lib/Target/AMDGPU/SIISelLowering.cppIn
isMemOpHasNoClobberedMemOperand, if!MemNode->hasUniqueMemOperand()returnfalse(conservative: do not claim “no clobber”), then callgetMemOperand()->getFlags()only when there is a unique MMO.llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cppisLegalNarrowLdSt, if!LDST->hasUniqueMemOperand()returnfalseso we never narrow a multi-MMO node.visitVPOp, only usewriteMem()/ the “replace with chain or chain+undef” logic whenMemSD->hasUniqueMemOperand(); otherwise skip that block (no incorrect load-only vs store-only classification for multi-MMO nodes).2.
getMemOperand()behavior when multiple MMOs exist (workaround)llvm/include/llvm/CodeGen/SelectionDAGNodes.hgetMemOperand()was changed from asserting whenMemRefsis an array (multiple MMOs) to returning the first MMO (Array[0]). The comment was updated to state that callers should prefermemoperands()for multi-MMO nodes.This avoids the assertion everywhere that still calls
getMemOperand()without a priorhasUniqueMemOperand()check, but does not fix the semantics for those call sites (see “Effectiveness” below).Effectiveness of the Fix
Pragmatic effectiveness:
The combination of (1) guarding the known call sites and (2) returning the first MMO in
getMemOperand()stops the crash and allowsninja iree-test-deps(and the mxfp4 ROCm matmul test) to complete.Semantic correctness:
Returning “the first MMO” in
getMemOperand()is not a correct general solution:writeMem()→getMemOperand()->isStore(). For load-to-lds, the first MMO is the load; so the node is treated as “load only” and replaced with chain+undef, even though it also stores to LDS. That can remove a store and be wrong. (We mitigated this by only running that logic whenhasUniqueMemOperand()is true.)setNodeMemRefs(Selected, {MMO})with a singlegetMemOperand()would attach only one MMO and drop the other, misrepresenting the instruction for alias/scheduling.falsewhen there isn’t a unique MMO.So: the guards (only use single-MMO APIs when
hasUniqueMemOperand()) are the correct part; the “return first MMO” ingetMemOperand()is a compatibility escape so that any remaining, unknown call sites do not assert, but they may still be wrong for multi-MMO nodes.Conclusion:
The fix is effective at unblocking the test and the build and is reasonably safe where we explicitly guard (DAGCombiner, AMDGPU lowering/selection). It is not a complete or “correct” fix for all possible call sites; the proper long-term approach is to audit every
getMemOperand()(andgetPointerInfo(),getAlign(),readMem(),writeMem()) use and either restrict to single-MMO nodes viahasUniqueMemOperand()or usememoperands()and the right predicate (e.g. “any store?”, “all no-clobber?”).Other Relevant Details
MemSDNode API:
getMemOperand()— returns oneMachineMemOperand*; historically asserted when the node had multiple MMOs.memoperands()— returnsArrayRef<MachineMemOperand*>for all MMOs; use this for multi-MMO nodes.hasUniqueMemOperand()— true iff the node has exactly one MMO; use before calling single-MMO APIs.Where the assumption “single MMO” appears:
Any call to
getMemOperand(), or to helpers that use it (getPointerInfo(),getAlign(),getBaseAlign(),readMem(),writeMem()), without a prior check for multiple MMOs (or without usingmemoperands()), is such an assumption. Such call sites exist in:DAGCombiner.cpp(many load/store merge and combine paths),AMDGPUISelLowering.cpp,SIISelLowering.cpp,AMDGPUISelDAGToDAG.cpp, andSelectionDAGNodes.h(the helpers above).Why two MMOs:
The intrinsics perform two distinct memory operations (e.g. load from A, store to B). Representing them with two MMOs improves alias analysis and scheduling; the commit is a correctness/quality improvement that exposed latent single-MMO assumptions.
References
3e1e86ef1fb8973f90cde376d5ad2d79ec7f52d9([AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics).SelectionDAGNodes.h(e.g. around line 1510),getMemOperand().memoperands(),hasUniqueMemOperand(),getNumMemOperands()in the same file.Beta Was this translation helpful? Give feedback.
All reactions