[LLVM integrate] Failing load_to_lds tests after llvm-project@3e1e86ef #23401

Muzammiluddin-Syed-ECE · 2026-02-05T20:29:42Z

Muzammiluddin-Syed-ECE
Feb 5, 2026
Collaborator

Commit: llvm/llvm-project@3e1e86e

This is a part of the codebase I am unfamiliar with. I have tried to edit an AI-generated summary of the issues we are seeing to make it more readable.

AMDGPU Multi-Memory-Operand Assertion Fix

Summary

An IREE e2e test fails when compiling with an LLVM commit that gives certain AMDGPU intrinsics two memory operands (MMOs) instead of one. Code throughout the compiler assumed a single MMO and called getMemOperand(), which asserts when multiple MMOs are present. This document describes the failing test, the triggering commit, the fixes applied, and recommended next steps.

**See potential fix at:** llvm/llvm-project#180027

Failing Test

Test: e2e_matmul_cdna4_mxfp4_dt_tensor_ukernel_medium_rocm_hip_matmul
Path: tests/e2e/matmul/e2e_matmul_cdna4_mxfp4_dt_tensor_ukernel_medium_rocm_hip_matmul.vmfb
Build target: ninja iree-test-deps (or building that specific .vmfb)
Failure mode: iree-compile aborts with an assertion.

Assertion:

iree-compile: .../llvm/include/llvm/CodeGen/SelectionDAGNodes.h:1511: 
llvm::MachineMemOperand *llvm::MemSDNode::getMemOperand() const: 
Assertion `!isa<MachineMemOperand **>(MemRefs) && "Use memoperands() for nodes with multiple memory operands"' failed.

Stack (abbreviated):

SelectionDAG::Combine → SelectionDAGISel::CodeGenAndEmitDAG → … → AMDGPU DAG->DAG Pattern Instruction Selection
Failing function: one that does matmul with mxfp4 (scaled matmul) for ROCm/HIP on CDNA4 (gfx950).

So the failure happens in the AMDGPU instruction selection pass, during the DAG combine phase, when some code calls getMemOperand() on a node that has two MMOs.

Intrinsics That Cause the Failure

The failure is tied to load-to-LDS and store-from-LDS style intrinsics that perform both a load and a store in one instruction. After the LLVM commit below, these are represented with two MachineMemOperands (one for the load, one for the store).

Relevant intrinsics (from the commit and getTgtMemIntrinsic in AMDGPU):

amdgcn_load_to_lds / amdgcn_global_load_lds — load from global/flat, store to LDS
amdgcn_struct_buffer_load_lds / amdgcn_struct_ptr_buffer_load_lds — buffer load to LDS
amdgcn_cluster_load_async_to_lds_* — async load to LDS
amdgcn_global_store_async_from_lds_* — async store from LDS to global

The mxfp4 matmul pipeline (data-tiling, tensor ukernels, ROCm) ends up emitting at least one of these. The resulting MemIntrinsicSDNode in the SelectionDAG has two MMOs; any code that calls getMemOperand() on it hits the assertion.

Triggering LLVM Commit

Commit: 3e1e86ef1fb8973f90cde376d5ad2d79ec7f52d9
Title: [AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics (#175845)
Author: Nicolai Hähnle (AMD)
Files changed: llvm/lib/Target/AMDGPU/SIISelLowering.cpp (+ test/expectation updates)

What the commit does:

getTgtMemIntrinsic:
For the intrinsics listed above, it now pushes two entries into the IntrinsicInfo list (two MMOs) instead of one:
- One MMO for the load (e.g. from buffer/global), with MOLoad and the appropriate memVT, pointer, and offset.
- One MMO for the store (e.g. to LDS), with MOStore, a possibly different memVT (e.g. wider, to model the per-lane offset), and the LDS pointer.
Lowering:
In LowerINTRINSIC_VOID, the code that creates the machine node for these intrinsics now uses M->memoperands() and DAG.setNodeMemRefs(Load, M->memoperands()) instead of manually building two MMOs from a single getMemOperand().
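To make the "two entries instead of one" idea concrete, here is a minimal standalone sketch. It does not use LLVM types: MemOpSketch and describeLoadToLds are hypothetical names invented for illustration, and the sizes passed in are arbitrary. The address-space numbers (1 for global, 3 for LDS) follow the usual AMDGPU numbering.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the facts an MMO records. This is NOT
// llvm::MachineMemOperand; it only models the fields discussed above.
struct MemOpSketch {
  bool IsLoad;
  bool IsStore;
  unsigned AddrSpace;   // 1 = global, 3 = LDS in AMDGPU numbering
  unsigned SizeInBytes; // illustrative; the real memVT may differ per side
};

constexpr unsigned GlobalAS = 1;
constexpr unsigned LdsAS = 3;

// Models the post-commit shape: one entry per memory access performed by a
// load-to-LDS style intrinsic, instead of a single combined entry.
std::vector<MemOpSketch> describeLoadToLds(unsigned LoadBytes,
                                           unsigned StoreBytes) {
  std::vector<MemOpSketch> Infos;
  Infos.push_back({/*IsLoad=*/true, /*IsStore=*/false, GlobalAS, LoadBytes});
  Infos.push_back({/*IsLoad=*/false, /*IsStore=*/true, LdsAS, StoreBytes});
  return Infos;
}
```

A consumer that only looks at the first entry sees a pure load from global memory and misses the LDS store entirely, which is the root of the problems described in the rest of this post.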

So: the intrinsic SDNode (and, after lowering, the MachineSDNode) can now carry two MMOs. The rest of the backend and the DAG combiner were written when only one MMO existed; many call sites still call getMemOperand() and assert.

Fixes Recommended

This failure was resolved upon applying these fixes in the IREE-vendored third_party/llvm-project (same structure as upstream LLVM).

1. Guard call sites that assume a single MMO

llvm/lib/Target/AMDGPU/AMDGPUISelLowering.cpp
In shouldReduceLoadWidth, if !MN->hasUniqueMemOperand(), return early with OldSize < 32, so that getMemOperand() and getAlign() are never called on a multi-MMO node.
llvm/lib/Target/AMDGPU/SIISelLowering.cpp
In isMemOpHasNoClobberedMemOperand, if !MemNode->hasUniqueMemOperand() return false (conservative: do not claim “no clobber”), then call getMemOperand()->getFlags() only when there is a unique MMO.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
- In isLegalNarrowLdSt, if !LDST->hasUniqueMemOperand() return false so we never narrow a multi-MMO node.
- In visitVPOp, only use writeMem() / the “replace with chain or chain+undef” logic when MemSD->hasUniqueMemOperand(); otherwise skip that block (no incorrect load-only vs store-only classification for multi-MMO nodes).
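The guard pattern used at these call sites can be sketched in isolation as follows. MockMMO, MockMemNode and isLegalNarrowSketch are hypothetical stand-ins written for this example; they are not the LLVM classes, and the alignment check is only a placeholder for whatever single-MMO logic the real function performs.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical, simplified memory operand.
struct MockMMO {
  bool IsStore;
  unsigned AlignBytes;
};

// Hypothetical node that exposes the two accessors the guards rely on.
class MockMemNode {
  std::vector<const MockMMO *> Refs;

public:
  explicit MockMemNode(std::vector<const MockMMO *> R) : Refs(std::move(R)) {}

  bool hasUniqueMemOperand() const { return Refs.size() == 1; }

  // Mirrors the original contract: only valid for single-MMO nodes.
  const MockMMO *getMemOperand() const {
    assert(hasUniqueMemOperand() && "use all operands for multi-MMO nodes");
    return Refs.front();
  }
};

// Guarded query: bail out conservatively before touching any single-MMO
// accessor, then run the placeholder single-MMO check.
bool isLegalNarrowSketch(const MockMemNode &N, unsigned MinAlign) {
  if (!N.hasUniqueMemOperand())
    return false; // never narrow a multi-MMO node
  return N.getMemOperand()->AlignBytes >= MinAlign;
}
```

The important property is ordering: the uniqueness check comes first, so the asserting accessor is unreachable for a two-MMO node.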

2. `getMemOperand()` behavior when multiple MMOs exist (workaround)

llvm/include/llvm/CodeGen/SelectionDAGNodes.h
getMemOperand() was changed from asserting when MemRefs is an array (multiple MMOs) to returning the first MMO (Array[0]). The comment was updated to state that callers should prefer memoperands() for multi-MMO nodes.

This avoids the assertion everywhere that still calls getMemOperand() without a prior hasUniqueMemOperand() check, but does not fix the semantics for those call sites (see “Effectiveness” below).

Effectiveness of the Fix

Pragmatic effectiveness:
The combination of (1) guarding the known call sites and (2) returning the first MMO in getMemOperand() stops the crash and allows ninja iree-test-deps (and the mxfp4 ROCm matmul test) to complete.
Semantic correctness:
Returning “the first MMO” in getMemOperand() is not a correct general solution:
- visitVPOp: It uses writeMem() → getMemOperand()->isStore(). For load-to-lds, the first MMO is the load; so the node is treated as “load only” and replaced with chain+undef, even though it also stores to LDS. That can remove a store and be wrong. (We mitigated this by only running that logic when hasUniqueMemOperand() is true.)
- Intrinsic selection code that calls setNodeMemRefs(Selected, {MMO}) with a single getMemOperand() result would attach only one MMO and drop the other, misrepresenting the instruction to alias analysis and scheduling.
- isMemOpHasNoClobberedMemOperand with “first MMO only” can answer “no clobber” when the second MMO does clobber; we avoided that by returning false when there isn’t a unique MMO.
So: the guards (only use single-MMO APIs when hasUniqueMemOperand()) are the correct part; the “return first MMO” in getMemOperand() is a compatibility escape so that any remaining, unknown call sites do not assert, but they may still be wrong for multi-MMO nodes.
Conclusion:
The fix is effective at unblocking the test and the build and is reasonably safe where we explicitly guard (DAGCombiner, AMDGPU lowering/selection). It is not a complete or “correct” fix for all possible call sites; the proper long-term approach is to audit every getMemOperand() (and getPointerInfo(), getAlign(), readMem(), writeMem()) use and either restrict to single-MMO nodes via hasUniqueMemOperand() or use memoperands() and the right predicate (e.g. “any store?”, “all no-clobber?”).
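As a rough illustration of what "use memoperands() and the right predicate" could look like, the sketch below compares a first-MMO-only query with predicates over the whole list. MockMMO, firstIsStore, anyStore and allNoClobber are hypothetical names for this example and do not exist in LLVM; treating an empty list as "not no-clobber" is an assumption made here to stay conservative.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical, simplified memory operand with the flags discussed above.
struct MockMMO {
  bool IsLoad;
  bool IsStore;
  bool NoClobber;
};

// What a "return the first MMO" workaround effectively computes.
bool firstIsStore(const std::vector<MockMMO> &MMOs) {
  return !MMOs.empty() && MMOs.front().IsStore;
}

// "Any store?": true if at least one operand writes memory.
bool anyStore(const std::vector<MockMMO> &MMOs) {
  return std::any_of(MMOs.begin(), MMOs.end(),
                     [](const MockMMO &M) { return M.IsStore; });
}

// "All no-clobber?": true only if every operand carries the flag. An empty
// list yields false so that missing information is never read as safe.
bool allNoClobber(const std::vector<MockMMO> &MMOs) {
  return !MMOs.empty() &&
         std::all_of(MMOs.begin(), MMOs.end(),
                     [](const MockMMO &M) { return M.NoClobber; });
}
```

For a load-to-LDS node modeled as a load followed by a store, firstIsStore reports false while anyStore reports true, which is the misclassification described for visitVPOp above.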

Other Relevant Details

MemSDNode API:
- getMemOperand() — returns one MachineMemOperand*; historically asserted when the node had multiple MMOs.
- memoperands() — returns ArrayRef<MachineMemOperand*> for all MMOs; use this for multi-MMO nodes.
- hasUniqueMemOperand() — true iff the node has exactly one MMO; use before calling single-MMO APIs.
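The relationship between these three accessors can be modeled with a small standalone class. This is only a sketch under assumptions: MockMemNode and MockMMO are hypothetical, the std::variant storage (C++17) loosely imitates the single-pointer versus pointer-array distinction that the assertion text refers to, and an exception stands in for the assertion so the behavior can be observed without aborting.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical, simplified memory operand.
struct MockMMO {
  bool IsStore;
};

class MockMemNode {
  // Either exactly one operand, or an array of operands.
  std::variant<const MockMMO *, std::vector<const MockMMO *>> MemRefs;

public:
  explicit MockMemNode(const MockMMO *Single) : MemRefs(Single) {}

  // A one-element list is normalised to the single-operand form.
  explicit MockMemNode(std::vector<const MockMMO *> Many) {
    if (Many.size() == 1)
      MemRefs = Many.front();
    else
      MemRefs = std::move(Many);
  }

  bool hasUniqueMemOperand() const {
    return std::holds_alternative<const MockMMO *>(MemRefs);
  }

  // Single-MMO accessor: refuses to pick an operand for multi-MMO nodes.
  const MockMMO *getMemOperand() const {
    if (!hasUniqueMemOperand())
      throw std::logic_error(
          "use memoperands() for nodes with multiple memory operands");
    return std::get<const MockMMO *>(MemRefs);
  }

  // Always valid: returns every operand regardless of storage form.
  std::vector<const MockMMO *> memoperands() const {
    if (hasUniqueMemOperand())
      return {std::get<const MockMMO *>(MemRefs)};
    return std::get<std::vector<const MockMMO *>>(MemRefs);
  }
};
```

In this model, code that iterates memoperands() works for both kinds of node, while getMemOperand() is only safe after hasUniqueMemOperand() has returned true.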
Where the assumption “single MMO” appears:
Any call to getMemOperand(), or to helpers that use it (getPointerInfo(), getAlign(), getBaseAlign(), readMem(), writeMem()), without a prior check for multiple MMOs (or without using memoperands()), is such an assumption. Such call sites exist in:
- DAGCombiner.cpp (many load/store merge and combine paths),
- AMDGPUISelLowering.cpp, SIISelLowering.cpp, AMDGPUISelDAGToDAG.cpp, and
- SelectionDAGNodes.h (the helpers above).
Why two MMOs:
The intrinsics perform two distinct memory operations (e.g. load from A, store to B). Representing them with two MMOs improves alias analysis and scheduling; the commit is a correctness/quality improvement that exposed latent single-MMO assumptions.

References

LLVM commit: 3e1e86ef1fb8973f90cde376d5ad2d79ec7f52d9 ([AMDGPU] Return two MMOs for load-to-lds and store-from-lds intrinsics).
Assertion: SelectionDAGNodes.h (e.g. around line 1510), getMemOperand().
API: memoperands(), hasUniqueMemOperand(), getNumMemOperands() in the same file.
