
fix: matmul_all_reduce lock signal must use scope="sys" for xGMI visibility#477

Draft
Copilot wants to merge 2 commits into main from copilot/fix-lock-writes-scope-gpu

Conversation


Copilot AI commented Mar 24, 2026

Lock writes in _fused_matmul_all_reduce_kernel used scope="gpu", making them invisible to remote GPUs that poll with system-scope atomics. The result was intermittent hangs on multi-GPU runs.

Change

  • iris/ops/matmul_all_reduce.py: Change lock-signal atomic from scope="gpu" to scope="sys"
# Before — write confined to local GPU cache hierarchy
tl.atomic_xchg(lock_ptr, 1, sem="release", scope="gpu")

# After — propagates through xGMI system-level coherence
tl.atomic_xchg(lock_ptr, 1, sem="release", scope="sys")

The lock array lives on the symmetric heap and is mapped into all GPUs' address spaces via IPC. Any atomic on symmetric heap memory consumed by a remote rank must use scope="sys".
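To make the failure mode concrete, here is a small host-side Python model (hypothetical, not the Iris or Triton API) with a per-GPU write buffer and a shared system memory: a scope="gpu" store stays in the writer's local buffer, so a remote reader polling system memory never observes it, while a scope="sys" store is flushed through to the shared memory that all ranks see.

```python
# Toy model of atomic-store visibility across GPUs. Illustrative only:
# real xGMI coherence is hardware-managed; this just mimics the bug's shape.

class ToyGPU:
    def __init__(self, system_mem):
        self.system_mem = system_mem   # shared memory, coherent at system scope
        self.local_cache = {}          # writes visible only to this GPU

    def atomic_xchg(self, addr, value, scope):
        old = self.read(addr)
        self.local_cache[addr] = value
        if scope == "sys":
            # system-scope atomics propagate through the coherence fabric
            self.system_mem[addr] = value
        return old

    def read(self, addr):
        # a GPU sees its own cached writes first, then system memory
        return self.local_cache.get(addr, self.system_mem.get(addr, 0))

system_mem = {}
rank0, rank1 = ToyGPU(system_mem), ToyGPU(system_mem)

LOCK = 0x10
rank0.atomic_xchg(LOCK, 1, scope="gpu")
print(rank1.read(LOCK))  # 0 — the remote rank never sees the gpu-scope write

rank0.atomic_xchg(LOCK, 1, scope="sys")
print(rank1.read(LOCK))  # 1 — the sys-scope write is globally visible
```

In this model the consumer's spin loop on a gpu-scope write would never terminate, which is exactly the hang described above.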

Original prompt

This section details the original issue you should resolve

<issue_title>matmul_all_reduce: lock writes use scope="gpu", invisible to remote GPUs</issue_title>
<issue_description>## Bug

In iris/ops/matmul_all_reduce.py, the lock-signal atomic in _fused_matmul_all_reduce_kernel uses tl.atomic_xchg(lock_ptr, value, sem="release") which defaults to scope="gpu". The lock array lives on the iris symmetric heap, mapped into all GPUs' address spaces via IPC. When Rank 0 writes to its lock entry, Rank 1 polls via iris.atomic_add(..., scope="sys"). Because the write uses scope="gpu", the store may only be visible within Rank 0's GPU caches and never propagate through the system-level coherence protocol (xGMI).

Impact

Remote ranks spin indefinitely on locks — manifests as intermittent hangs on multi-GPU runs. Non-deterministic and hard to diagnose.

Fix

Change the lock signal to tl.atomic_xchg(lock_ptr, value, sem="release", scope="sys"). All atomic operations on symmetric heap memory that cross GPU boundaries must use scope="sys".

Component

iris/ops/matmul_all_reduce.py</issue_description>

<agent_instructions>Agreed with this bug. Consumer can be remote. Need system scope.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI changed the title from "[WIP] Fix lock writes to use scope="sys" for matmul_all_reduce" to "fix: matmul_all_reduce lock signal must use scope="sys" for xGMI visibility" on Mar 24, 2026
Copilot AI requested a review from mawad-amd March 24, 2026 01:01

Development

Successfully merging this pull request may close these issues.

matmul_all_reduce: lock writes use scope="gpu", invisible to remote GPUs