
Port cache_modifier, volatile, and other to DeviceContext and Gluon APIs#471

Open
Copilot wants to merge 5 commits into main from copilot/port-cache-modifiers-to-devicecontext

Conversation


Copilot AI commented Mar 22, 2026

DeviceContext and IrisDeviceCtx (Gluon) were missing cache_modifier, volatile, and other parameters that existed on the freestanding API (iris.load, iris.store, etc.) since #158.

DeviceContext (iris/iris.py)

Added missing parameters to all 5 methods, passed through to tl.load()/tl.store():

| Method | New Parameters |
| --- | --- |
| `load` | `other`, `cache_modifier`, `volatile` |
| `store` | `cache_modifier` |
| `get` | `other`, `load_cache_modifier`, `store_cache_modifier` |
| `put` | `other`, `load_cache_modifier`, `store_cache_modifier` |
| `copy` | `other`, `load_cache_modifier`, `store_cache_modifier` |

Gluon (iris/experimental/iris_gluon.py)

gl.load() and gl.store() support cache_modifier and volatile, so all 5 IrisDeviceCtx methods are updated with the same parameters. Defaults use None (consistent with the freestanding API and DeviceContext).

Usage

# DeviceContext
ctx.load(ptr + offsets, from_rank=1, mask=mask, cache_modifier=".cv", volatile=True)
ctx.store(ptr + offsets, data, to_rank=1, mask=mask, cache_modifier=".wt")
ctx.get(remote_ptr + offsets, local_ptr + offsets, from_rank=1, load_cache_modifier=".cg", store_cache_modifier=".cs")

# Gluon IrisDeviceCtx (same parameters)
ctx.load(ptr + offsets, 1, mask=mask, cache_modifier=".cv", volatile=True)

Tests

  • tests/unittests/test_device_context_cache_modifiers.py: mirrors the existing freestanding cache modifier tests for all 5 DeviceContext methods
  • tests/unittests/test_gluon_cache_modifiers.py: equivalent coverage for IrisDeviceCtx
Original prompt

This section details the original issue you should resolve.

<issue_title>Port cache modifiers, other, and volatile to DeviceContext and Gluon APIs</issue_title>
<issue_description>## Summary

PR #158 added cache_modifier, volatile, and other parameters to the freestanding Triton JIT API (iris.load, iris.store, iris.get, iris.put, iris.copy). The DeviceContext (ctx) and Gluon APIs need to be updated to match.

Current State

The freestanding API supports all of the parameters below (added in #158); DeviceContext (ctx) and Gluon were missing them:

  • load: other, cache_modifier, volatile
  • store: cache_modifier
  • get: other, load_cache_modifier, store_cache_modifier
  • put: other, load_cache_modifier, store_cache_modifier
  • copy: other, load_cache_modifier, store_cache_modifier

What Needs to Be Done

1. DeviceContext (iris.py class DeviceContext, ~line 1526)

Add missing parameters to all 5 methods and pass them through to the underlying tl.load()/tl.store() calls:

  • load: Add other, cache_modifier, volatile → pass to tl.load(..., other=other, cache_modifier=cache_modifier, volatile=volatile)
  • store: Add cache_modifier → pass to tl.store(..., cache_modifier=cache_modifier)
  • get: Add other, load_cache_modifier, store_cache_modifier → pass load_cache_modifier to tl.load() as cache_modifier, pass store_cache_modifier to tl.store() as cache_modifier
  • put: Add other, load_cache_modifier, store_cache_modifier → same pattern as get
  • copy: Add other, load_cache_modifier, store_cache_modifier → same pattern
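The DeviceContext changes above are a straight parameter passthrough. As a rough sketch in plain Python (not Triton — the real methods are `@triton.jit` functions; here `tl.load` is replaced by a recording stub and the pointer-translation step is elided), the pattern is:

```python
# Plain-Python sketch of the passthrough pattern. `tl_load` is a stub standing
# in for tl.load; it records the kwargs it was forwarded so we can see that
# the new DeviceContext.load parameters reach the underlying call unchanged.

def tl_load(ptr, mask=None, other=None, cache_modifier=None, volatile=False):
    """Stub for tl.load: returns the kwargs it received."""
    return {"ptr": ptr, "mask": mask, "other": other,
            "cache_modifier": cache_modifier, "volatile": volatile}

class DeviceContextSketch:
    def load(self, pointer, from_rank, mask=None, other=None,
             cache_modifier=None, volatile=False):
        # Rank-based pointer translation elided; only the forwarding is shown.
        return tl_load(pointer, mask=mask, other=other,
                       cache_modifier=cache_modifier, volatile=volatile)

ctx = DeviceContextSketch()
call = ctx.load(0x1000, from_rank=1, cache_modifier=".cv", volatile=True)
```

The same shape applies to the other four methods; only the kwarg names differ.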

2. Gluon (iris/experimental/iris_gluon.py class IrisDeviceCtx, ~line 146)

Add cache modifier parameters to all 5 methods and pass them through to the underlying gl.load()/gl.store() calls:

  • load: Add cache_modifier, volatile → pass to gl.load(..., cache_modifier=cache_modifier, volatile=volatile)
  • store: Add cache_modifier → pass to gl.store(..., cache_modifier=cache_modifier)
  • get: Add load_cache_modifier, store_cache_modifier → pass to respective gl.load()/gl.store() calls
  • put: Add load_cache_modifier, store_cache_modifier → same pattern
  • copy: Add load_cache_modifier, store_cache_modifier → same pattern
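For `get`/`put`/`copy`, the two modifier parameters each map onto the single `cache_modifier` kwarg of the respective call. A minimal plain-Python sketch (with `gl.load`/`gl.store` replaced by recording stubs — not the real Gluon API):

```python
# Sketch of the load_cache_modifier / store_cache_modifier renaming in get():
# the former feeds the load call, the latter feeds the store call.

calls = []

def gl_load(ptr, mask=None, cache_modifier=None):
    calls.append(("load", cache_modifier))
    return 0  # dummy loaded value

def gl_store(ptr, value, mask=None, cache_modifier=None):
    calls.append(("store", cache_modifier))

def get_sketch(from_ptr, to_ptr, mask=None,
               load_cache_modifier=None, store_cache_modifier=None):
    value = gl_load(from_ptr, mask=mask, cache_modifier=load_cache_modifier)
    gl_store(to_ptr, value, mask=mask, cache_modifier=store_cache_modifier)

get_sketch(0x1000, 0x2000, load_cache_modifier=".cg", store_cache_modifier=".cs")
```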

Note: Check whether gl.load() and gl.store() support cache_modifier and volatile kwargs. If gluon doesn't support these yet, skip gluon and note it in the PR.

3. Tests

  • Add unit tests for the DeviceContext API mirroring the existing freestanding tests in tests/unittests/test_{load,store,get,put,copy}_cache_modifiers.py.
  • If gluon supports cache modifiers, add corresponding gluon tests as well.

Reference

  • Freestanding API signatures to match: iris/iris.py lines 1898–2227
  • Existing cache modifier tests to mirror: tests/unittests/test_*_cache_modifiers.py
  • PR Introduce cache modifiers #158 for full context on the original implementation</issue_description>

Comments on the Issue (you are @copilot in this section)



- DeviceContext.load(): add other, cache_modifier, volatile params
- DeviceContext.store(): add cache_modifier param
- DeviceContext.get(): add other, load_cache_modifier, store_cache_modifier params
- DeviceContext.put(): add other, load_cache_modifier, store_cache_modifier params
- DeviceContext.copy(): add other, load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.load(): add cache_modifier, volatile params
- IrisDeviceCtx.store(): add cache_modifier param
- IrisDeviceCtx.get(): add load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.put(): add load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.copy(): add load_cache_modifier, store_cache_modifier params
- Add tests: test_device_context_cache_modifiers.py
- Add tests: test_gluon_cache_modifiers.py

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/8ce4f35c-fe72-4506-89d2-e79687a16a09
Copilot AI changed the title from "[WIP] Port cache modifiers and volatile to DeviceContext and Gluon APIs" to "Port cache_modifier, volatile, and other to DeviceContext and Gluon APIs" on Mar 22, 2026.
Copilot AI requested a review from mawad-amd March 22, 2026 02:59
)
def test_device_context_put_cache_modifiers_local(load_cache_modifier, store_cache_modifier):
"""Test DeviceContext.put() local (from_rank == to_rank) with various cache modifiers."""
shmem = iris.iris(1 << 20)
Collaborator
@copilot use ctx instead of shmem all over the files you added or updated.

Contributor Author

Done in 791e624. Replaced all shmem references with ctx in both test_device_context_cache_modifiers.py and test_gluon_cache_modifiers.py.

…t_gluon_cache_modifiers.py

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/ee387d87-c4e9-4207-b9d8-de35aba33605
Copilot AI requested a review from mawad-amd March 22, 2026 03:39
mawad-amd (Collaborator)

@copilot The docstrings for cache_modifier params in both DeviceContext and Gluon are too short. Expand them to match the format used in the freestanding API (see iris.load(), iris.store(), iris.copy(), iris.get(), iris.put() in iris/iris.py).

For example, you wrote:

cache_modifier: Controls cache behavior of the load. Supported values: None (default, same as ".ca"), ".ca", ".cg", ".cv".

But the freestanding API has the full expanded format:

cache_modifier (str, optional): Controls cache behavior of the load.

    Supported values:
        - None: *(default)* — Same as ".ca". Uses cache at all levels (CU, L2, LLC) with LRU policy.
        - ".ca": Cache at all levels (CU, L2, LLC) with LRU policy
        - ".cg": Bypasses the CU (L1) cache, streams through L2, and may hit in LLC but the line is not retained or inserted.
        - ".cv": Bypasses all GPU caches (CU and L2) and fetches directly from system memory. If data exists in the LLC, it may hit, but is not retained or inserted.
                Ensures global coherence by invalidating stale GPU cache lines.

And for store:

cache_modifier (str, optional): Controls cache behavior of the store. Supported values are:

    - None: *(default)* — Same as ".wb". Uses write-back caching at all levels (CU, L2, LLC) with LRU policy.
    - ".wb": Write-back. Write-allocate on L1 miss, inserted into caches and written back later.
    - ".cg": Cache Global. Equivalent to ".wb" — stored through L1 → L2 → LLC under LRU.
    - ".cs": Cache Streaming. Bypasses L1, streamed through L2, not retained in LLC.
    - ".wt": Write-Through. Bypasses L1 and L2 (coherent cache bypass), may hit in LLC with LRU.

Apply this expanded format to all cache_modifier, load_cache_modifier, and store_cache_modifier params across both DeviceContext (iris/iris.py) and Gluon (iris/experimental/iris_gluon.py). Copy the exact descriptions from the freestanding functions in iris/iris.py.

…to match freestanding API format

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/639da603-607c-4d81-b5af-88a77a38db97
Contributor Author

Copilot AI commented Mar 22, 2026

@copilot The docstrings for cache_modifier params in both DeviceContext and Gluon are too short. Expand them to match the format used in the freestanding API (see iris.load(), iris.store(), `iris....

Done in b46c8a7. Expanded all cache_modifier, load_cache_modifier, store_cache_modifier, and volatile docstrings in both DeviceContext (iris/iris.py) and IrisDeviceCtx (iris/experimental/iris_gluon.py) to match the full format used in the freestanding API, including per-value descriptions with the ISA-level behavior. Gluon uses "" as the default instead of None.

mawad-amd (Collaborator)

@copilot The Gluon API uses "" (empty string) as default for cache_modifier, load_cache_modifier, and store_cache_modifier, but the freestanding API and DeviceContext both use None. Change all Gluon defaults from "" to None to be consistent. Update the Gluon docstrings accordingly (default should say None not "").

…with DeviceContext and freestanding API

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/5e77e456-b46f-4bef-b94a-8b8bfff9a473
Contributor Author

Copilot AI commented Mar 22, 2026

@copilot The Gluon API uses "" (empty string) as default for cache_modifier, load_cache_modifier, and store_cache_modifier, but the freestanding API and DeviceContext both use None. Change a...

Done in f5cfba2. Changed all Gluon cache_modifier, load_cache_modifier, and store_cache_modifier defaults from "" to None across all 5 IrisDeviceCtx methods, and updated the docstrings to say None instead of "".

@mawad-amd mawad-amd marked this pull request as ready for review March 22, 2026 05:05
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners March 22, 2026 05:05
Copilot AI review requested due to automatic review settings March 22, 2026 05:05
Copilot AI left a comment

Pull request overview

Adds missing cache_modifier, volatile, and other parameters to DeviceContext and Gluon IrisDeviceCtx methods so they match the freestanding Iris API introduced in #158, and adds unit tests to validate the new passthrough behavior.

Changes:

  • Extend DeviceContext (load/store/get/put/copy) to accept and forward cache-modifier and masking-related parameters to tl.load/tl.store.
  • Extend Gluon IrisDeviceCtx methods to accept and forward cache_modifier/volatile where supported by gl.load/gl.store.
  • Add dedicated unit tests covering cache modifiers across all 5 methods for both APIs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 27 comments.

| File | Description |
| --- | --- |
| iris/iris.py | Extends DeviceContext method signatures and forwards args to Triton loads/stores. |
| iris/experimental/iris_gluon.py | Extends Gluon IrisDeviceCtx method signatures and forwards args to gl.load/gl.store. |
| tests/unittests/test_device_context_cache_modifiers.py | Adds coverage for DeviceContext cache modifiers / volatile across all 5 methods. |
| tests/unittests/test_gluon_cache_modifiers.py | Adds equivalent coverage for Gluon IrisDeviceCtx cache modifiers / volatile across all 5 methods. |

Comment on lines +1532 to +1533
cache_modifier=None,
volatile=False,
Copilot AI Mar 22, 2026

tl.load(..., cache_modifier=..., volatile=...) expects compile-time constants in Triton; but cache_modifier and volatile are not marked tl.constexpr in this @triton.jit function signature. This can cause compilation failures when callers pass string/bool constants (which Triton generally requires as constexpr). Make cache_modifier: tl.constexpr = None and volatile: tl.constexpr = False (and similarly for the load/store cache modifier args in get/put/copy, and cache_modifier in store).

Suggested change:

    # before
    cache_modifier=None,
    volatile=False,

    # after
    cache_modifier: tl.constexpr = None,
    volatile: tl.constexpr = False,

"""
translated_ptr = self._translate(pointer, self.rank, from_rank, hint)
-    result = tl.load(translated_ptr, mask=mask)
+    result = tl.load(translated_ptr, mask=mask, other=other, cache_modifier=cache_modifier, volatile=volatile)
Copilot AI Mar 22, 2026

tl.load(..., cache_modifier=..., volatile=...) expects compile-time constants in Triton; but cache_modifier and volatile are not marked tl.constexpr in this @triton.jit function signature. This can cause compilation failures when callers pass string/bool constants (which Triton generally requires as constexpr). Make cache_modifier: tl.constexpr = None and volatile: tl.constexpr = False (and similarly for the load/store cache modifier args in get/put/copy, and cache_modifier in store).

Comment on lines +1531 to +1534
other=None,
cache_modifier=None,
volatile=False,
hint: tl.constexpr = None,
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Suggested change:

    # before
    other=None,
    cache_modifier=None,
    volatile=False,
    hint: tl.constexpr = None,

    # after
    hint: tl.constexpr = None,
    other=None,
    cache_modifier=None,
    volatile=False,

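The hazard this comment describes is ordinary Python positional binding. A minimal standalone demonstration (hypothetical `load_old`/`load_new` functions, not the real API):

```python
# Demonstration of the positional-binding hazard: inserting new parameters
# before an existing `hint` parameter silently rebinds positional callers.

def load_old(pointer, from_rank, mask=None, hint=None):
    return {"hint": hint}

def load_new(pointer, from_rank, mask=None,
             other=None, cache_modifier=None, volatile=False, hint=None):
    return {"other": other, "hint": hint}

# A caller that used to pass hint positionally right after mask:
old = load_old(0x1000, 1, None, "rt")  # hint == "rt", as intended
new = load_new(0x1000, 1, None, "rt")  # "rt" now binds to `other`, not hint
```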

@triton.jit
-    def store(self, pointer, value, to_rank, mask=None, hint: tl.constexpr = None):
+    def store(self, pointer, value, to_rank, mask=None, cache_modifier=None, hint: tl.constexpr = None):
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Comment on lines +1605 to +1615
    def get(
        self,
        from_ptr,
        to_ptr,
        from_rank,
        mask=None,
        other=None,
        load_cache_modifier=None,
        store_cache_modifier=None,
        hint: tl.constexpr = None,
    ):
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Comment on lines +303 to +304
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Suggested change:

    # before
    @pytest.mark.parametrize(
        "load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))

    # after
    def _sample_cache_modifiers(modifiers):
        """Select a small representative subset of cache modifiers to limit test explosion."""
        if len(modifiers) <= 3:
            return modifiers
        mid = len(modifiers) // 2
        indices = {0, mid, len(modifiers) - 1}
        return [modifiers[i] for i in sorted(indices)]

    _SAMPLED_LOAD_CACHE_MODIFIERS = _sample_cache_modifiers(LOAD_CACHE_MODIFIERS)
    _SAMPLED_STORE_CACHE_MODIFIERS = _sample_cache_modifiers(STORE_CACHE_MODIFIERS)
    _SAMPLED_GET_CACHE_MODIFIER_PAIRS = list(
        product(_SAMPLED_LOAD_CACHE_MODIFIERS, _SAMPLED_STORE_CACHE_MODIFIERS)
    )

    @pytest.mark.parametrize(
        "load_cache_modifier,store_cache_modifier", _SAMPLED_GET_CACHE_MODIFIER_PAIRS

Comment on lines +354 to +356
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +401 to +403
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +456 to +458
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +508 to +511
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier",
list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS)),
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

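For scale: assuming the four load modifiers and five store modifiers listed in the docstring examples earlier in this thread, each product-parameterized test expands to 20 cases before multiplying by any other parametrize axes:

```python
from itertools import product

# Modifier lists as described in the expanded docstrings earlier in this PR.
LOAD_CACHE_MODIFIERS = [None, ".ca", ".cg", ".cv"]
STORE_CACHE_MODIFIERS = [None, ".wb", ".cg", ".cs", ".wt"]

pairs = list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
print(len(pairs))  # 4 load modifiers x 5 store modifiers = 20 JIT-compiled cases
```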


Development

Successfully merging this pull request may close these issues.

Port cache modifiers, other, and volatile to DeviceContext and Gluon APIs

3 participants