
Port cache_modifier, volatile, and other to DeviceContext and Gluon APIs#471

Open
Copilot wants to merge 5 commits into main from copilot/port-cache-modifiers-to-devicecontext

Conversation


Copilot AI commented Mar 22, 2026

DeviceContext and IrisDeviceCtx (Gluon) were missing cache_modifier, volatile, and other parameters that existed on the freestanding API (iris.load, iris.store, etc.) since #158.

DeviceContext (iris/iris.py)

Added missing parameters to all 5 methods, passed through to tl.load()/tl.store():

| Method | New Parameters |
| --- | --- |
| `load` | `other`, `cache_modifier`, `volatile` |
| `store` | `cache_modifier` |
| `get` | `other`, `load_cache_modifier`, `store_cache_modifier` |
| `put` | `other`, `load_cache_modifier`, `store_cache_modifier` |
| `copy` | `other`, `load_cache_modifier`, `store_cache_modifier` |

Gluon (iris/experimental/iris_gluon.py)

gl.load() and gl.store() support cache_modifier and volatile, so all 5 IrisDeviceCtx methods are updated with the same parameters. Defaults use None (consistent with the freestanding API and DeviceContext).

Usage

# DeviceContext
ctx.load(ptr + offsets, from_rank=1, mask=mask, cache_modifier=".cv", volatile=True)
ctx.store(ptr + offsets, data, to_rank=1, mask=mask, cache_modifier=".wt")
ctx.get(remote_ptr + offsets, local_ptr + offsets, from_rank=1, load_cache_modifier=".cg", store_cache_modifier=".cs")

# Gluon IrisDeviceCtx (same parameters)
ctx.load(ptr + offsets, 1, mask=mask, cache_modifier=".cv", volatile=True)

Tests

  • tests/unittests/test_device_context_cache_modifiers.py: mirrors the existing freestanding cache modifier tests for all 5 DeviceContext methods
  • tests/unittests/test_gluon_cache_modifiers.py: equivalent coverage for IrisDeviceCtx
Original prompt

This section details the original issue you should resolve.

<issue_title>Port cache modifiers, other, and volatile to DeviceContext and Gluon APIs</issue_title>
<issue_description>## Summary

PR #158 added cache_modifier, volatile, and other parameters to the freestanding Triton JIT API (iris.load, iris.store, iris.get, iris.put, iris.copy). The DeviceContext (ctx) and Gluon APIs need to be updated to match.

Current State

The freestanding API supports all of the parameters below (added in #158); DeviceContext (ctx) and Gluon were missing them:

  • load: other, cache_modifier, volatile
  • store: cache_modifier
  • get: other, load_cache_modifier, store_cache_modifier
  • put: other, load_cache_modifier, store_cache_modifier
  • copy: other, load_cache_modifier, store_cache_modifier

What Needs to Be Done

1. DeviceContext (iris.py class DeviceContext, ~line 1526)

Add missing parameters to all 5 methods and pass them through to the underlying tl.load()/tl.store() calls:

  • load: Add other, cache_modifier, volatile → pass to tl.load(..., other=other, cache_modifier=cache_modifier, volatile=volatile)
  • store: Add cache_modifier → pass to tl.store(..., cache_modifier=cache_modifier)
  • get: Add other, load_cache_modifier, store_cache_modifier → pass load_cache_modifier to tl.load() as cache_modifier, pass store_cache_modifier to tl.store() as cache_modifier
  • put: Add other, load_cache_modifier, store_cache_modifier → same pattern as get
  • copy: Add other, load_cache_modifier, store_cache_modifier → same pattern
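The DeviceContext changes above are a straight parameter passthrough. As a rough sketch in plain Python (not Triton — the real methods are `@triton.jit` functions; here `tl.load` is replaced by a recording stub and the pointer-translation step is elided), the pattern is:

```python
# Plain-Python sketch of the passthrough pattern. `tl_load` is a stub standing
# in for tl.load; it records the kwargs it was forwarded so we can see that
# the new DeviceContext.load parameters reach the underlying call unchanged.

def tl_load(ptr, mask=None, other=None, cache_modifier=None, volatile=False):
    """Stub for tl.load: returns the kwargs it received."""
    return {"ptr": ptr, "mask": mask, "other": other,
            "cache_modifier": cache_modifier, "volatile": volatile}

class DeviceContextSketch:
    def load(self, pointer, from_rank, mask=None, other=None,
             cache_modifier=None, volatile=False):
        # Rank-based pointer translation elided; only the forwarding is shown.
        return tl_load(pointer, mask=mask, other=other,
                       cache_modifier=cache_modifier, volatile=volatile)

ctx = DeviceContextSketch()
call = ctx.load(0x1000, from_rank=1, cache_modifier=".cv", volatile=True)
```

The same shape applies to the other four methods; only the kwarg names differ.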

2. Gluon (iris/experimental/iris_gluon.py class IrisDeviceCtx, ~line 146)

Add cache modifier parameters to all 5 methods and pass them through to the underlying gl.load()/gl.store() calls:

  • load: Add cache_modifier, volatile → pass to gl.load(..., cache_modifier=cache_modifier, volatile=volatile)
  • store: Add cache_modifier → pass to gl.store(..., cache_modifier=cache_modifier)
  • get: Add load_cache_modifier, store_cache_modifier → pass to respective gl.load()/gl.store() calls
  • put: Add load_cache_modifier, store_cache_modifier → same pattern
  • copy: Add load_cache_modifier, store_cache_modifier → same pattern
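For `get`/`put`/`copy`, the two modifier parameters each map onto the single `cache_modifier` kwarg of the respective call. A minimal plain-Python sketch (with `gl.load`/`gl.store` replaced by recording stubs — not the real Gluon API):

```python
# Sketch of the load_cache_modifier / store_cache_modifier renaming in get():
# the former feeds the load call, the latter feeds the store call.

calls = []

def gl_load(ptr, mask=None, cache_modifier=None):
    calls.append(("load", cache_modifier))
    return 0  # dummy loaded value

def gl_store(ptr, value, mask=None, cache_modifier=None):
    calls.append(("store", cache_modifier))

def get_sketch(from_ptr, to_ptr, mask=None,
               load_cache_modifier=None, store_cache_modifier=None):
    value = gl_load(from_ptr, mask=mask, cache_modifier=load_cache_modifier)
    gl_store(to_ptr, value, mask=mask, cache_modifier=store_cache_modifier)

get_sketch(0x1000, 0x2000, load_cache_modifier=".cg", store_cache_modifier=".cs")
```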

Note: Check whether gl.load() and gl.store() support cache_modifier and volatile kwargs. If gluon doesn't support these yet, skip gluon and note it in the PR.

3. Tests

  • Add unit tests for the DeviceContext API mirroring the existing freestanding tests in tests/unittests/test_{load,store,get,put,copy}_cache_modifiers.py.
  • If gluon supports cache modifiers, add corresponding gluon tests as well.

Reference

  • Freestanding API signatures to match: iris/iris.py lines 1898–2227
  • Existing cache modifier tests to mirror: tests/unittests/test_*_cache_modifiers.py
  • PR Introduce cache modifiers #158 for full context on the original implementation</issue_description>

Comments on the Issue (you are @copilot in this section)



- DeviceContext.load(): add other, cache_modifier, volatile params
- DeviceContext.store(): add cache_modifier param
- DeviceContext.get(): add other, load_cache_modifier, store_cache_modifier params
- DeviceContext.put(): add other, load_cache_modifier, store_cache_modifier params
- DeviceContext.copy(): add other, load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.load(): add cache_modifier, volatile params
- IrisDeviceCtx.store(): add cache_modifier param
- IrisDeviceCtx.get(): add load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.put(): add load_cache_modifier, store_cache_modifier params
- IrisDeviceCtx.copy(): add load_cache_modifier, store_cache_modifier params
- Add tests: test_device_context_cache_modifiers.py
- Add tests: test_gluon_cache_modifiers.py

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/8ce4f35c-fe72-4506-89d2-e79687a16a09
Copilot AI changed the title from "[WIP] Port cache modifiers and volatile to DeviceContext and Gluon APIs" to "Port cache_modifier, volatile, and other to DeviceContext and Gluon APIs" on Mar 22, 2026.
Copilot AI requested a review from mawad-amd March 22, 2026 02:59
)
def test_device_context_put_cache_modifiers_local(load_cache_modifier, store_cache_modifier):
"""Test DeviceContext.put() local (from_rank == to_rank) with various cache modifiers."""
shmem = iris.iris(1 << 20)
Collaborator
@copilot use ctx instead of shmem all over the files you added or updated.

Contributor Author

Done in 791e624. Replaced all shmem references with ctx in both test_device_context_cache_modifiers.py and test_gluon_cache_modifiers.py.

…t_gluon_cache_modifiers.py

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/ee387d87-c4e9-4207-b9d8-de35aba33605
Copilot AI requested a review from mawad-amd March 22, 2026 03:39
mawad-amd (Collaborator)

@copilot The docstrings for cache_modifier params in both DeviceContext and Gluon are too short. Expand them to match the format used in the freestanding API (see iris.load(), iris.store(), iris.copy(), iris.get(), iris.put() in iris/iris.py).

For example, you wrote:

cache_modifier: Controls cache behavior of the load. Supported values: None (default, same as ".ca"), ".ca", ".cg", ".cv".

But the freestanding API has the full expanded format:

cache_modifier (str, optional): Controls cache behavior of the load.

    Supported values:
        - None: *(default)* — Same as ".ca". Uses cache at all levels (CU, L2, LLC) with LRU policy.
        - ".ca": Cache at all levels (CU, L2, LLC) with LRU policy
        - ".cg": Bypasses the CU (L1) cache, streams through L2, and may hit in LLC but the line is not retained or inserted.
        - ".cv": Bypasses all GPU caches (CU and L2) and fetches directly from system memory. If data exists in the LLC, it may hit, but is not retained or inserted.
                Ensures global coherence by invalidating stale GPU cache lines.

And for store:

cache_modifier (str, optional): Controls cache behavior of the store. Supported values are:

    - None: *(default)* — Same as ".wb". Uses write-back caching at all levels (CU, L2, LLC) with LRU policy.
    - ".wb": Write-back. Write-allocate on L1 miss, inserted into caches and written back later.
    - ".cg": Cache Global. Equivalent to ".wb" — stored through L1 → L2 → LLC under LRU.
    - ".cs": Cache Streaming. Bypasses L1, streamed through L2, not retained in LLC.
    - ".wt": Write-Through. Bypasses L1 and L2 (coherent cache bypass), may hit in LLC with LRU.

Apply this expanded format to all cache_modifier, load_cache_modifier, and store_cache_modifier params across both DeviceContext (iris/iris.py) and Gluon (iris/experimental/iris_gluon.py). Copy the exact descriptions from the freestanding functions in iris/iris.py.

…to match freestanding API format

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/639da603-607c-4d81-b5af-88a77a38db97
Contributor Author

Copilot AI commented Mar 22, 2026

@copilot The docstrings for cache_modifier params in both DeviceContext and Gluon are too short. Expand them to match the format used in the freestanding API (see iris.load(), iris.store(), `iris....

Done in b46c8a7. Expanded all cache_modifier, load_cache_modifier, store_cache_modifier, and volatile docstrings in both DeviceContext (iris/iris.py) and IrisDeviceCtx (iris/experimental/iris_gluon.py) to match the full format used in the freestanding API, including per-value descriptions with the ISA-level behavior. Gluon uses "" as the default instead of None.

mawad-amd (Collaborator)

@copilot The Gluon API uses "" (empty string) as default for cache_modifier, load_cache_modifier, and store_cache_modifier, but the freestanding API and DeviceContext both use None. Change all Gluon defaults from "" to None to be consistent. Update the Gluon docstrings accordingly (default should say None not "").

…with DeviceContext and freestanding API

Co-authored-by: mawad-amd <112003944+mawad-amd@users.noreply.github.com>
Agent-Logs-Url: https://github.com/ROCm/iris/sessions/5e77e456-b46f-4bef-b94a-8b8bfff9a473
Contributor Author

Copilot AI commented Mar 22, 2026

@copilot The Gluon API uses "" (empty string) as default for cache_modifier, load_cache_modifier, and store_cache_modifier, but the freestanding API and DeviceContext both use None. Change a...

Done in f5cfba2. Changed all Gluon cache_modifier, load_cache_modifier, and store_cache_modifier defaults from "" to None across all 5 IrisDeviceCtx methods, and updated the docstrings to say None instead of "".

@mawad-amd mawad-amd marked this pull request as ready for review March 22, 2026 05:05
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners March 22, 2026 05:05
Copilot AI review requested due to automatic review settings March 22, 2026 05:05
Copilot AI left a comment

Pull request overview

Adds missing cache_modifier, volatile, and other parameters to DeviceContext and Gluon IrisDeviceCtx methods so they match the freestanding Iris API introduced in #158, and adds unit tests to validate the new passthrough behavior.

Changes:

  • Extend DeviceContext (load/store/get/put/copy) to accept and forward cache-modifier and masking-related parameters to tl.load/tl.store.
  • Extend Gluon IrisDeviceCtx methods to accept and forward cache_modifier/volatile where supported by gl.load/gl.store.
  • Add dedicated unit tests covering cache modifiers across all 5 methods for both APIs.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 27 comments.

| File | Description |
| --- | --- |
| iris/iris.py | Extends DeviceContext method signatures and forwards args to Triton loads/stores. |
| iris/experimental/iris_gluon.py | Extends Gluon IrisDeviceCtx method signatures and forwards args to gl.load/gl.store. |
| tests/unittests/test_device_context_cache_modifiers.py | Adds coverage for DeviceContext cache modifiers / volatile across all 5 methods. |
| tests/unittests/test_gluon_cache_modifiers.py | Adds equivalent coverage for Gluon IrisDeviceCtx cache modifiers / volatile across all 5 methods. |

Comment on lines +1532 to +1533
cache_modifier=None,
volatile=False,
Copilot AI Mar 22, 2026

tl.load(..., cache_modifier=..., volatile=...) expects compile-time constants in Triton; but cache_modifier and volatile are not marked tl.constexpr in this @triton.jit function signature. This can cause compilation failures when callers pass string/bool constants (which Triton generally requires as constexpr). Make cache_modifier: tl.constexpr = None and volatile: tl.constexpr = False (and similarly for the load/store cache modifier args in get/put/copy, and cache_modifier in store).

Suggested change:

    # before
    cache_modifier=None,
    volatile=False,

    # after
    cache_modifier: tl.constexpr = None,
    volatile: tl.constexpr = False,

"""
translated_ptr = self._translate(pointer, self.rank, from_rank, hint)
-    result = tl.load(translated_ptr, mask=mask)
+    result = tl.load(translated_ptr, mask=mask, other=other, cache_modifier=cache_modifier, volatile=volatile)
Copilot AI Mar 22, 2026

tl.load(..., cache_modifier=..., volatile=...) expects compile-time constants in Triton; but cache_modifier and volatile are not marked tl.constexpr in this @triton.jit function signature. This can cause compilation failures when callers pass string/bool constants (which Triton generally requires as constexpr). Make cache_modifier: tl.constexpr = None and volatile: tl.constexpr = False (and similarly for the load/store cache modifier args in get/put/copy, and cache_modifier in store).

Comment on lines +1531 to +1534
other=None,
cache_modifier=None,
volatile=False,
hint: tl.constexpr = None,
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Suggested change:

    # before
    other=None,
    cache_modifier=None,
    volatile=False,
    hint: tl.constexpr = None,

    # after
    hint: tl.constexpr = None,
    other=None,
    cache_modifier=None,
    volatile=False,

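The hazard this comment describes is ordinary Python positional binding. A minimal standalone demonstration (hypothetical `load_old`/`load_new` functions, not the real API):

```python
# Demonstration of the positional-binding hazard: inserting new parameters
# before an existing `hint` parameter silently rebinds positional callers.

def load_old(pointer, from_rank, mask=None, hint=None):
    return {"hint": hint}

def load_new(pointer, from_rank, mask=None,
             other=None, cache_modifier=None, volatile=False, hint=None):
    return {"other": other, "hint": hint}

# A caller that used to pass hint positionally right after mask:
old = load_old(0x1000, 1, None, "rt")  # hint == "rt", as intended
new = load_new(0x1000, 1, None, "rt")  # "rt" now binds to `other`, not hint
```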

@triton.jit
-    def store(self, pointer, value, to_rank, mask=None, hint: tl.constexpr = None):
+    def store(self, pointer, value, to_rank, mask=None, cache_modifier=None, hint: tl.constexpr = None):
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Comment on lines +1605 to +1615
    def get(
        self,
        from_ptr,
        to_ptr,
        from_rank,
        mask=None,
        other=None,
        load_cache_modifier=None,
        store_cache_modifier=None,
        hint: tl.constexpr = None,
    ):
Copilot AI Mar 22, 2026

The new parameters are inserted before hint, which changes the positional argument order compared to the previous API. Any existing callers passing hint positionally (rather than as a keyword) will now accidentally bind it to other/cache_modifier/etc. To avoid a breaking change, keep hint in its previous position (immediately after mask) and append new parameters after it, or make the new parameters keyword-only (if compatible with @triton.jit usage patterns in this codebase).

Comment on lines +303 to +304
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Suggested change:

    # before
    @pytest.mark.parametrize(
        "load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))

    # after
    def _sample_cache_modifiers(modifiers):
        """Select a small representative subset of cache modifiers to limit test explosion."""
        if len(modifiers) <= 3:
            return modifiers
        mid = len(modifiers) // 2
        indices = {0, mid, len(modifiers) - 1}
        return [modifiers[i] for i in sorted(indices)]

    _SAMPLED_LOAD_CACHE_MODIFIERS = _sample_cache_modifiers(LOAD_CACHE_MODIFIERS)
    _SAMPLED_STORE_CACHE_MODIFIERS = _sample_cache_modifiers(STORE_CACHE_MODIFIERS)
    _SAMPLED_GET_CACHE_MODIFIER_PAIRS = list(
        product(_SAMPLED_LOAD_CACHE_MODIFIERS, _SAMPLED_STORE_CACHE_MODIFIERS)
    )

    @pytest.mark.parametrize(
        "load_cache_modifier,store_cache_modifier", _SAMPLED_GET_CACHE_MODIFIER_PAIRS

Comment on lines +354 to +356
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +401 to +403
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +456 to +458
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier", list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

Comment on lines +508 to +511
@pytest.mark.parametrize(
"load_cache_modifier,store_cache_modifier",
list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS)),
)
Copilot AI Mar 22, 2026

The Cartesian-product parameterization across modifiers creates a large number of JIT-compiled test cases (especially for put and both copy directions), which can significantly increase CI time. If runtime becomes an issue, consider sampling representative modifier pairs (e.g., None, one mid-level like .cg, and one extreme like .cv/.wt) or marking the full matrix as a slower test tier.

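For scale: assuming the four load modifiers and five store modifiers listed in the docstring examples earlier in this thread, each product-parameterized test expands to 20 cases before multiplying by any other parametrize axes:

```python
from itertools import product

# Modifier lists as described in the expanded docstrings earlier in this PR.
LOAD_CACHE_MODIFIERS = [None, ".ca", ".cg", ".cv"]
STORE_CACHE_MODIFIERS = [None, ".wb", ".cg", ".cs", ".wt"]

pairs = list(product(LOAD_CACHE_MODIFIERS, STORE_CACHE_MODIFIERS))
print(len(pairs))  # 4 load modifiers x 5 store modifiers = 20 JIT-compiled cases
```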


Development

Successfully merging this pull request may close these issues.

Port cache modifiers, other, and volatile to DeviceContext and Gluon APIs

3 participants