[BUG] _assign_kernel cache not invalidated after ctx.reset()

## Summary

`numba.cuda.cudadrv.devicearray._assign_kernel` uses `@lru_cache` but is not
context-aware. After calling `cuda.current_context().reset()`, the cached
kernel holds a stale reference to an unloaded CUDA module, causing subsequent
device array assignments to fail with `CUDA_ERROR_INVALID_HANDLE`.

## Minimal Reproduction

```python
"""
Minimal reproduction of _assign_kernel cache invalidation bug.

Run with: python test_assign_kernel_bug.py
Observed: CUDA_ERROR_INVALID_HANDLE on the second assignment
"""
from numba import cuda
import numpy as np

# Step 1: Trigger _assign_kernel compilation and caching
data = cuda.device_array(10, dtype=np.int32)
data[0] = 1  # This compiles and caches _assign_kernel(ndim=1)
cuda.synchronize()
print("First assignment: OK")

# Step 2: Reset context (invalidates all CUDA modules)
ctx = cuda.current_context()
ctx.reset()
print("Context reset")

# Step 3: Try another assignment - FAILS
data2 = cuda.device_array(10, dtype=np.int32)
try:
    data2[0] = 1  # Uses stale cached _assign_kernel
    print("Second assignment: OK")
except Exception as e:
    print(f"Second assignment FAILED: {type(e).__name__}: {e}")
```

## Output

```
First assignment: OK
Context reset
Second assignment FAILED: CudaAPIError: [400] Call to cuOccupancyMaxPotentialBlockSize results in CUDA_ERROR_INVALID_HANDLE
```

## Analysis

### Root Cause

In `numba/cuda/cudadrv/devicearray.py`:

```python
@lru_cache
def _assign_kernel(ndim):
    @cuda.jit
    def kernel(lhs, rhs):
        ...  # implementation elided
    return kernel
```

The cache key is only `ndim`, with no awareness of CUDA context state. When
`ctx.reset()` is called:

1. `ctx.modules.clear()` unloads all compiled CUDA modules
2. The `_assign_kernel` LRU cache still holds the old kernel dispatcher
3. The dispatcher's internal `_func.module` reference points to an unloaded module
4. Next `arr[idx] = val` call uses stale kernel → `CUDA_ERROR_INVALID_HANDLE`
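
The sequence above can be reproduced without a GPU. The following is a minimal sketch, not Numba code: `FakeModule`, `FakeContext`, and `assign_kernel` are hypothetical stand-ins that only mimic the caching behaviour described here.

```python
from functools import lru_cache


class FakeModule:
    """Stand-in for a loaded CUDA module; `loaded` flips to False on unload."""

    def __init__(self):
        self.loaded = True


class FakeContext:
    """Stand-in for a CUDA context that owns its modules."""

    def __init__(self):
        self.modules = {}

    def load_module(self, name):
        module = FakeModule()
        self.modules[name] = module
        return module

    def reset(self):
        # Mimic the described behaviour: unload every module, then clear.
        for module in self.modules.values():
            module.loaded = False
        self.modules.clear()


ctx = FakeContext()


@lru_cache
def assign_kernel(ndim):
    # The cache key is only `ndim`; the context is not part of it.
    return ctx.load_module(f"assign_{ndim}d")


first = assign_kernel(1)
ctx.reset()

stale = assign_kernel(1)       # cache hit: same object as before the reset
print(stale is first, stale.loaded)   # True False

assign_kernel.cache_clear()    # the workaround: force a fresh "compile"
fresh = assign_kernel(1)
print(fresh is first, fresh.loaded)   # False True
```

The cache hit after `reset()` returns an object whose module is no longer loaded, which is the situation that surfaces as `CUDA_ERROR_INVALID_HANDLE` in the real driver.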

### Why This Matters

This affects any code that:
1. Uses device array assignment (`arr[idx] = val`)
2. Calls `ctx.reset()` (common in test fixtures for isolation)
3. Uses device array assignment again

This is particularly problematic under pytest, where fixtures often reset the
context between tests.

## Workaround

```python
from numba import cuda
from numba.cuda.cudadrv.devicearray import _assign_kernel

ctx = cuda.current_context()
ctx.reset()
_assign_kernel.cache_clear()  # Must be called after every reset
```

## Suggested Fix

Option A: Make cache context-aware

```python
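def _assign_kernel(ndim):
    ctx = cuda.current_context()
    # The key needs a component that changes on every reset. id(ctx.modules)
    # does not: modules.clear() empties the container but keeps the same
    # object. `reset_count` is a hypothetical counter that Context.reset()
    # would have to increment.
    cache_key = (ndim, ctx.reset_count)
    # ...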
```
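
As a rough illustration of Option A, the sketch below keys a cache on a per-context counter. It assumes a `generation` attribute that `reset()` increments; that attribute, `FakeContext`, and the string "kernels" are all hypothetical and not part of Numba. A counter is used because `dict.clear()` keeps the same dict object, so its `id()` does not change across a reset.

```python
class FakeContext:
    """Hypothetical stand-in for a CUDA context with a reset counter."""

    def __init__(self):
        self.modules = {}
        self.generation = 0  # assumed attribute, not in Numba's Context

    def reset(self):
        self.modules.clear()
        self.generation += 1


_kernel_cache = {}
compile_count = 0


def assign_kernel(ctx, ndim):
    """Return a kernel for `ndim`, recompiling after each reset of `ctx`."""
    global compile_count
    key = (ndim, id(ctx), ctx.generation)
    if key not in _kernel_cache:
        compile_count += 1
        # Placeholder for the real compilation step.
        _kernel_cache[key] = f"kernel(ndim={ndim}, gen={ctx.generation})"
    return _kernel_cache[key]


ctx = FakeContext()
modules_id_before = id(ctx.modules)

k1 = assign_kernel(ctx, 1)
k1_again = assign_kernel(ctx, 1)   # cache hit, no recompilation
ctx.reset()
k2 = assign_kernel(ctx, 1)         # cache miss: generation changed

# clear() keeps the same dict object, so its id() is unchanged by reset().
same_dict = id(ctx.modules) == modules_id_before
```

Entries from earlier generations are never evicted in this sketch, and `id(ctx)` can be reused once a context is garbage collected; a real implementation would need to handle both.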

Option B: Clear cache in Context.reset()

```python
def reset(self):
    self.memory_manager.reset()
    self.modules.clear()
    self.deallocations.clear()
    # Clear caches that may hold stale module references
    from numba.cuda.cudadrv.devicearray import _assign_kernel
    _assign_kernel.cache_clear()
```
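
One possible variant of Option B avoids importing `devicearray` inside `reset()` by letting caches register their own clearing function. This is only a sketch: `register_reset_hook` and `_reset_hooks` are hypothetical names, not an existing Numba API, and `FakeContext` stands in for the real `Context`.

```python
from functools import lru_cache

_reset_hooks = []  # hypothetical registry; not an existing Numba API


def register_reset_hook(func):
    """Record a callable to run at the end of every FakeContext.reset()."""
    _reset_hooks.append(func)
    return func


class FakeContext:
    """Stand-in for a CUDA context."""

    def __init__(self):
        self.modules = {}

    def reset(self):
        self.modules.clear()
        # Let every registered cache drop its stale entries.
        for hook in _reset_hooks:
            hook()


compile_count = 0


@lru_cache
def assign_kernel(ndim):
    global compile_count
    compile_count += 1
    return object()  # placeholder for a compiled kernel


register_reset_hook(assign_kernel.cache_clear)

ctx = FakeContext()
before = assign_kernel(1)
ctx.reset()
after = assign_kernel(1)   # recompiled because the hook cleared the cache
```

Because the `lru_cache` is process-wide, clearing it when one context resets also discards kernels cached for any other context; that costs a recompilation but does not leave stale handles behind.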

Option C: Use weak references in dispatcher

The cached kernel's module reference could use weakrefs that become invalid
when the module is unloaded, triggering recompilation.
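
A rough sketch of the weakref idea, using only the standard library. `CachedKernel`, `FakeContext`, and `FakeModule` are hypothetical stand-ins; the sketch assumes the context holds the only strong reference to each module, which may not match Numba's actual ownership model.

```python
import gc
import weakref


class FakeModule:
    """Stand-in for a loaded CUDA module."""


class FakeContext:
    """Stand-in for a CUDA context that owns its modules."""

    def __init__(self):
        self.modules = {}

    def load_module(self, name):
        module = FakeModule()
        self.modules[name] = module   # the context holds the only strong ref
        return module

    def reset(self):
        self.modules.clear()


class CachedKernel:
    """Hypothetical dispatcher that reloads its module when the weakref dies."""

    def __init__(self, ctx, name):
        self._ctx = ctx
        self._name = name
        self._module_ref = None
        self.load_count = 0

    def module(self):
        module = self._module_ref() if self._module_ref is not None else None
        if module is None:
            # First use, or the module was unloaded: load it again.
            module = self._ctx.load_module(self._name)
            self._module_ref = weakref.ref(module)
            self.load_count += 1
        return module


ctx = FakeContext()
kernel = CachedKernel(ctx, "assign_1d")

kernel.module()     # first use: loads the module
kernel.module()     # module still alive: no reload
ctx.reset()         # drops the only strong reference
gc.collect()        # not needed on CPython; added for other interpreters
kernel.module()     # weakref is dead: reload
```

With this shape the cached dispatcher itself can stay in the `lru_cache`; only its module handle is refreshed after a reset.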

## Environment

- numba-cuda: 0.22.1
- numba: 0.62.1
- CUDA: 12.x
- Python: 3.13

## Related

Other caches that may have similar issues:
- `numba.cuda.dispatcher.configure` (also uses `@lru_cache`)
- Any other context-agnostic kernel caches

