Summary
numba.cuda.cudadrv.devicearray._assign_kernel uses @lru_cache but is not
context-aware. After calling cuda.current_context().reset(), the cached
kernel holds a stale reference to an unloaded CUDA module, causing subsequent
device array assignments to fail with CUDA_ERROR_INVALID_HANDLE.
Minimal Reproduction
"""
Minimal reproduction of _assign_kernel cache invalidation bug.
Run with: python test_assign_kernel_bug.py
Expected: CUDA_ERROR_INVALID_HANDLE on second assignment
"""
from numba import cuda
import numpy as np
# Step 1: Trigger _assign_kernel compilation and caching
data = cuda.device_array(10, dtype=np.int32)
data[0] = 1 # This compiles and caches _assign_kernel(ndim=1)
cuda.synchronize()
print("First assignment: OK")
# Step 2: Reset context (invalidates all CUDA modules)
ctx = cuda.current_context()
ctx.reset()
print("Context reset")
# Step 3: Try another assignment - FAILS
data2 = cuda.device_array(10, dtype=np.int32)
try:
    data2[0] = 1 # Uses stale cached _assign_kernel
    print("Second assignment: OK")
except Exception as e:
    print(f"Second assignment FAILED: {type(e).__name__}: {e}")
Output
First assignment: OK
Context reset
Second assignment FAILED: CudaAPIError: [400] Call to cuOccupancyMaxPotentialBlockSize results in CUDA_ERROR_INVALID_HANDLE
Analysis
Root Cause
In numba/cuda/cudadrv/devicearray.py:
@lru_cache
def _assign_kernel(ndim):
    @cuda.jit
    def kernel(lhs, rhs):
        # ... implementation
    return kernel
The cache key is only ndim, with no awareness of CUDA context state. When
ctx.reset() is called:
- ctx.modules.clear() unloads all compiled CUDA modules
- The _assign_kernel LRU cache still holds the old kernel dispatcher
- The dispatcher's internal _func.module reference points to an unloaded module
- The next arr[idx] = val call uses the stale kernel → CUDA_ERROR_INVALID_HANDLE
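The sequence above can be reproduced without a GPU. The sketch below is a pure-Python simulation, not numba code: FakeModule, FakeContext and assign_kernel are hypothetical stand-ins chosen only to show how a cache keyed on ndim alone outlives the module it depends on.

```python
from functools import lru_cache


class FakeModule:
    """Stand-in for a loaded CUDA module (hypothetical, not a numba class)."""

    def __init__(self):
        self.loaded = True


class FakeContext:
    """Stand-in for a CUDA context that owns compiled modules."""

    def __init__(self):
        self.modules = {}

    def reset(self):
        # Mirror the report: unload every module, then clear the mapping.
        for module in self.modules.values():
            module.loaded = False
        self.modules.clear()


CTX = FakeContext()


@lru_cache
def assign_kernel(ndim):
    """Build a 'kernel' bound to a module; the cache key is ndim only."""
    module = FakeModule()
    CTX.modules[ndim] = module

    def kernel():
        if not module.loaded:
            raise RuntimeError("invalid handle: module was unloaded")
        return "ok"

    return kernel


first = assign_kernel(1)()        # builds and caches the kernel
CTX.reset()                       # unloads the module; the cache is untouched
try:
    second = assign_kernel(1)()   # cache hit returns the stale kernel
except RuntimeError as exc:
    second = f"failed: {exc}"

assign_kernel.cache_clear()       # drop the stale entry
third = assign_kernel(1)()        # rebuilt against a fresh module
```

Here the second call fails because the cache never learns about the reset, and the third succeeds only because the entry was cleared by hand.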
Why This Matters
This affects any code that:
- Uses device array assignment (arr[idx] = val)
- Calls ctx.reset() (common in test fixtures for isolation)
- Uses device array assignment again
This is particularly problematic in pytest where fixtures commonly reset
context between tests.
Workaround
from numba import cuda
from numba.cuda.cudadrv.devicearray import _assign_kernel
ctx = cuda.current_context()
ctx.reset()
_assign_kernel.cache_clear() # Must be called after every reset
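Because the clear has to follow every reset, it may be easier to pair the two calls in one helper. This is a minimal sketch, assuming only that the context object has a reset() method and that each cached function exposes cache_clear(); reset_and_clear, DummyContext and cached_square are hypothetical names used so the example runs without a GPU.

```python
from functools import lru_cache


def reset_and_clear(ctx, cached_functions):
    """Reset ctx, then clear every lru_cache-wrapped function passed in."""
    ctx.reset()
    for func in cached_functions:
        func.cache_clear()


class DummyContext:
    """Stand-in for a CUDA context; only counts resets."""

    def __init__(self):
        self.resets = 0

    def reset(self):
        self.resets += 1


@lru_cache
def cached_square(n):
    return n * n


ctx = DummyContext()
cached_square(3)                          # populate the cache
reset_and_clear(ctx, [cached_square])     # reset and clear in one step
info = cached_square.cache_info()

# With numba-cuda the call would presumably look like this (needs a GPU):
#   reset_and_clear(cuda.current_context(), [_assign_kernel])
```

In a pytest fixture, the same helper could run in the teardown step so that no test can reset the context without also clearing the cache.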
Suggested Fix
Option A: Make cache context-aware
def _assign_kernel(ndim):
    ctx = cuda.current_context()
    cache_key = (ndim, id(ctx.modules)) # Intended to change on reset
    # ...
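One caveat with the key above: the reset() body quoted under Option B clears self.modules in place, and an in-place clear() does not change an object's id(), so id(ctx.modules) may well stay the same across resets. A per-context generation counter is one possible alternative. The sketch below is pure Python; the generation attribute and the FakeContext class are assumptions for illustration, not existing numba attributes.

```python
from functools import lru_cache


class FakeContext:
    """Hypothetical stand-in; `generation` is not a real numba attribute."""

    def __init__(self):
        self.modules = {}
        self.generation = 0

    def reset(self):
        self.modules.clear()      # in place: id(self.modules) is unchanged
        self.generation += 1      # bumped so keys from before the reset miss


CTX = FakeContext()
build_count = 0


@lru_cache
def _build_kernel(ndim, generation):
    """Cached on (ndim, generation), so a reset forces a rebuild."""
    global build_count
    build_count += 1
    return f"kernel(ndim={ndim}, generation={generation})"


def assign_kernel(ndim):
    return _build_kernel(ndim, CTX.generation)


id_before = id(CTX.modules)
k1 = assign_kernel(1)
k2 = assign_kernel(1)             # cache hit, no rebuild
CTX.reset()
id_after = id(CTX.modules)
k3 = assign_kernel(1)             # new generation, so the kernel is rebuilt
```

Entries from older generations stay in the cache until the LRU policy evicts them, and a real implementation would probably also need the context's identity in the key to handle multiple contexts.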
Option B: Clear cache in Context.reset()
def reset(self):
    self.memory_manager.reset()
    self.modules.clear()
    self.deallocations.clear()
    # Clear caches that may hold stale module references
    from numba.cuda.cudadrv.devicearray import _assign_kernel
    _assign_kernel.cache_clear()
Option C: Use weak references in dispatcher
The cached kernel's module reference could use weakrefs that become invalid
when the module is unloaded, triggering recompilation.
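The weak-reference idea can be illustrated in plain Python. In this sketch the context holds the only strong reference to each module, so clearing the context's module list lets the weak reference die and the next lookup recompiles. FakeModule, FakeContext and WeakKernelCache are hypothetical stand-ins; how the real dispatcher stores its module reference is not shown here.

```python
import gc
import weakref


class FakeModule:
    """Hypothetical stand-in for a loaded CUDA module."""


class FakeContext:
    def __init__(self):
        self.modules = []          # the context holds the only strong references

    def load_module(self):
        module = FakeModule()
        self.modules.append(module)
        return module

    def reset(self):
        self.modules.clear()       # dropping the references "unloads" the modules


class WeakKernelCache:
    """Cache keyed by ndim that stores weak references to modules."""

    def __init__(self, ctx):
        self._ctx = ctx
        self._entries = {}
        self.compile_count = 0

    def get(self, ndim):
        ref = self._entries.get(ndim)
        module = ref() if ref is not None else None
        if module is None:         # never compiled, or the module was unloaded
            module = self._ctx.load_module()
            self._entries[ndim] = weakref.ref(module)
            self.compile_count += 1
        return module


ctx = FakeContext()
cache = WeakKernelCache(ctx)
cache.get(1)
cache.get(1)                       # module still alive: no recompilation
count_before_reset = cache.compile_count
ctx.reset()
gc.collect()                       # make collection explicit on any runtime
cache.get(1)                       # weak reference is dead: recompiles
count_after_reset = cache.compile_count
```

This approach only works if nothing else keeps the module alive, which is why the cache returns the module without storing a strong reference to it.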
Environment
- numba-cuda: 0.22.1
- numba: 0.62.1
- CUDA: 12.x
- Python: 3.13
Related
Other caches that may have similar issues:
- numba.cuda.dispatcher.configure (also uses @lru_cache)
- Any other context-agnostic kernel caches
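To look for further candidates, one option is to scan a namespace for callables that expose cache_clear, which functools.lru_cache adds to the functions it wraps. The helper below is a hypothetical sketch demonstrated on a throwaway module. It inspects one level only, so classes would have to be scanned separately to find cached methods.

```python
import types
from functools import lru_cache


def find_lru_cached(namespace):
    """Return sorted names of callables in `namespace` that expose cache_clear."""
    return sorted(
        name
        for name, obj in vars(namespace).items()
        if callable(obj) and hasattr(obj, "cache_clear")
    )


# Throwaway module standing in for a real package module.
demo = types.ModuleType("demo")


@lru_cache
def cached(n):
    return n


def plain(n):
    return n


demo.cached = cached
demo.plain = plain
found = find_lru_cached(demo)
```

Each name found this way would still need a manual check to see whether the cached value actually depends on context state.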