PyKeOps LazyTensor + AMP (float16) crashes when using .ranges
Description
I am using AMP / mixed-precision (float16) training together with PyKeOps LazyTensors. Specifically, I perform the following operations:
```python
orientation_vector_ij.ranges = self.ranges  # block-diagonal sparsity mask
orientation_vector_i = orientation_vector_ij.sum(dim=1)
```

`orientation_vector_ij` is a LazyTensor built from float16 tensors, and `self.ranges` is correctly formatted (following the KeOps batch conventions).
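For reference, KeOps expects `.ranges` to be a 6-tuple of int32 index tensors. The small helper below (my own name, not a KeOps API) normalizes all six onto one device, which can rule out user-side device mismatches, although the CPU tensor in the trace below appears to be created inside KeOps itself:

```python
import torch

def ranges_to_device(ranges, device):
    """Move a KeOps block-sparse ranges 6-tuple onto a single device.

    The tuple is, by the KeOps convention:
    (ranges_i, slices_i, redranges_j, ranges_j, slices_j, redranges_i),
    each an int32 tensor of index ranges.
    """
    return tuple(r.to(device) for r in ranges)
```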
When running this with AMP / float16, I get the following error:
```
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/common/lazy_tensor.py", line 2096, in sum
    return self.reduction("Sum", axis=axis, **kwargs)
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/common/lazy_tensor.py", line 775, in reduction
    return res()
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/common/lazy_tensor.py", line 957, in __call__
    return self.callfun(*args, *self.variables, **self.kwargs)
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 693, in __call__
    out = GenredAutograd_fun(params, *args)
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 383, in GenredAutograd_fun
    return GenredAutograd.apply(*inputs)[0]
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 291, in forward
    return GenredAutograd_base._forward(*inputs)
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/generic/generic_red.py", line 121, in _forward
    result = myconv.genred_pytorch(
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/common/keops_io/LoadKeOps.py", line 190, in genred
    args, ranges, tag_dummy, N = preprocess_half2(
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/half2_convert.py", line 101, in preprocess_half2
    ranges = ranges2half2(ranges[0:3], ny) + ranges[3:6]
  File "/home/lizhenghao/anaconda3/envs/dmasif_3/lib/python3.8/site-packages/pykeops/torch/half2_convert.py", line 69, in ranges2half2
    redranges_j = torch.cat((redranges_j, redranges_j_block2), dim=0)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:2 and cpu! (when checking argument for argument tensors in method wrapper_CUDA_cat)
```
Observations
- With float32, the same code works perfectly.
- With AMP / float16 but without setting `.ranges`, there is no error.
- The error occurs only when combining `.ranges` + LazyTensor + float16.
Suspected Cause
PyKeOps appears not to be fully compatible with half-precision LazyTensors when `.ranges` is set. Judging from the traceback, `ranges2half2` in `pykeops/torch/half2_convert.py` builds an internal ranges tensor (`redranges_j_block2`) on CPU while the user-supplied ranges live on `cuda:2`, so the subsequent `torch.cat` fails with a device mismatch.
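A workaround that may avoid the half2 conversion path entirely is to run the ranged reduction in float32, outside the autocast region, and cast the result back to half afterwards. This is a sketch under the assumption that the float32 path is unaffected (consistent with the observations above); `sum_ranged_fp32` is my own wrapper, not a KeOps API:

```python
import torch

def sum_ranged_fp32(reduce_fn, *tensors):
    # Hypothetical wrapper: cast the inputs to float32, run the reduction
    # (e.g. build the LazyTensor, set .ranges, and call .sum(dim=1) inside
    # reduce_fn) with autocast disabled, then cast the result back to half.
    with torch.cuda.amp.autocast(enabled=False):
        out = reduce_fn(*(t.float() for t in tensors))
    return out.half()
```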
Environment
- PyTorch: 2.0.0
- KeOps: 2.3
- Python: 3.8
- CUDA: 11.8
Request
- Guidance on how to correctly use LazyTensor with `.ranges` under AMP / float16.
- If this is a bug, please advise whether there is a workaround or a fix is planned.
Thank you!