Issues executing examples. CUDA_ERROR_ILLEGAL_ADDRESS and torch.bmm received an invalid combination of arguments #1

@dhorka

Hi,

I have some issues executing your code. First, I tried to run your ModelNet10 example using the command provided. It seemed to work, but at a later epoch the code crashes with this error:

Traceback (most recent call last):
  File "main.py", line 315, in <module>
    main()
  File "main.py", line 217, in main
    acc_train, loss, t_loader, t_trainer = train(epoch)
  File "main.py", line 155, in train
    loss_meter.add(loss.data[0])
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/torch/lib/THC/generic/THCStorage.c:32

Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

I ran the code several times and the error appears randomly: it is not always in the same epoch, and it does not occur in the same part of the code either. Here is another example of the error:

File "main.py", line 315, in <module>
    main()
  File "main.py", line 217, in main
    acc_train, loss, t_loader, t_trainer = train(epoch)
  File "main.py", line 152, in train
    loss.backward()
  File "/projects/env/ecc/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
  File "/projects/env/ecc/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
    variables, grad_variables, retain_graph)
RuntimeError: cublas runtime error : an internal operation failed at /pytorch/torch/lib/THC/THCBlas.cu:247

Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy/cuda/driver.pyx", line 159, in cupy.cuda.driver.moduleUnload
  File "cupy/cuda/driver.pyx", line 75, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
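Because the failing line moves between runs (line 155 in the first traceback, line 152 in the second), the reported location may not be where the fault actually happens: CUDA kernel launches are asynchronous, so an illegal access can surface at a later, unrelated call. A minimal sketch of how the run could be repeated with synchronous launches, assuming the standard `CUDA_LAUNCH_BLOCKING` environment variable is honored by the installed PyTorch and CuPy builds. The helper names `blocking_env` and `run_blocking` are hypothetical and not part of this repository:

```python
import os
import subprocess
import sys


def blocking_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # With this variable set, each kernel launch blocks until it finishes,
    # so an error is raised at the call that caused it.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_blocking(cmd):
    """Run `cmd` (a list of arguments) in a child process and return its exit code."""
    return subprocess.run(cmd, env=blocking_env()).returncode


# Hypothetical usage, with the same arguments as the original command:
# run_blocking([sys.executable, "main.py", ...])
```

Training is slower in this mode, so it is only meant for locating the failing kernel, not for regular runs.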

I tried three versions of PyTorch: 0.2, 0.3 and 0.4. All three were installed using pip. I also tried running the code with a build of 0.2 compiled from source, and the same error appears. I am using a machine with 60 GB of RAM, an Intel Xeon and a Titan X with 12 GB of memory. I also tried different versions of Open3D: 0.2.0 and 0.3.0. Finally, I modified your sample command and added edge_mem_limit in order to limit the memory used on the GPU, without success.

I also tested the code using the Sydney Urban Objects example, but in this case the following error appears at the beginning of the execution:

File "main.py", line 315, in <module>
    main()
  File "main.py", line 217, in main
    acc_train, loss, t_loader, t_trainer = train(epoch)
  File "main.py", line 148, in train
    outputs = model(inputs)
  File "/project/env/ecc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/code/ecc/models.py", line 103, in forward
    input = module(input)
  File "/project/env/ecc/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/project/code/ecc/ecc/GraphConvModule.py", line 171, in forward
    return GraphConvFunction(self._in_channels, self._out_channels, idxn, idxe, degs, degs_gpu, self._edge_mem_limit)(input, weights)
  File "/project/code/ecc/ecc/GraphConvModule.py", line 63, in forward
    self._multiply(sel_input, sel_weights, products, lambda a: a.unsqueeze(1))
  File "/project/code/ecc/ecc/GraphConvModule.py", line 36, in _multiply
    torch.bmm(f_a(a) if f_a else a, f_b(b) if f_b else b, out=out)
TypeError: torch.bmm received an invalid combination of arguments - got (torch.DoubleTensor, torch.FloatTensor, out=torch.DoubleTensor), but expected (torch.DoubleTensor source, torch.DoubleTensor mat2, *, torch.DoubleTensor out)
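The last line of this traceback says that `torch.bmm` was given a double-precision first argument and a single-precision second argument. A minimal sketch that reproduces the same kind of mismatch on small CPU tensors, outside the repository code. The helper `bmm_same_dtype` is hypothetical, and casting is only one possible workaround; it may be that the inputs are meant to be float32 from the start:

```python
import torch


def bmm_same_dtype(a, b):
    """Batched matrix product after casting `b` to the tensor type of `a`."""
    return torch.bmm(a, b.type_as(a))


a = torch.randn(2, 1, 3).double()  # float64, like the first argument in the traceback
b = torch.randn(2, 3, 4).float()   # float32, like the second argument

try:
    torch.bmm(a, b)  # mixed dtypes are rejected
except (TypeError, RuntimeError) as err:
    # The exception type differs between PyTorch versions.
    print("mismatch:", type(err).__name__)

out = bmm_same_dtype(a, b)
print(out.shape, out.dtype)
```

If the double-precision tensor comes from the data loader, converting it to float32 there would avoid the cast inside the layer, but that is an assumption about where the mismatch originates.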

Could you please give me some hints on how to solve these issues?

Thanks,
