CUDA runtime error (59) : device-side assert triggered
I have access to a Tesla K20c and am running ResNet50 on the CIFAR10 dataset. I then get the following error:
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu line=265 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 109, in <module>
    train(loader_train, model, criterion, optimizer)
  File "main.py", line 54, in train
    optimizer.step()
  File "/usr/local/anaconda35/lib/python3.6/site-packages/torch/optim/sgd.py", line 93, in step
    d_p.add_(weight_decay, p.data)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1524584710464/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:265
How can I resolve this error?
Solution 1:
I have encountered this problem several times, and each time it turned out to be an index issue.
For example, if your ground-truth labels start at 1, as in target = [1,2,3,4,5], then you should subtract 1 from every label so that they become [0,1,2,3,4].
This solves my problem every time.
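As a minimal sketch of that fix, the snippet below shifts one-based labels down to zero-based ones in plain Python. The helper name shift_labels and its offset parameter are hypothetical, chosen only for illustration; in a real training script the same subtraction would be applied wherever the labels are loaded.

```python
def shift_labels(labels, offset=1):
    """Subtract a fixed offset from every label (hypothetical helper)."""
    return [label - offset for label in labels]

# Labels that start at 1, as in the example above.
target = [1, 2, 3, 4, 5]
shifted = shift_labels(target)
print(shifted)  # [0, 1, 2, 3, 4]
```

A fixed offset is used rather than the minimum of each batch, because a single batch may not contain the smallest class.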
Solution 2:
In general, when encountering CUDA runtime errors, it is advisable to run your program again with the environment variable CUDA_LAUNCH_BLOCKING=1 set to obtain an accurate stack trace.
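The variable is usually set in the shell when launching the script, but it can also be set from Python. The sketch below assumes it runs at the very top of the script, before anything initializes CUDA; the torch import is left as a comment so the snippet runs on its own.

```python
import os

# Set this before CUDA is initialized, i.e. before importing torch
# or touching the GPU, so kernel launches run synchronously and the
# stack trace points at the call that actually failed.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import only after the variable has been set
```

Synchronous launches slow training down, so this is something to enable only while debugging.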
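In your specific case, the targets in your data were too high (or too low) for the specified number of classes. A classification loss such as cross-entropy expects every label to lie between 0 and the number of classes minus 1.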
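One way to confirm this before training is to scan the labels on the CPU. The sketch below is plain Python and assumes the labels are available as a list of integers; find_invalid_targets is a hypothetical helper, not part of PyTorch.

```python
def find_invalid_targets(targets, num_classes):
    """Return (position, label) pairs whose label is outside 0..num_classes - 1."""
    return [(i, t) for i, t in enumerate(targets) if not 0 <= t < num_classes]

# CIFAR10 has 10 classes, so the valid labels are 0 through 9.
bad = find_invalid_targets([0, 3, 10, 9, -1], num_classes=10)
print(bad)  # [(2, 10), (4, -1)]
```

An empty result means every label is in range; any pair that is reported points at the sample to inspect.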