CUDA error: device-side assert triggered #6

Open
Inosonnia opened this issue Nov 27, 2018 · 2 comments

Comments

@Inosonnia

After generating the mask with encode_segmap, SpatialClassNLLCriterion.cu reports: Assertion t >= 0 && t < n_classes failed.
It seems that some labels fall outside the range [0, n_classes).
I used the default data mentioned in the source code.
Thank you.

@shahsohil
Owner

@Inosonnia Can you please provide more details, such as the label values that seem to have exceeded n_classes?
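One way to collect the label values asked about here is sketched below. It is only an illustration: `summarize_labels`, its `ignore_index` parameter, and the sample values are hypothetical and not part of this repository. It counts every value in a flattened mask and reports those that would violate the `t >= 0 && t < n_classes` check.

```python
from collections import Counter


def summarize_labels(labels, n_classes, ignore_index=None):
    """Count label values and flag those outside [0, n_classes).

    `labels` is any flat iterable of ints, for example
    mask.flatten().tolist() for a NumPy array or a CPU tensor.
    `ignore_index` (optional) is a value the loss is configured to skip.
    """
    counts = Counter(int(v) for v in labels)
    invalid = {
        value: count
        for value, count in counts.items()
        if value != ignore_index and not (0 <= value < n_classes)
    }
    return counts, invalid


# Hypothetical flattened mask; 255 and -1 are made-up out-of-range values.
example = [0, 0, 15, 20, 255, 255, -1]
counts, invalid = summarize_labels(example, n_classes=21)
print("all values:", sorted(counts.items()))
print("out of range:", sorted(invalid.items()))
```

Running this on the label batch right before it is passed to the loss, rather than only after mask generation, would show whether any value changes in between.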

@Inosonnia
Author

Inosonnia commented Dec 10, 2018

Since I used the default segmentation data mentioned in the source code (i.e., VOC12+benchmark_RELEASE), the classes do indeed fall in [0, 20], as I confirmed by printing them out.
The class info in the data seems to be fine (the mask generation step runs without problems), and the error occurs even when I change n_classes to a large value (e.g., num_classes = 1000 in models/pretrained/sunet.py).

The detailed info is listed here:

pytorch/aten/src/THC/generated/../THCTensorMathCompare.cuh line=82 error=59 : device-side assert triggered
pytorch/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [768,0,0] Assertion t >= 0 && t < n_classes failed.
Traceback (most recent call last):
  File "train_seg.py", line 340, in <module>
    train(args)
  File "train_seg.py", line 146, in train
    trainmodel(model, optimizer, trainloader, epoch, scheduler, traindata)
  File "train_seg.py", line 226, in trainmodel
    model(imagesV, labelsV)
  File "/python/lib/python2.7/site-packages/torch/nn/modules/module.py", line 479, in __call__
    result = self.forward(*input, **kwargs)
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/python/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/THC/generated/../THCTensorMathCompare.cuh:82
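
CUDA kernels are launched asynchronously, so the file named in the RuntimeError above is not necessarily where the bad value originated. A possible way to narrow this down, sketched here as an assumption rather than a confirmed fix, is to rerun the script with the CUDA_LAUNCH_BLOCKING=1 environment variable so that each kernel launch is synchronous and the Python traceback points closer to the failing call. The helper names `blocking_cuda_env` and `run_training` are hypothetical, and the arguments that train_seg.py expects are not shown in this thread.

```python
import os
import subprocess
import sys


def blocking_cuda_env(base_env=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(script_args):
    """Run train_seg.py in a child process with CUDA_LAUNCH_BLOCKING=1.

    `script_args` is the list of command-line arguments the script needs;
    they are left to the caller because they are not given in this issue.
    """
    cmd = [sys.executable, "train_seg.py"] + list(script_args)
    return subprocess.call(cmd, env=blocking_cuda_env())
```

Calling `run_training([...])` from the repository root with the usual arguments would run the same training command, only slower, with errors reported at the launch that caused them.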
