Aborted (core dumped) - Reg #26

Open
Adhiyaman-Manickam opened this issue Jan 6, 2020 · 6 comments

Comments

@Adhiyaman-Manickam

Hi @daa233

Thanks for your great work.

I got the following error in torch.stack after about 3k training iterations, and I have been unable to find its cause.
[image: dumb]

Kindly help me resolve this error.
Thanks in advance.

@daa233
Owner

daa233 commented Jan 6, 2020

In my view, a core dump can happen when GPU memory runs out during training. You could monitor GPU memory usage when you train again.
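
One way to watch GPU memory while training runs is to poll `nvidia-smi` from a second terminal. The following is a minimal sketch, assuming `nvidia-smi` is available on the PATH; the helper names `parse_gpu_memory` and `poll_gpu_memory` are hypothetical and not part of this repository.

```python
import subprocess
import time

# nvidia-smi query that prints one "index, used, total" line per GPU (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' lines into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used, "total_mib": total})
    return rows


def poll_gpu_memory(interval_s=5.0, iterations=3):
    """Print per-GPU memory usage every `interval_s` seconds."""
    for _ in range(iterations):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        for row in parse_gpu_memory(out):
            print("GPU {gpu}: {used_mib}/{total_mib} MiB".format(**row))
        time.sleep(interval_s)
```

If usage on any GPU climbs steadily toward its total just before the abort, that would point toward memory exhaustion; flat usage would suggest looking elsewhere.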

Since there is no further log information, I have no other ideas about it.

@Adhiyaman-Manickam
Author

Thanks for your reply.
I am using a Linux server.
I have checked, and I have enough GPU memory.
Is this problem due to the stack call (torch.stack and the Python code) or to the Linux server (i.e., the GPUs)?
It works on a single GPU but not on multiple GPUs.

[image: dump1]

@daa233
Owner

daa233 commented Jan 6, 2020

I haven't run into this problem when training on multiple GPUs so far. The code has been tested on PyTorch 1.0.1 and PyTorch 1.2.

You'd better post more of the error trace (although I know core dump errors sometimes give little information). It will save a lot of time if you can locate the line that caused the error. You could try debugging, or insert some logging statements to narrow it down.

Since it is possibly caused by torch.stack, maybe you can check this visualization code under your settings.
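
To check a `torch.stack` call site, one option is to log what is about to be stacked. The sketch below is an assumption about how that could look, not code from this repository: `check_stackable` is a hypothetical helper that works on any objects exposing `.shape`, `.dtype` and `.device` (such as `torch.Tensor`), and it conservatively flags any difference between them.

```python
def check_stackable(tensors, name="tensors"):
    """Print shape/dtype/device of each item and raise if they differ.

    Duck-typed: only reads the .shape, .dtype and .device attributes,
    so it does not import torch itself.
    """
    signatures = []
    for i, t in enumerate(tensors):
        sig = (tuple(t.shape), str(t.dtype), str(t.device))
        # flush=True so the line reaches the log even if the process aborts next.
        print(
            f"{name}[{i}]: shape={sig[0]} dtype={sig[1]} device={sig[2]}",
            flush=True,
        )
        signatures.append(sig)
    if len(set(signatures)) > 1:
        raise ValueError(
            f"{name} differ in shape/dtype/device: {sorted(set(signatures))}"
        )
    return signatures
```

Calling it right before the suspect line, for example `check_stackable(images, "images")` followed by `torch.stack(images)`, shows whether the last printed entry precedes the abort. The variable name `images` here is only a placeholder.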

@Adhiyaman-Manickam
Author

Thank you so much for your prompt reply. I will do my best to fix the issue following your directions.
Thanks.

@Adhiyaman-Manickam
Author

Adhiyaman-Manickam commented Jan 17, 2020

Hi @daa233

Continuing from the issue above:

I get the following warning, and when I start training on multiple GPUs the core dumped error occurs.

UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector. warnings.warn('Was asked to gather along dimension 0, but all

Kindly let me know where I made a mistake.
Many thanks
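
For context, this warning is emitted by the gather step of `torch.nn.DataParallel` when each replica returns a 0-dim (scalar) tensor, typically a loss. The sketch below reproduces the resulting shape on the CPU with stand-in values; the numbers are made up for illustration, and nothing here establishes that the warning is related to the core dump.

```python
import torch

# Stand-ins for the 0-dim (scalar) losses returned by each DataParallel replica.
per_replica_losses = [torch.tensor(0.5), torch.tensor(1.5)]

# 0-dim tensors cannot be concatenated along dim 0, so gather views each one
# as shape (1,) and concatenates them; this is what the warning describes.
gathered = torch.cat([loss.view(1) for loss in per_replica_losses])
print(gathered.shape)  # torch.Size([2])

# Reduce back to a scalar before calling backward() or logging.
loss = gathered.mean()
print(loss.item())  # 1.0
```

In other words, with multiple GPUs a loss returned from the wrapped module arrives as a vector with one entry per GPU, so code that expects a scalar usually needs a `.mean()` or `.sum()` first.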

@daa233
Owner

daa233 commented Jan 18, 2020

@Adhiyaman-Manickam As I said above, you'd better debug it yourself and find the exact line of code that caused the error.

Without debugging output or the relevant lines of code, it is difficult to diagnose the problem from the error message alone.
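
One way to find the Python line that was executing when the process aborted is the standard-library `faulthandler` module. This is a minimal sketch; the helper name `enable_crash_tracebacks` and the default log file name are hypothetical.

```python
import faulthandler


def enable_crash_tracebacks(log_path="crash_traceback.log"):
    """Dump Python tracebacks of all threads to `log_path` on a fatal signal.

    faulthandler installs handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and
    SIGILL, so an "Aborted (core dumped)" exit should leave a traceback behind.
    """
    # The file must stay open for the life of the process, so return it
    # and keep a reference in the caller.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file
```

Calling `enable_crash_tracebacks()` at the top of the training script is enough. The same effect is available without code changes by running `python -X faulthandler` or setting the environment variable `PYTHONFAULTHANDLER=1`, in which case the traceback goes to stderr.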
