NaN in training #72
-
Hi, thanks for your good work and your code! When I trained with one GPU on the full HICO-DET dataset, or with 4 GPUs on a 5000-image subset of HICO-DET, I always ran into NaN. Have you encountered this before? Can I fix it with some common engineering techniques?
Answered by fredzzhang, Dec 9, 2022
Replies: 2 comments
-
Hi @weiyana, Please refer to #71. The issue is the batch size. The number of GPUs you use times the per-GPU batch size should be at least 16. Fred.
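A minimal sketch of checking the effective batch size before training, assuming a typical PyTorch DDP setup; the `--batch-size` flag and helper name below are hypothetical and not necessarily the repo's actual CLI:

```python
import argparse

import torch.distributed as dist


def check_effective_batch_size(per_gpu_batch_size: int, min_effective: int = 16) -> int:
    """Verify that num_gpus * per_gpu_batch_size meets the recommended minimum."""
    # Fall back to a single process if torch.distributed is not initialised.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    effective = world_size * per_gpu_batch_size
    if effective < min_effective:
        raise ValueError(
            f"Effective batch size {effective} (= {world_size} GPUs x "
            f"{per_gpu_batch_size} per GPU) is below {min_effective}; "
            "training may diverge to NaN."
        )
    return effective


if __name__ == "__main__":
    # Hypothetical flag name; adapt to the training script's actual argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=4, help="per-GPU batch size")
    args = parser.parse_args()
    check_effective_batch_size(args.batch_size)
```

For example, with 4 GPUs a per-GPU batch size of 4 gives an effective batch size of 16, which meets the recommendation; a single GPU would need a per-GPU batch size of at least 16.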
Answer selected by weiyana
-
This works for me, thanks!