NaN in training #72
-
Hi, thanks for your good work and your code! When I trained with one GPU on the full HICO-DET dataset, or with 4 GPUs on a 5000-image subset of HICO-DET, I always ran into NaN. Have you encountered this before? Can I fix it with some common engineering techniques?
Answered by fredzzhang, Dec 9, 2022
Replies: 2 comments
-
Hi @weiyana, Please refer to #71. The issue is the batch size. The number of GPUs you use times the per-GPU batch size should be at least 16. Fred.
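A minimal sketch of checking the effective batch size before training, assuming a typical PyTorch DDP setup; the `--batch-size` flag and helper name below are hypothetical and not necessarily the repo's actual CLI:

```python
import argparse

import torch.distributed as dist


def check_effective_batch_size(per_gpu_batch_size: int, min_effective: int = 16) -> int:
    """Verify that num_gpus * per_gpu_batch_size meets the recommended minimum."""
    # Fall back to a single process if torch.distributed is not initialised.
    world_size = dist.get_world_size() if dist.is_initialized() else 1
    effective = world_size * per_gpu_batch_size
    if effective < min_effective:
        raise ValueError(
            f"Effective batch size {effective} (= {world_size} GPUs x "
            f"{per_gpu_batch_size} per GPU) is below {min_effective}; "
            "training may diverge to NaN."
        )
    return effective


if __name__ == "__main__":
    # Hypothetical flag name; adapt to the training script's actual argument.
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch-size", type=int, default=4, help="per-GPU batch size")
    args = parser.parse_args()
    check_effective_batch_size(args.batch_size)
```

For example, with 4 GPUs a per-GPU batch size of 4 gives an effective batch size of 16, which meets the recommendation; a single GPU would need a per-GPU batch size of at least 16.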
Answer selected by weiyana
-
This works for me, thanks!