
Need some help in modifying model train parameters #15

Open · ZhouMiaoGX opened this issue Nov 22, 2024 · 2 comments

Comments

@ZhouMiaoGX

Hello, I encountered some issues when reproducing your model. When evaluating the public model weights (ADE20K, UperNet-R50) using Mask2Former, the mIoU was 43.46, which matches the revised metrics in your paper. However, the highest metric I achieved during reproduction was 41.18 (on RTX 4090D-24G ×2 with train_batch_size=1 and gradient_accumulation_steps=4). I have tried different devices such as A100-80G ×2, H20-96G ×4, and A100-40G ×1, but none of them achieved higher metrics.
Given the limited resources at our school, could you give me some guidance on modifying the training parameters to achieve the desired results when resources are limited?

@liming-ai
Owner


Hi @ZhouMiaoGX,

Thanks for the question. I strongly recommend using the hyperparameters recommended in our scripts, such as the batch size, to reproduce the experimental results, since we have not experimented with other settings.

Considering your computing-resource constraints, I suggest increasing gradient_accumulation_steps so that the total training batch size matches ours. However, since the training hyperparameters would still be changed, and diffusion training usually benefits from a larger batch size, I cannot guarantee that this will fully reproduce the results from the scripts.
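
To make the arithmetic concrete, here is a minimal sketch of the relationship described above. The names train_batch_size and gradient_accumulation_steps come from this thread; the target total batch size and GPU count below are hypothetical placeholders, so take the real target from the repo's training scripts:

```python
# Effective (total) batch size seen by the optimizer per update step:
#   effective_batch = train_batch_size * num_gpus * gradient_accumulation_steps

target_effective_batch = 16  # hypothetical: read the real value from the repo's scripts
train_batch_size = 1         # per-GPU batch size that fits in memory (as reported above)
num_gpus = 2                 # hypothetical GPU count

# Accumulation steps needed so the total batch size matches the target:
gradient_accumulation_steps = target_effective_batch // (train_batch_size * num_gpus)
print(gradient_accumulation_steps)  # -> 8 with these example numbers
```

Note that gradient accumulation matches the optimizer-step batch size but not batch-dependent statistics such as BatchNorm, which is one reason results may still differ slightly.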

@ZhouMiaoGX
Author


Thank you very much for your reply and help. I will try modifying the parameters and training again, and hopefully I can reproduce the results.
