
Need some help in modifying model train parameters #15

Open · ZhouMiaoGX opened this issue Nov 22, 2024 · 2 comments

Comments

@ZhouMiaoGX

Hello, I encountered some issues when reproducing your model. When evaluating the public model weights (ADE20K, UperNet-R50) using Mask2Former, the mIoU was 43.46, which matches the revised metrics in your paper. However, the highest metric I achieved during reproduction was 41.18 (on RTX 4090D-24G ×2 with train_batch_size=1 and gradient_accumulation_steps=4). I have tried different devices such as A100-80G ×2, H20-96G ×4, and A100-40G ×1, but none of them achieved higher metrics.
Given the limited resources at our school, could you give me some guidance on modifying the training parameters to achieve the desired results when resources are limited?

@liming-ai
Owner


Hi @ZhouMiaoGX,

Thanks for the question. I strongly recommend using the hyperparameters recommended in our scripts, such as the batch size, to reproduce the experimental results, since we have not experimented with other settings.

Considering your computing-resource constraints, I suggest increasing gradient_accumulation_steps so that the total training batch size matches ours. However, since the training hyperparameters would still be changed, and diffusion training usually benefits from a larger batch size, I cannot guarantee that this will fully reproduce the results from the scripts.
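
To make the arithmetic concrete, here is a minimal sketch of the relationship described above. The names train_batch_size and gradient_accumulation_steps come from this thread; the target total batch size and GPU count below are hypothetical placeholders, so take the real target from the repo's training scripts:

```python
# Effective (total) batch size seen by the optimizer per update step:
#   effective_batch = train_batch_size * num_gpus * gradient_accumulation_steps

target_effective_batch = 16  # hypothetical: read the real value from the repo's scripts
train_batch_size = 1         # per-GPU batch size that fits in memory (as reported above)
num_gpus = 2                 # hypothetical GPU count

# Accumulation steps needed so the total batch size matches the target:
gradient_accumulation_steps = target_effective_batch // (train_batch_size * num_gpus)
print(gradient_accumulation_steps)  # -> 8 with these example numbers
```

Note that gradient accumulation matches the optimizer-step batch size but not batch-dependent statistics such as BatchNorm, which is one reason results may still differ slightly.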

@ZhouMiaoGX
Author


Thank you very much for your reply and help. I will try modifying the parameters and training again, and hopefully I can reproduce the results.
