
[BUG] Gate/grouped_topk scoring func dtype issue (BF16 vs FP32) #696

Open

jikunshang opened this issue Feb 21, 2025 · 1 comment

jikunshang commented Feb 21, 2025

Describe the bug

In the Hugging Face implementation, the MoE gate (`MoEGate`) casts to FP32 for the gating linear layer and all subsequent computation. See https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/modeling_deepseek.py#L427-L429

In the GitHub implementation, the gate weight is BF16, so the linear layer, the scoring function (sigmoid), and the downstream computation all run in BF16. See https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inference/model.py#L573-L577

Would this cause an accuracy issue? Is there a recommended/reference implementation?
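For concreteness, a minimal sketch of the two gating paths (tensor shapes, variable names, and values are illustrative, not taken from either repository):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: a small token batch and a router weight.
hidden = torch.randn(4, 7168, dtype=torch.bfloat16)
gate_weight = torch.randn(256, 7168, dtype=torch.bfloat16)

# Hugging Face modeling_deepseek.py style: cast inputs and weight to
# FP32 before the gating linear, so scoring runs in full precision.
scores_fp32 = torch.sigmoid(F.linear(hidden.float(), gate_weight.float()))

# GitHub inference/model.py style: the linear and the sigmoid scoring
# function both stay in BF16.
scores_bf16 = torch.sigmoid(F.linear(hidden, gate_weight))

# With only ~8 mantissa bits in BF16, the two paths can disagree on
# which experts win the top-k selection when scores are nearly tied.
print((scores_fp32 - scores_bf16.float()).abs().max())
```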


jikunshang (Author) commented:

@GeeeekExplorer @mowentian Please take a look, thanks!
