Describe the bug
There are 16B, 236B, and 671B configs, but convert.py seems only to remove the MTP module and chunk the experts. Also, the 236B config looks like DeepSeek-V2.5's config.
Expected behavior
Please provide some explanation of this.