Add HF model support inc. DS-R1-Distill, Qwen needs yarn support #17421
Conversation
bravo sir
models/demos/llama3/tests/reference_outputs/Qwen2.5-7B-Instruct.refpt
@yieldthought Re-generated all Llama3 cache files in CI for N150 / N300 / T3K. The TG caches will need to be regenerated at a later date through CI. Re-running all pipelines. The old T3K cache was not even building correctly.
Ready to merge when tests pass.
All passing locally. Running the latest CI pipelines here. If they pass we're good to merge.
This reverts commit bd491a2.
Updated the issues in the description. Investigating the remaining ones that consistently fail.
Problem description
The existing codebase loads the Meta checkpoint format, but many derivative models are only available on HuggingFace.
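The format gap is largely a naming one: HuggingFace Llama checkpoints use keys like `model.layers.0.self_attn.q_proj.weight`, while the Meta format expects `layers.0.attention.wq.weight`. A minimal sketch of the kind of key remapping a HF loader needs (this is an illustration of the standard HF/Meta Llama layouts, not the loader actually added in this PR):

```python
import re

# Top-level keys: HuggingFace name -> Meta name
HF_TO_META = {
    "model.embed_tokens.weight": "tok_embeddings.weight",
    "model.norm.weight": "norm.weight",
    "lm_head.weight": "output.weight",
}

# Per-layer submodule names: HuggingFace -> Meta
LAYER_MAP = {
    "self_attn.q_proj": "attention.wq",
    "self_attn.k_proj": "attention.wk",
    "self_attn.v_proj": "attention.wv",
    "self_attn.o_proj": "attention.wo",
    "mlp.gate_proj": "feed_forward.w1",
    "mlp.down_proj": "feed_forward.w2",
    "mlp.up_proj": "feed_forward.w3",
    "input_layernorm": "attention_norm",
    "post_attention_layernorm": "ffn_norm",
}

def remap_key(hf_key: str) -> str:
    """Translate one HuggingFace Llama state-dict key to Meta naming."""
    if hf_key in HF_TO_META:
        return HF_TO_META[hf_key]
    m = re.match(r"model\.layers\.(\d+)\.(.+)\.weight$", hf_key)
    if m:
        idx, sub = m.groups()
        return f"layers.{idx}.{LAYER_MAP[sub]}.weight"
    return hf_key  # pass through anything unrecognized
```

Note that a real loader also has to handle layout differences beyond names, e.g. HF's permuted Q/K projection weights for rotary embeddings.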
What's changed
Add support for loading HuggingFace model formats, paving the way for full Qwen support (pending a YaRN RoPE implementation) and adding DeepSeek-R1-Distill-Llama-70B support.
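For context on the pending YaRN work: YaRN extends RoPE to longer contexts by interpolating the low-frequency rotary dimensions (dividing their frequencies by the context-scaling factor) while leaving high-frequency dimensions untouched, with a linear ramp in between. A hedged sketch of that frequency blending, assuming illustrative hyperparameters (base 10000, original context 4096, scale 4, ramp bounds beta_slow=1 and beta_fast=32) rather than Qwen's actual config:

```python
import math

def yarn_inv_freq(dim, base=10000.0, scale=4.0, orig_ctx=4096,
                  beta_fast=32.0, beta_slow=1.0):
    """Blend original and interpolated RoPE inverse frequencies, YaRN-style.

    Returns one inverse frequency per rotary dimension pair.
    """
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    out = []
    for f in inv_freq:
        # How many full rotations this dimension makes over the original context.
        rotations = orig_ctx * f / (2 * math.pi)
        # ramp = 1: keep original frequency (extrapolate);
        # ramp = 0: divide by scale (interpolate).
        ramp = (rotations - beta_slow) / (beta_fast - beta_slow)
        ramp = min(max(ramp, 0.0), 1.0)
        out.append(f * ramp + (f / scale) * (1.0 - ramp))
    return out
```

High-frequency dimensions (many rotations per context window) keep their original frequencies, so local positional resolution is preserved; low-frequency dimensions are stretched to cover the longer context.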
Checklist
All passing locally.