Outlines Transformers requires a ton more VRAM #1392
Comments
Also want to add that I also needed to create the |
I have localized the issue to the sampler. When I use the greedy sampler ( |
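The comment above reports that the extra VRAM goes away with the greedy sampler. A minimal sketch of switching Outlines away from its default multinomial sampler, assuming the outlines 0.1.x API (`outlines.samplers.greedy`, `outlines.generate.text`); the model id is a placeholder:

```python
def build_greedy_generator(model_name: str = "your-8b-model"):
    """Load a model with Outlines and return a greedy-sampling text generator."""
    # Imports kept inside the function so the sketch can be read and parsed
    # without outlines installed.
    import outlines
    from outlines import generate, samplers

    # Load the model through Outlines' transformers backend.
    model = outlines.models.transformers(model_name, device="cuda")

    # Pass the greedy sampler explicitly; the reporter observed that the
    # default multinomial sampling path is what requests the extra VRAM.
    return generate.text(model, samplers.greedy())
```

Whether this is a fix or only a workaround depends on why the multinomial path allocates more memory.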
Thanks for reporting the issue! FYI |
You are correct. From
If I am simply reading the documentation of one or both libraries wrong, that would be nice to know, but I find it surprising that the |
Describe the issue as clearly as possible:
I am running outlines on an ~8B parameter model on an A10 GPU with 24GB of VRAM. When running the model myself using transformers, the model uses just under 15GB of this memory. When using outlines, the model requests over 27GB, causing execution to fail.

Steps/code to reproduce the bug:
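As a back-of-the-envelope check (an illustration, not from the issue report): an ~8B-parameter model stored in fp16/bf16 needs about 2 bytes per parameter for the weights alone, which lines up with the ~15GB observed under plain transformers; the ~27GB requested under outlines is well beyond the weights themselves.

```python
def fp16_weights_gib(n_params: int) -> float:
    """Approximate weight memory in GiB for fp16/bf16 (2 bytes per parameter)."""
    return n_params * 2 / 1024**3

# Weights-only footprint of an 8B-parameter fp16 model.
print(round(fp16_weights_gib(8_000_000_000), 1))  # -> 14.9
```

This suggests the extra ~12GB comes from something other than the model weights, e.g. activations or buffers allocated during sampling.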
Expected result:
Error message:
Outlines/Python version information:
python: '3.10'
transformers: '4.37.2'
outlines: '0.1.13'
torch: '2.3.1'
Context for the issue:
Being able to run the model is obviously a priority for me, but it is also non-intuitive why this would increase the VRAM requirements. There are ways I can solve this with quantization and the like, but I would rather find a more permanent solution than a work-around on my end.