How does exllama_hf use less VRAM? #3248
bartowski1182 started this conversation in General
-
There was a comment in the exllama pull request that went into detail, but essentially, as I understood it, the biggest memory savings came from avoiding memory fragmentation: instead of growing buffers dynamically as generation proceeds, it allocates the maximum sizes once at startup. There may be other improvements, but as far as I know that was the big one.
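To make the fragmentation argument concrete, here is a toy sketch (not exllama's code, and not a model of the real CUDA caching allocator). Assumptions: memory is a flat address space handed out first-fit, sizes are abstract units (one unit = one token's worth of KV cache), and `FirstFitPool`, `growing_cache` and `preallocated_cache` are hypothetical names made up for this illustration. The growing cache allocates a buffer one token larger each step, copies, then frees the old one; the preallocated cache reserves `max_seq_len` once and writes in place.

```python
"""Toy model: growing a KV cache step by step vs. reserving it once up front.

Hypothetical illustration only: this is not exllama's implementation and does
not model the real CUDA caching allocator. Sizes are abstract units.
"""


class FirstFitPool:
    """Minimal first-fit allocator that remembers the highest address ever used."""

    def __init__(self) -> None:
        self.live: dict[int, int] = {}  # start address -> block size
        self.high_water = 0  # furthest address ever touched (stand-in for "reserved")

    def alloc(self, size: int) -> int:
        start = 0
        for s in sorted(self.live):
            if s - start >= size:  # the gap before this block is big enough
                break
            start = s + self.live[s]  # otherwise try right after this block
        self.live[start] = size
        self.high_water = max(self.high_water, start + size)
        return start

    def free(self, start: int) -> None:
        del self.live[start]

    def live_units(self) -> int:
        return sum(self.live.values())


def growing_cache(num_tokens: int) -> FirstFitPool:
    """Each step allocates a buffer one token larger, copies, then frees the old one."""
    pool = FirstFitPool()
    old = None
    for n in range(1, num_tokens + 1):
        new = pool.alloc(n)  # old and new buffers coexist while copying
        if old is not None:
            pool.free(old)  # leaves a hole too small for the next, larger buffer
        old = new
    return pool


def preallocated_cache(max_seq_len: int, num_tokens: int) -> FirstFitPool:
    """Reserve max_seq_len slots once; decoding writes token i into slot i in place."""
    if num_tokens > max_seq_len:
        raise ValueError("num_tokens exceeds max_seq_len")
    pool = FirstFitPool()
    pool.alloc(max_seq_len)  # single allocation, never resized or copied
    return pool


if __name__ == "__main__":
    for name, pool in [
        ("growing", growing_cache(64)),
        ("preallocated", preallocated_cache(64, 64)),
    ]:
        print(f"{name:>12}: live={pool.live_units()} units, "
              f"footprint={pool.high_water} units")
```

In this toy model both strategies end with 64 live units, but the growing cache has pushed its footprint to 189 units because each freed buffer leaves a hole that the next, slightly larger buffer cannot reuse right away. In PyTorch terms the rough analogue would be allocating the cache tensor once at `max_seq_len` and writing into slices, rather than calling `torch.cat` every step; how exllama actually lays out its buffers is described in the PR, not here.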
-
As far as I can tell, all it does is replace the logits and the samples/pipeline (though even that part confuses me), so why does it result in lower VRAM consumption? Does anyone know where I can read up on it?