Hello,
https://github.com/AlpinDale/sparsegpt-for-LLaMA
https://arxiv.org/abs/2301.00774
"We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models."
It looks like someone has implemented SparseGPT for the LLaMA model. If I understand correctly, that means we can cut the size of the LLaMA models in half without a significant loss in accuracy.
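For anyone curious what 50% unstructured sparsity looks like in practice, here is a minimal PyTorch sketch. It uses plain one-shot magnitude pruning on a single made-up 4096x4096 layer, not SparseGPT's actual weight-reconstruction method, and the memory saving only materializes if the zeroed weights are then kept in a sparse format or exploited by sparsity-aware kernels:

```python
# Rough illustration only: one-shot magnitude pruning to 50% unstructured
# sparsity (NOT SparseGPT's Hessian-based reconstruction). The layer shape
# below is a hypothetical LLaMA-like projection, not taken from the repo.
import torch

def prune_to_sparsity(weight: torch.Tensor, sparsity: float = 0.5) -> torch.Tensor:
    """Zero out the smallest-magnitude entries so `sparsity` of them become zero."""
    k = int(weight.numel() * sparsity)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = weight.abs().flatten().kthvalue(k).values
    mask = weight.abs() > threshold
    return weight * mask

w = torch.randn(4096, 4096)              # stand-in for one weight matrix
w_pruned = prune_to_sparsity(w, 0.5)
print(f"sparsity: {(w_pruned == 0).float().mean().item():.2%}")

# The zeros only translate into smaller storage if the tensor is kept in a
# sparse layout, e.g. compressed sparse row:
w_csr = w_pruned.to_sparse_csr()
```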
I want to know what you think about it and whether you're planning on testing it to see if you can get the same results with half the VRAM. I would finally be able to run the 13B model on my shitty 8 GB 2080 Super 🤣
PS: In less than a month, 65B LLaMA will work on the Super Nintendo 👀