discussion: adding llama.cpp support #22
@joydeep049 I think we should have a reference to some fixed commit of llama.cpp, since it is a fast-evolving repository. This reference can be put in a git submodule file or something similar, depending on your approach. This ensures that nyuntam functionality doesn't break when llama.cpp updates upstream, while still allowing manual, controlled updates.
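A rough sketch of what pinning via a submodule could look like (the third_party/llama.cpp path and the commit are placeholders, not an existing nyuntam layout):

# Add llama.cpp as a submodule under a placeholder path
git submodule add https://github.com/ggerganov/llama.cpp third_party/llama.cpp
# Pin it to a known-good commit (placeholder)
cd third_party/llama.cpp && git checkout <known-good-commit> && cd ../..
# Record the pinned commit in nyuntam's history
git add .gitmodules third_party/llama.cpp
git commit -m "Pin llama.cpp to a fixed commit"

Anyone cloning the repo would then run git submodule update --init to get exactly that commit, and bumping llama.cpp becomes a deliberate checkout plus commit rather than an automatic update.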
I'm getting an error while trying to run one of the existing configs in nyuntam-text-generation.
/dev/nyuntam$ python main.py --yaml_path text_generation/scripts/quantisation/awq.yaml --json_path ./
01/10/2025 17-03-05 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/10/2025 17-03-06 - INFO - datasets - PyTorch version 2.3.0 available.
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed
I got a similar error when I was testing Nyuntam-vision.
I think you should try the examples folder for testing awq. I will take a look at this yaml file as well. We are deprecating vision for now, as part of planned overhauls to this repository.
I'm getting the same error while trying to run the script in the examples folder. It's failing before the dataset can be completely downloaded. However, I'm not getting the same error when downloading the same model using a separate, standalone script.
python main.py --yaml_path examples/text-generation/awq_quantization/config.yaml --json_path ./
01/11/2025 21-01-23 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/11/2025 21-01-23 - INFO - datasets - PyTorch version 2.3.0 available.
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [15:08<00:00, 3.75MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [15:10<00:00, 455.18s/it]
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Killed
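As a side note, the Killed at this stage is most likely the Linux OOM killer terminating the process while the checkpoint shards are loaded into RAM (an assumption, not confirmed in this thread). A quick way to check on the host:

# Look for an OOM kill of the python process in the kernel log (diagnostic only)
sudo dmesg | grep -i -E "killed process|out of memory"
# Check how much memory is actually free before re-running
free -h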
Hello
From what I've understood, llama.cpp always requires the HF model to be converted to GGUF. We want to add llama.cpp support directly, so any time we use llama.cpp as an algorithm, the model should first be converted to GGUF and stored, and then we either quantize, prune, or distill the model after that. Should we handle this the same way as aqlm, where we have a direct reference to the forked repo? @Arnav0400 @ishanpandeynyunai
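For reference, a rough sketch of the HF-to-GGUF-to-quantize flow using llama.cpp's own tooling (paths and filenames are placeholders, and the script/binary names vary between llama.cpp commits, which is another reason to pin a fixed one):

# Convert a Hugging Face checkpoint directory to GGUF (fp16)
python convert_hf_to_gguf.py /path/to/hf-model --outfile model-f16.gguf --outtype f16
# Quantize the GGUF file, e.g. to 4-bit Q4_K_M (older commits ship this binary as quantize)
./llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M

The resulting .gguf file would be the stored artifact that the later steps operate on, as described above.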