
discussion: adding llama.cpp support #22

Open · joydeep049 opened this issue Jan 9, 2025 · 4 comments

@joydeep049

Hello

From what I've understood, llama.cpp always requires the HF model to be converted to GGUF first. We want to add llama.cpp support directly, so whenever llama.cpp is used as an algorithm, the model should first be converted to GGUF and stored, and then quantized, pruned, or distilled after that.
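A minimal sketch of that flow for the quantization case, assuming llama.cpp is vendored at third_party/llama.cpp (the paths and output names here are illustrative; convert_hf_to_gguf.py and llama-quantize are llama.cpp's stock tools):

# 1) convert the HF checkpoint to GGUF and store it
python third_party/llama.cpp/convert_hf_to_gguf.py ./path/to/hf-model \
    --outfile ./gguf/model-f16.gguf --outtype f16

# 2) quantize the stored GGUF
third_party/llama.cpp/llama-quantize ./gguf/model-f16.gguf ./gguf/model-Q4_K_M.gguf Q4_K_M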

  1. I am thinking of adding a direct link to the llama.cpp repo. Is this okay, or would we need to do something like we did for aqlm, where we have a direct reference to the forked repo?
  2. If I made any incorrect assumptions above, please let me know.

@Arnav0400 @ishanpandeynyunai

@ishanpandeynyunai (Collaborator) commented Jan 9, 2025

@joydeep049 I think we should have a reference to a fixed commit of llama.cpp, since it is a fast-evolving repository. This reference can be pinned in a git submodule or something similar, depending on your approach.

This ensures that nyuntam's functionality doesn't break with automatic updates, while still allowing manual updating.
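For example, a pinned submodule could look like this (the third_party/llama.cpp location and the commit placeholder are illustrative, not an agreed layout):

git submodule add https://github.com/ggerganov/llama.cpp third_party/llama.cpp
git -C third_party/llama.cpp checkout <pinned-commit-sha>
git add .gitmodules third_party/llama.cpp
git commit -m "pin llama.cpp to a fixed commit"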

@joydeep049 (Author)

I'm getting an error while trying to run one of the existing configs in nyuntam-text-generation.

/dev/nyuntam$ python main.py --yaml_path text_generation/scripts/quantisation/awq.yaml --json_path ./
01/10/2025 17-03-05 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/10/2025 17-03-06 - INFO - datasets - PyTorch version 2.3.0 available.
Loading checkpoint shards:   0%|                                                                  | 0/2 [00:00<?, ?it/s]
Killed

I got a similar error when I was testing Nyuntam-vision.
How should I solve this?

@ishanpandeynyunai (Collaborator)

I think you should try the examples folder for testing AWQ; I will take a look at this yaml file as well. We are deprecating vision for now, along with planned overhauls to this repository.

@joydeep049 (Author)

I'm getting the same error while trying to run the script from the examples folder. It fails during "Loading checkpoint shards", before the dataset download even starts. However, I don't get this error when downloading the same model with a separate, standalone script.

python main.py --yaml_path examples/text-generation/awq_quantization/config.yaml --json_path ./
01/11/2025 21-01-23 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/11/2025 21-01-23 - INFO - datasets - PyTorch version 2.3.0 available.
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [15:08<00:00, 3.75MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [15:10<00:00, 455.18s/it]
Loading checkpoint shards:   0%|                                                                                                                                                             | 0/2 [00:00<?, ?it/s]
Killed
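For reference, a "Killed" message at the checkpoint-loading stage is typically the Linux OOM killer terminating the process because loading the shards exhausted available RAM; one way to check that assumption on a Linux host with kernel-log access:

sudo dmesg | grep -iE 'out of memory|killed process'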
