
discussion: adding llama.cpp support #22

Open · joydeep049 opened this issue Jan 9, 2025 · 4 comments

@joydeep049

Hello

From what I've understood, llama.cpp always requires the HF model to be converted to GGUF first. We want to add llama.cpp support directly, so whenever llama.cpp is used as an algorithm, the model should first be converted to GGUF and stored, and then quantized, pruned, or distilled after that.
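A minimal sketch of that flow for the quantization case, assuming llama.cpp is vendored at third_party/llama.cpp (the paths and output names here are illustrative; convert_hf_to_gguf.py and llama-quantize are llama.cpp's stock tools):

# 1) convert the HF checkpoint to GGUF and store it
python third_party/llama.cpp/convert_hf_to_gguf.py ./path/to/hf-model \
    --outfile ./gguf/model-f16.gguf --outtype f16

# 2) quantize the stored GGUF
third_party/llama.cpp/llama-quantize ./gguf/model-f16.gguf ./gguf/model-Q4_K_M.gguf Q4_K_M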

  1. I am thinking of adding a direct link to the llama.cpp repo. Is this okay, or would we need to do something like we did for aqlm, where we have a direct reference to the forked repo?
  2. If I made any incorrect assumptions above, please let me know.

@Arnav0400 @ishanpandeynyunai

@ishanpandeynyunai (Collaborator) commented Jan 9, 2025

@joydeep049 I think we should have a reference to a fixed commit of llama.cpp, since it is a fast-evolving repository. This reference can be pinned in a git submodule or something similar, depending on your approach.

This ensures that nyuntam's functionality doesn't break with automatic updates, while still allowing manual updating.
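For example, a pinned submodule could look like this (the third_party/llama.cpp location and the commit placeholder are illustrative, not an agreed layout):

git submodule add https://github.com/ggerganov/llama.cpp third_party/llama.cpp
git -C third_party/llama.cpp checkout <pinned-commit-sha>
git add .gitmodules third_party/llama.cpp
git commit -m "pin llama.cpp to a fixed commit"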

@joydeep049 (Author)

I'm getting an error while trying to run one of the existing configs in nyuntam-text-generation.

/dev/nyuntam$ python main.py --yaml_path text_generation/scripts/quantisation/awq.yaml --json_path ./
01/10/2025 17-03-05 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/10/2025 17-03-06 - INFO - datasets - PyTorch version 2.3.0 available.
Loading checkpoint shards:   0%|                                                                  | 0/2 [00:00<?, ?it/s]
Killed

I got a similar error when I was testing Nyuntam-vision.
How should I solve this?

@ishanpandeynyunai (Collaborator)

I think you should try the examples folder for testing AWQ; I will take a look at this yaml file as well. We are deprecating vision for now, along with planned overhauls to this repository.

@joydeep049 (Author)

I'm getting the same error while trying to run the script from the examples folder. It fails during "Loading checkpoint shards", before the dataset download even starts. However, I don't get this error when downloading the same model with a separate, standalone script.

python main.py --yaml_path examples/text-generation/awq_quantization/config.yaml --json_path ./
01/11/2025 21-01-23 - INFO - numexpr.utils - NumExpr defaulting to 4 threads.
01/11/2025 21-01-23 - INFO - datasets - PyTorch version 2.3.0 available.
model-00002-of-00002.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3.50G/3.50G [15:08<00:00, 3.75MB/s]
Downloading shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [15:10<00:00, 455.18s/it]
Loading checkpoint shards:   0%|                                                                                                                                                             | 0/2 [00:00<?, ?it/s]
Killed
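For reference, a "Killed" message at the checkpoint-loading stage is typically the Linux OOM killer terminating the process because loading the shards exhausted available RAM; one way to check that assumption on a Linux host with kernel-log access:

sudo dmesg | grep -iE 'out of memory|killed process'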
