llama-server cannot host DeepSeek-R1-Distill-Qwen-1.5B on CUDA #11673
I carefully followed this document to build the project. After the build, I ran llama-server with my command, tracked the process, and checked the GPU's VRAM usage. I don't know if I have misunderstood something, but judging from the VRAM usage, the model does not seem to be loaded onto the GPU when I run the server, and I got the same behavior when I ran the command again.
So my question is: is there any way I can host my model and run inference on CUDA?
You need to add the `-ngl` parameter to the command line. Try `-ngl 99`.
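For illustration, a minimal sketch of what the command could look like, assuming a CUDA-enabled build of llama-server and a GGUF file of the distilled model; the model filename and port below are placeholders, not taken from the original post:

```bash
# Offload up to 99 layers to the GPU; for a 1.5B model this covers every layer,
# so the whole model is placed in VRAM instead of running on the CPU.
./llama-server \
  -m ./DeepSeek-R1-Distill-Qwen-1.5B-Q4_K_M.gguf \
  -ngl 99 \
  --port 8080
```

If the binary was built with CUDA support, the GPU's VRAM usage (for example as reported by nvidia-smi) should rise once the model finishes loading.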