Hi, first of all, thank you for this tool; it's a very useful and interesting approach to running models on limited resources.

I was wondering whether you have any plans to add a way to run it as a service, so that the whole model is not reloaded every time a new prompt is provided, something like llama-server.

I did try to run a model quantized for BitNet with llama-server, but it seems they are not compatible. Do you have any comments or suggestions?

Thank you in advance,
Luca
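In the absence of a built-in server mode, one interim workaround is to keep a single interactive llama-cli process alive and pipe prompts to it over stdin, so the model is loaded only once. Below is a minimal Python sketch assuming the bitnet.cpp build produces an upstream-style llama-cli binary with interactive mode (`-i`) and reverse-prompt (`-r`) flags; the binary path, model file, and marker string are illustrative assumptions, not part of any official interface.

```python
import subprocess

# Placeholder paths -- adjust to your own build and model (assumptions).
LLAMA_CLI = "./build/bin/llama-cli"                       # hypothetical build path
MODEL = "models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf"  # hypothetical model file
REVERSE_PROMPT = "User:"  # generation pauses when the model emits this marker

# Start llama-cli once in interactive mode; the model stays resident in
# memory across prompts instead of being reloaded on every invocation.
proc = subprocess.Popen(
    [LLAMA_CLI, "-m", MODEL, "-i", "-r", REVERSE_PROMPT],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

def ask(prompt: str) -> str:
    """Send one prompt and read output until the reverse prompt reappears."""
    proc.stdin.write(prompt + "\n")
    proc.stdin.flush()
    out = []
    while True:
        ch = proc.stdout.read(1)  # read char-by-char: the marker has no trailing newline
        if not ch:  # process exited
            break
        out.append(ch)
        if "".join(out).endswith(REVERSE_PROMPT):
            del out[-len(REVERSE_PROMPT):]  # drop the marker itself
            break
    return "".join(out)

print(ask("Explain 1-bit quantization in one sentence."))
print(ask("And one more detail, please."))  # no model reload between calls
```

A real wrapper would also need to skip the startup banner before the first prompt and handle stderr; llama.cpp's own llama-server (with its HTTP API) would be the more robust route once BitNet-format models are supported there.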