docker image fails to start on mac m3 #257
Comments
It requires an Nvidia GPU for the self-hosted version to work |
Code completion doesn't work very well on Apple Silicon or CPUs. The context is just too big for completion to be fast enough. Chat without much context is of course possible on CPUs, but that's not a complete product. |
I have 64 GB of shared memory, that is, memory that is usable by the GPU as well as the CPU. Apart from this, I guess the problem is that Apple Silicon support is just not there in the underlying libraries/apps? |
I have an M1 in my MacBook Air, and I've tested the smallest reasonable models, for example the 1B StarCoder running in llama.cpp:
The problem is not the 49 generated tokens, it's the 557 prefill tokens (in this example), which take 900 ms. For a typical 2k or 4k context that will be 4-8 seconds. With cloud Refact, a typical single-line completion takes 250-400 ms, depending on where in the world you are. So there's a 10x gap. I looked at the M2 specs, and it's not that much faster. Maybe the M3 is? @domdorn if you have interesting suggestions, that would be awesome! |
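To make the arithmetic above concrete, here is a minimal sketch that extrapolates the measured prefill time (557 tokens in 900 ms) to larger contexts. It assumes prefill cost scales linearly with token count, which is a simplification: attention cost grows faster than linearly, so treat the results as a lower bound. The function name `estimate_prefill_ms` is hypothetical, made up for this illustration.

```python
def estimate_prefill_ms(measured_tokens: int, measured_ms: float, target_tokens: int) -> float:
    """Linearly extrapolate prefill latency to a different context size.

    Assumption: cost per prefill token is constant. Real prefill gets
    slower per token as the context grows, so this underestimates.
    """
    ms_per_token = measured_ms / measured_tokens
    return ms_per_token * target_tokens


# Numbers from the M1 measurement quoted above: 557 prefill tokens in 900 ms.
for context_tokens in (2048, 4096):
    estimate = estimate_prefill_ms(557, 900.0, context_tokens)
    print(f"{context_tokens} tokens: at least {estimate / 1000:.1f} s of prefill")
```

Under that linear assumption the estimate comes to roughly 3.3 s for a 2k context and 6.6 s for 4k, which is in line with the 4-8 second figure once the extra attention cost is accounted for.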
Hmm, I'm not sure I'm doing this right, as it's the first time I've run LLMs on the CLI.
Is this what you did? |
Yes, just give it a prompt of 1000 tokens. Here you can try my script:
|
This is the output the
This finished in less than a second. I'm not sure how to read the results, but it feels like the M3 would be enough to run this. Am I correct? I'll try to set up Refact locally on my Mac without Docker. It seems a bit complicated, as my Python installations appear to be a mess :-/ |
Hey @domdorn thanks for your results! That should be less than a second for a 2k context, maybe we'll think about official support 🤔
It's all about fine-tuning, file filtering, and efficient model hosting on GPUs. I think your best bet is to run the model inside llama.cpp, add an OpenAI-style HTTP server to it, and set up a caps file for Here's how to run it:
Then test whether it works in the console, using an example like this: https://github.com/smallcloudai/refact-lsp/blob/main/examples/http_completion.sh If it does, the last step is to put the path to your caps.json into the address URL in the plugin. It should work, but you will be the first to try. Fun! |
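Before wiring anything into the plugin, it may help to time a completion request against the local server directly. The sketch below is not the linked `http_completion.sh` and does not talk to refact-lsp; it assumes an OpenAI-style server is already listening locally and that it accepts a POST with a JSON body at `/v1/completions`. The base URL, the endpoint path, the payload fields and both function names are assumptions for illustration, so adjust them to whatever your server actually exposes.

```python
import json
import time
import urllib.request


def build_completion_payload(prompt: str, max_tokens: int = 50) -> dict:
    """Build an OpenAI-style completion request body (field names assumed)."""
    return {
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic output makes timings comparable
        "stream": False,
    }


def time_completion(base_url: str, prompt: str, path: str = "/v1/completions",
                    max_tokens: int = 50, timeout: float = 60.0):
    """POST one completion request and return (elapsed_ms, parsed_json_reply).

    `path` is an assumption; change it if your server uses another route.
    """
    body = json.dumps(build_completion_payload(prompt, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.loads(response.read().decode("utf-8"))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms, reply


# Example usage (needs a running server; the address is an assumption):
# elapsed_ms, reply = time_completion("http://127.0.0.1:8080", "def hello_world():")
# print(f"{elapsed_ms:.0f} ms", reply)
```

If the round trip for a realistic 2k-token prompt stays well under a second, the setup is probably fast enough for single-line completion. |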
Hi,
I stumbled upon this project while searching for a code generation plugin for IntelliJ that I could run locally on my MacBook Pro M3.
GPT4All and LM Studio work here.
When trying to start the Docker container according to the README, I'm getting the following error:
Any help appreciated!
P.S.: Happy New Year!