Support Janus-Pro-7b for vision models #8618

Just announced and performing great with OCR:
https://huggingface.co/deepseek-ai/Janus-Pro-7B

Comments
+1
+1
2 similar comments
+1
+1
+1
+1
3 similar comments
+1
+1
+1
Commenting "+1" sends an unnecessary email to everyone who is subscribed to the issue. Probably a better idea to just add a thumbs up to the original post. |
+1 |
7 similar comments
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
+1 |
How about whoever has the correct setup also importing https://huggingface.co/deepseek-ai/Janus-Pro-1B, please?
+1
3 similar comments
+1
+1
+1
+1
2 similar comments
+1
+1
+1
4 similar comments
+1
+1
+1
+1
Please STOP COMMENTING +1, use the 👍 reaction to the original post instead! |
Let's keep things professional, even though other people might annoy you... What would be most useful to me is a guide on how to create & upload such a model. I'd do it myself then...
fixed. |
No, but seriously: what kind of people don't already know not to spam with stupid "+1" comments, and keep doing it even after comments very nicely advising them not to? Are these bots? An influx of complete and utter GitHub n00bs?
They must not be devs, or they would realize that kind of thing leads to the devs turning off notifications for the thread, at which point it drops off their radar, which is counterproductive if they really want this added.
I got to this thread because a Google search directed me here, so this is probably not the place to post this comment; my apologies in advance to the irritable ones on the mailing list. The reason everyone is here is that we want to use Janus-Pro-7b from ollama. I get it, it is not supported as of now. I only got ollama last week, so I am definitely a newbie. I simply asked Deepseek how to run Janus-Pro-7b-LM from ollama, and the instructions it gave actually worked. I am now running it from ollama. For those who are interested, the instructions are:
@cmheong Could you share the working Modelfile with us? Thanks! |
@davrot Uhm... he says what you need to put in there? Those files are not rocket surgery, but just to make sure:
For your reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md |
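To make that concrete, here is a minimal sketch of what such a Modelfile could look like; the GGUF filename and the parameter value are illustrative assumptions, not taken from this thread:

```
# Hypothetical Modelfile: assumes a Janus-Pro-7B GGUF downloaded locally.
FROM ./janus-pro-7b.gguf
PARAMETER temperature 0.7
```

You would then build and run it with `ollama create janus-pro -f Modelfile` followed by `ollama run janus-pro`. Keep in mind that, as noted further down in this thread, a language-model-only GGUF imported this way will not have any image understanding.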
@davrot Open WebUI != Ollama |
Multimodal Models are described in the main README.md, near the bottom. If you're having issues with a specific non-Ollama tool/frontend that connects to the Ollama API, see the documentation for that tool separately. |
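For reference, CLI usage with a model that Ollama does support for vision looks roughly like this; the model name and image path are examples:

```shell
# Include a local image path in the prompt; for supported vision models
# (e.g. llava), the Ollama CLI attaches the image automatically.
ollama run llava "What is in this image? /path/to/photo.png"
```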
```
I don't see an image, I see a question asking me to provide information about a specific image or data file that may contain

Overall, the image suggests that the group is on a casual outing or hike, possibly enjoying the outdoors together.
```
Hey @davrot, thanks for pasting from the shell terminal there. If you could, it would be very helpful to use the Markdown tags for indicating scripting, etc., so that the output is a bit clearer in terms of what commands you gave and what the output was, vs. your own exposition (if any; based on the text, I'm assuming that's 100% LLM-generated). As another resource, you can check out the Llama3.2-Vision blog post that has usage information for that model, or the LLaVA announcement post that uses a slightly different method to interact with the model. Overall, CLI-based multimodal interaction doesn't appear to be consistent across models. All models should be able to accept an image through the API, it seems. Refer back to those blog posts (in particular the Llama3.2-Vision one) for links to the docs.
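To illustrate the API route, here is a minimal Python sketch against Ollama's REST chat endpoint; the model name and file path are examples, not a claim that any particular Janus GGUF works this way:

```python
import base64
import requests

# The /api/chat endpoint expects images as base64-encoded strings.
with open("/path/to/photo.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2-vision",  # example: any vision-capable model you have pulled
        "messages": [
            {
                "role": "user",
                "content": "What is in this image?",
                "images": [image_b64],
            }
        ],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])
```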
It doesn't appear that the GGUF available from HF actually works.

input:

```python
import ollama

# model name defined earlier (not shown)
response: ollama.ChatResponse = ollama.chat(model=model, messages=[
    {
        'role': 'user',
        # note: the key must be 'content'; 'contents' is silently ignored
        'content': 'Tell me about this image.',
        'images': ['/path/to/local/image.webp']
    }
])
```

response (truncated at the start):

```html
* Hello, World!</div>
<p id="text-1" class="para">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque eget arcu quis sapien euismod bibendum.</p>
<p id="text-2" class="para">Nunc et orci non libero luctus convallis nec vel quam. Aliquam erat volutpat. Suspendisse sit amet ante ut nunc tristique aliquet.</p>
</div>
</body>
</html>
```

To be fair, I don't know if the webp format is supported in this model or in the conversion to what I assume is base64, so that may be one thing causing issues here. But suffice it to say that that response is wildly inappropriate for the query posed.
It seems that llama.cpp is working on it.
From my understanding, the current GGUF models available on Hugging Face do not include the vision encoder and projector components, only the language model. This means that the Janus model lacks image understanding when running with Ollama. I have submitted a PR to llama.cpp and am working on adding support for the Janus vision encoder and projector. The main challenge is the customized code used by the DeepSeek team, along with potential modifications to the CLIP model architecture in C++. As a result, this PR may take some time to complete.
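One way to verify that a given GGUF is language-model-only is to inspect its tensor names; here is a sketch using the `gguf` Python package from the llama.cpp project (the file path is illustrative, and the prefix check assumes llama.cpp's usual CLIP/mmproj naming conventions):

```python
# Check whether a GGUF file contains any vision encoder/projector tensors.
# Assumes `pip install gguf`; the model path is illustrative.
from gguf import GGUFReader

reader = GGUFReader("/path/to/janus-pro-7b.gguf")
tensor_names = [t.name for t in reader.tensors]

# llama.cpp keeps vision tensors under prefixes such as "v." and "mm."
# (often in a separate mmproj GGUF); a text-only conversion has none.
has_vision = any(name.startswith(("v.", "mm.")) for name in tensor_names)
print(f"vision tensors present: {has_vision}")
```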
It seems like it, or they're literally children. Having worked with kids in an online context, I can say that enthusiasm sometimes comes across as spam and bot-like behavior.