
Support Janus-Pro-7b for vision models #8618

Open
franz101 opened this issue Jan 27, 2025 · 50 comments
Labels
feature request New feature or request

Comments

@franz101

Just announced and performing great with OCR
https://huggingface.co/deepseek-ai/Janus-Pro-7B

@franz101 franz101 added the feature request New feature or request label Jan 27, 2025
@skytodmoon

Mark +1

@libing64

+1

2 similar comments
@kattatzu

+1

@dengber

dengber commented Jan 28, 2025

+1

@random-zhu

Mark +1

@sakujor

sakujor commented Jan 28, 2025

+1

3 similar comments
@DhairyaNxtgen

+1

@TheurgicDuke771

+1

@philogicae

+1

@ImranR98

Commenting "+1" sends an unnecessary email to everyone who is subscribed to the issue. Probably a better idea to just add a thumbs up to the original post.

@edgett

edgett commented Jan 28, 2025

+1

7 similar comments
@andriy8800555355

+1

@movitecc

+1

@iammrbt

iammrbt commented Jan 28, 2025

+1

@cmheong

cmheong commented Jan 28, 2025

+1

@wwek

wwek commented Jan 29, 2025

+1

@OverStruck

+1

@4austinpowers

+1

@deadprogram

How about also https://huggingface.co/deepseek-ai/Janus-Pro-1B, for whoever has the right setup to import this, please.

@tobalo

tobalo commented Jan 29, 2025

+1

3 similar comments
@nurena24

+1

@xindoreen

+1

@toplinuxsir

+1

@zytoh0

zytoh0 commented Jan 30, 2025

Just announced and performing great with OCR https://huggingface.co/deepseek-ai/Janus-Pro-7B
Not just 7B but also 1B :)
https://huggingface.co/deepseek-ai/Janus-Pro-1B
https://huggingface.co/deepseek-ai/Janus-Pro-7B

@MIC-BO

MIC-BO commented Jan 31, 2025

+1

2 similar comments
@snt1017

snt1017 commented Jan 31, 2025

+1

@jorgevespa

+1

@isaacasancheza

+1

4 similar comments
@wlsoft2006

+1

@kongkang

kongkang commented Feb 1, 2025

+1

@jackwang2

+1

@maddinek

maddinek commented Feb 3, 2025

+1

@jangrewe

jangrewe commented Feb 4, 2025

Please STOP COMMENTING +1, use the 👍 reaction to the original post instead!

@philogicae

Please STOP COMMENTING +1, use the 👍 reaction to the original post instead!

No.

[image]

@jangrewe

jangrewe commented Feb 4, 2025

No.

What kind of special idiot... individual are you? This is not about notifications, but about useless noise that adds nothing to the discussion.

@svaningelgem

What kind of special idiot are you?

Let's keep things professional, even though other people might annoy you...

What would be most useful to me is a guide on how to create & upload such a model. I'd do this myself then...

@jangrewe

jangrewe commented Feb 4, 2025

Let's keep things professional

fixed.

@dandv

dandv commented Feb 4, 2025

Let's keep things professional

No, but seriously, what kind of people who can:

  • use GitHub
  • are interested in a CLI tool
  • to run inference locally

Don't already know to NOT SPAM WITH STUPID +1s

AND

Keep doing it after comments advising very nicely NOT TO DO SO.

Are these bots? An influx of complete and utter GitHub n00bs?

@vertago1

vertago1 commented Feb 4, 2025

Let's keep things professional

No, but seriously, what kind of people who can:

  • use GitHub
  • are interested in a CLI tool
  • to run inference locally

Don't already know to NOT SPAM WITH STUPID +1s

AND

Keep doing it after comments advising very nicely NOT TO DO SO.

Are these bots? An influx of complete and utter GitHub n00bs?

They must not be devs, or they would realize that this kind of thing leads to people turning off notifications for the thread, and it falling off the devs' radar, which is counterproductive if they really want this added.

@cmheong

cmheong commented Feb 5, 2025

I got to this thread because a Google search directed me here, so this is probably not the place to post this comment; my apologies in advance to the irritable ones on the mailing list. The reason everyone is here is that we want to use Janus-Pro-7b from Ollama, and I get that it is not supported as of now. I only got Ollama last week, so I am definitely a newbie. I simply asked DeepSeek how to run Janus-Pro-7B-LM from Ollama, and the instructions it gave actually worked; I am now running it from Ollama. For those who are interested, the instructions are:

1. Download the GGUF from https://huggingface.co/mradermacher/Janus-Pro-7B-LM-GGUF/blob/main/Janus-Pro-7B-LM.Q4_K_M.gguf
2. Copy it to your Ollama docker container. I used 'docker cp'.
3. Create a file named Modelfile in the same directory, containing the line:
   FROM ./Janus-Pro-7B-LM.Q4_K_M.gguf
4. From your docker container, run the command:
   ollama create janus-pro-7b-lm -f Modelfile
5. Then run:
   ollama run janus-pro-7b-lm

That is all. Have fun with janus-pro-7b. I sure am.

@davrot

davrot commented Feb 5, 2025

@cmheong Could you share the working Modelfile with us? Thanks!

@jangrewe

jangrewe commented Feb 5, 2025

@davrot Uhm... he says what you need to put in there? Those files are not rocket surgery, but just to make sure:

FROM /path/to/Janus-Pro-7B-LM.Q4_K_M.gguf

For your reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

@jangrewe

jangrewe commented Feb 5, 2025

@davrot Open WebUI != Ollama

@sealad886

@jangrewe Can you tell me how to send images to "ollama run janus-pro-7b-lm"?

[image]

Multimodal Models are described in the main README.md, near the bottom.

If you're having issues with a specific non-Ollama tool/frontend that connects to the Ollama API, see the documentation for that tool separately.

@davrot

davrot commented Feb 6, 2025

ollama run janus-pro-7b-lm "What do you see in the image /data_1/deepseek/kohlfahrt0015.jpg?"

I don't see an image, I see a question asking me to provide information about a specific image or data file that may contain
a unique identifier and name format, possibly related to "deepseek" and "kohlfahrt". However, there is no actual visual
content associated with this request. It seems like the text contains placeholder characters, which might be due to encoding
issues or incomplete instructions. If you could provide more context or clarify what you're trying to achieve by asking
about an image or data file based on a specific name and identifier, I'd be happy to assist further!

ollama run llama3.2-vision:11b "What do you see in the image /data_1/deepseek/kohlfahrt0015.jpg"
Added image '/data_1/deepseek/kohlfahrt0015.jpg'
The image shows a group of people walking together, with trees and buildings visible in the background.

  • A group of people are walking together.
    + There are approximately 10 individuals in the group.
    + They appear to be walking on a sidewalk or path.
    + Some of them are looking at something off-camera, while others seem to be engaged in conversation.
  • The group is made up of both men and women.
    + The men are wearing casual clothing such as jeans and t-shirts.
    + The women are also dressed casually, with some wearing dresses or skirts.
  • They are all wearing similar jackets or coats.
    + The jackets are dark-colored and appear to be waterproof or windproof.
    + Some of the individuals have their hands in their pockets, while others are holding onto bags or other items.

Overall, the image suggests that the group is on a casual outing or hike, possibly enjoying the outdoors together.

@sealad886

Hey @davrot, thanks for pasting from the shell terminal there. If you could, it would be very helpful to use the Markdown tags for indicating scripting, etc., so that the output is clearer in terms of which commands you gave, what the output was, and your own exposition (if any; based on the text, I'm assuming that's 100% LLM generated).

As another resource, you can check out the Llama3.2-Vision blog post that has usage information for that model, or the LLaVA announcement post that uses a slightly different method to interact with the model.

Overall, CLI-based multimodal interaction doesn't appear to be consistent across models. All models should be able to accept an image through the API, it seems. Refer back to those blog posts (in particular the Llama3.2-Vision one) for links to the docs.

@sealad886

It doesn't appear that the GGUF available from HF actually works.

input:

response: ollama.ChatResponse = ollama.chat(model=model, messages=[
    {
        'role': 'user',
        'content': 'Tell me about this image.',
        'images': ['/path/to/local/image.webp']
    }
])

print(response.message.content):

 * Hello, World!</div>
        <p id="text-1" class="para">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque eget arcu quis sapien euismod bibendum.</p>
        <p id="text-2" class="para">Nunc et orci non libero luctus convallis nec vel quam. Aliquam erat volutpat. Suspendisse sit amet ante ut nunc tristique aliquet.</p>
      </div>
    </body>
  </html>

To be fair, I don't know if the webp format is supported by this model or by the conversion to what I assume is base64, so that may be one thing causing issues here. But suffice it to say that this response is wildly inappropriate for the query posed.

@davrot

davrot commented Feb 6, 2025

It seems that llama.cpp is working on it:

Add supports for Janus vision encoder and projector [WIP] #11646
ggerganov/llama.cpp#11646

@ravenouse

From my understanding, the current GGUF models available on Hugging Face do not include the vision encoder and projector components—only the language model. This means that the Janus model lacks image understanding when running with Ollama.

I have submitted a PR to llama.cpp and am working on adding support for the Janus vision encoder and projector. The main challenge is the customized code used by the DeepSeek team, along with potential modifications to the clip model architecture in C++. As a result, this PR may take some time to complete.
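Following up on the point that the published GGUFs carry only the language model: you can at least peek at a GGUF's fixed-size header with the stdlib to see how many tensors and metadata entries it declares. This is only a sketch; confirming that the vision encoder and projector tensors are actually missing would need a full metadata parse, for example with the gguf Python package from the llama.cpp repository. The header layout below follows the GGUF specification.

```python
# Read just the fixed-size GGUF header: magic "GGUF", version (uint32),
# tensor count (uint64), and metadata key/value count (uint64), all
# little-endian per the GGUF spec. Checks beyond this (e.g. for
# vision-encoder tensor names) require parsing the metadata section.
import struct

def peek_gguf_header(data: bytes) -> dict:
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Usage: peek_gguf_header(open("Janus-Pro-7B-LM.Q4_K_M.gguf", "rb").read(24))
```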

@S4GU4R0

S4GU4R0 commented Feb 8, 2025

Are these bots? An influx of complete and utter GitHub n00bs?

It seems like it, or they're literally children. Having worked with kids in an online context, I've seen that enthusiasm sometimes comes across as spam and bot-like behavior.
