
Support Janus-Pro-7b for vision models #8618

Open
franz101 opened this issue Jan 27, 2025 · 50 comments
Labels
feature request New feature or request

Comments

@franz101

Just announced and performing great with OCR
https://huggingface.co/deepseek-ai/Janus-Pro-7B

@franz101 franz101 added the feature request New feature or request label Jan 27, 2025
@skytodmoon

Mark +1

@libing64

+1

2 similar comments
@kattatzu

+1

@dengber

dengber commented Jan 28, 2025

+1

@random-zhu

Mark +1

@sakujor

sakujor commented Jan 28, 2025

+1

3 similar comments
@DhairyaNxtgen

+1

@TheurgicDuke771

+1

@philogicae

+1

@ImranR98

Commenting "+1" sends an unnecessary email to everyone who is subscribed to the issue. Probably a better idea to just add a thumbs up to the original post.

@edgett

edgett commented Jan 28, 2025

+1

7 similar comments
@andriy8800555355

+1

@movitecc

+1

@iammrbt

iammrbt commented Jan 28, 2025

+1

@cmheong

cmheong commented Jan 28, 2025

+1

@wwek

wwek commented Jan 29, 2025

+1

@OverStruck

+1

@4austinpowers

+1

@deadprogram

How about also https://huggingface.co/deepseek-ai/Janus-Pro-1B, for whoever has the right setup to import this, please.

@tobalo

tobalo commented Jan 29, 2025

+1

3 similar comments
@nurena24

+1

@xindoreen

+1

@toplinuxsir

+1

@zytoh0

zytoh0 commented Jan 30, 2025

Just announced and performing great with OCR https://huggingface.co/deepseek-ai/Janus-Pro-7B
Not just 7B but also 1B :)
https://huggingface.co/deepseek-ai/Janus-Pro-1B
https://huggingface.co/deepseek-ai/Janus-Pro-7B

@MIC-BO

MIC-BO commented Jan 31, 2025

+1

2 similar comments
@snt1017

snt1017 commented Jan 31, 2025

+1

@jorgevespa

+1

@isaacasancheza

+1

4 similar comments
@wlsoft2006

+1

@kongkang

kongkang commented Feb 1, 2025

+1

@jackwang2

+1

@maddinek

maddinek commented Feb 3, 2025

+1

@jangrewe

jangrewe commented Feb 4, 2025

Please STOP COMMENTING +1, use the 👍 reaction to the original post instead!

@philogicae

Please STOP COMMENTING +1, use the 👍 reaction to the original post instead!

No.

[image]

@jangrewe

jangrewe commented Feb 4, 2025

No.

What kind of special idiot... individual are you? This is not about notifications, but about useless noise that adds nothing to the discussion.

@svaningelgem

What kind of special idiot are you?

Let's keep things professional, even though other people might annoy you...

What would be most useful to me is a guide on how to create & upload such a model. I'd do this myself then...

@jangrewe

jangrewe commented Feb 4, 2025

Let's keep things professional

fixed.

@dandv

dandv commented Feb 4, 2025

Let's keep things professional

No, but seriously, what kind of people who can:

  • use GitHub
  • are interested in a CLI tool
  • to run inference locally

Don't already know to NOT SPAM WITH STUPID +1s

AND

Keep doing it after comments advising very nicely NOT TO DO SO.

Are these bots? An influx of complete and utter GitHub n00bs?

@vertago1

vertago1 commented Feb 4, 2025

Let's keep things professional

No, but seriously, what kind of people who can:

  • use GitHub
  • are interested in a CLI tool
  • to run inference locally

Don't already know to NOT SPAM WITH STUPID +1s

AND

Keep doing it after comments advising very nicely NOT TO DO SO.

Are these bots? An influx of complete and utter GitHub n00bs?

They must not be devs, or they would realize that this kind of thing leads to people turning off notifications for the thread, and it falling off the devs' radar, which is counterproductive if they really want this added.

@cmheong

cmheong commented Feb 5, 2025

I got to this thread because a Google search directed me here, so this is probably not the place to post this comment; my apologies in advance to the irritable ones on the mailing list. The reason everyone is here is that we want to use Janus-Pro-7b from Ollama, and I get that it is not supported as of now. I only got Ollama last week, so I am definitely a newbie. I simply asked DeepSeek how to run Janus-Pro-7B-LM from Ollama, and the instructions it gave actually worked; I am now running it from Ollama. For those who are interested, the instructions are:

1. Download the GGUF from https://huggingface.co/mradermacher/Janus-Pro-7B-LM-GGUF/blob/main/Janus-Pro-7B-LM.Q4_K_M.gguf
2. Copy it to your Ollama docker container. I used 'docker cp'.
3. Create a file named Modelfile in the same directory, containing the line:
   FROM ./Janus-Pro-7B-LM.Q4_K_M.gguf
4. From your docker container, run the command:
   ollama create janus-pro-7b-lm -f Modelfile
5. Then run:
   ollama run janus-pro-7b-lm

That is all. Have fun with janus-pro-7b. I sure am.

@davrot

davrot commented Feb 5, 2025

@cmheong Could you share the working Modelfile with us? Thanks!

@jangrewe

jangrewe commented Feb 5, 2025

@davrot Uhm... he says what you need to put in there? Those files are not rocket surgery, but just to make sure:

FROM /path/to/Janus-Pro-7B-LM.Q4_K_M.gguf

For your reference: https://github.com/ollama/ollama/blob/main/docs/modelfile.md

@jangrewe

jangrewe commented Feb 5, 2025

@davrot Open WebUI != Ollama

@sealad886

@jangrewe Can you tell me how to send images to "ollama run janus-pro-7b-lm"?

[image]

Multimodal Models are described in the main README.md, near the bottom.

If you're having issues with a specific non-Ollama tool/frontend that connects to the Ollama API, see the documentation for that tool separately.

@davrot

davrot commented Feb 6, 2025

ollama run janus-pro-7b-lm "What do you see in the image /data_1/deepseek/kohlfahrt0015.jpg?"

I don't see an image, I see a question asking me to provide information about a specific image or data file that may contain
a unique identifier and name format, possibly related to "deepseek" and "kohlfahrt". However, there is no actual visual
content associated with this request. It seems like the text contains placeholder characters, which might be due to encoding
issues or incomplete instructions. If you could provide more context or clarify what you're trying to achieve by asking
about an image or data file based on a specific name and identifier, I'd be happy to assist further!

ollama run llama3.2-vision:11b "What do you see in the image /data_1/deepseek/kohlfahrt0015.jpg"
Added image '/data_1/deepseek/kohlfahrt0015.jpg'
The image shows a group of people walking together, with trees and buildings visible in the background.

  • A group of people are walking together.
    + There are approximately 10 individuals in the group.
    + They appear to be walking on a sidewalk or path.
    + Some of them are looking at something off-camera, while others seem to be engaged in conversation.
  • The group is made up of both men and women.
    + The men are wearing casual clothing such as jeans and t-shirts.
    + The women are also dressed casually, with some wearing dresses or skirts.
  • They are all wearing similar jackets or coats.
    + The jackets are dark-colored and appear to be waterproof or windproof.
    + Some of the individuals have their hands in their pockets, while others are holding onto bags or other items.

Overall, the image suggests that the group is on a casual outing or hike, possibly enjoying the outdoors together.

@sealad886

Hey @davrot, thanks for pasting from the shell terminal there. If you could, it would be very helpful to use the Markdown tags for indicating scripting, etc., so that the output is clearer in terms of which commands you gave, what the output was, and your own exposition (if any; based on the text, I'm assuming that's 100% LLM generated).

As another resource, you can check out the Llama3.2-Vision blog post that has usage information for that model, or the LLaVA announcement post that uses a slightly different method to interact with the model.

Overall, CLI-based multimodal interaction doesn't appear to be consistent across models. All models should be able to accept an image through the API, it seems. Refer back to those blog posts (in particular the Llama3.2-Vision one) for links to the docs.

@sealad886

It doesn't appear that the GGUF available from HF actually works.

input:

response: ollama.ChatResponse = ollama.chat(model=model, messages=[
    {
        'role': 'user',
        'content': 'Tell me about this image.',
        'images': ['/path/to/local/image.webp']
    }
])

print(response.message.content):

 * Hello, World!</div>
        <p id="text-1" class="para">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Pellentesque eget arcu quis sapien euismod bibendum.</p>
        <p id="text-2" class="para">Nunc et orci non libero luctus convallis nec vel quam. Aliquam erat volutpat. Suspendisse sit amet ante ut nunc tristique aliquet.</p>
      </div>
    </body>
  </html>

To be fair, I don't know if the webp format is supported by this model or by the conversion to what I assume is base64, so that may be one thing causing issues here. But suffice it to say that this response is wildly inappropriate for the query posed.

@davrot

davrot commented Feb 6, 2025

It seems that llama.cpp is working on it:

Add supports for Janus vision encoder and projector [WIP] #11646
ggerganov/llama.cpp#11646

@ravenouse

From my understanding, the current GGUF models available on Hugging Face do not include the vision encoder and projector components—only the language model. This means that the Janus model lacks image understanding when running with Ollama.

I have submitted a PR to llama.cpp and am working on adding support for the Janus vision encoder and projector. The main challenge is the customized code used by the DeepSeek team, along with potential modifications to the clip model architecture in C++. As a result, this PR may take some time to complete.
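Following up on the point that the published GGUFs carry only the language model: you can at least peek at a GGUF's fixed-size header with the stdlib to see how many tensors and metadata entries it declares. This is only a sketch; confirming that the vision encoder and projector tensors are actually missing would need a full metadata parse, for example with the gguf Python package from the llama.cpp repository. The header layout below follows the GGUF specification.

```python
# Read just the fixed-size GGUF header: magic "GGUF", version (uint32),
# tensor count (uint64), and metadata key/value count (uint64), all
# little-endian per the GGUF spec. Checks beyond this (e.g. for
# vision-encoder tensor names) require parsing the metadata section.
import struct

def peek_gguf_header(data: bytes) -> dict:
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}

# Usage: peek_gguf_header(open("Janus-Pro-7B-LM.Q4_K_M.gguf", "rb").read(24))
```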

@S4GU4R0

S4GU4R0 commented Feb 8, 2025

Are these bots? An influx of complete and utter GitHub n00bs?

It seems like it, or they're literally children. Having worked with kids in an online context, I've seen that enthusiasm sometimes comes across as spam and bot-like behavior.
