
[Bug]: cohere multimodal embedding not considering images #7571

Open
ladrians opened this issue Jan 5, 2025 · 1 comment
Labels
bug Something isn't working

Comments


ladrians commented Jan 5, 2025

What happened?

I am trying to test the Cohere multimodal embeddings. Based on the docs, this is how images should be sent:

{
  "model": "cohere/embed-english-v3.0",
  "input": ["<base64 encoded image>"]
}

With debug enabled, I can see the input is handled as text, not as an image:

curl -X POST \
https://api.cohere.ai/v1/embed \
....
-d '{'model': 'embed-english-v3.0', 'texts': ['/9j/4AAQSkZJRgABAQAAAQABAAD//....'], 'input_type': 'search_document'}'
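The debug output above shows the base64 image string being forwarded in Cohere's `texts` field. A minimal sketch of how the routing could be fixed, assuming a hypothetical helper (`is_base64_image` is not part of LiteLLM) that decides whether an input string is an encoded image and should go to the `images` field instead:

```python
import base64
import binascii

# Hypothetical helper: decide whether an embedding input string is a
# base64-encoded image (so it can be routed to Cohere's "images" field
# rather than "texts"). Checks for a data URI prefix, or decodes the
# string and looks for common image magic bytes.
def is_base64_image(value: str) -> bool:
    if value.startswith("data:image/"):
        return True
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    # JPEG, PNG, and GIF magic numbers
    return raw.startswith((b"\xff\xd8\xff", b"\x89PNG", b"GIF8"))
```

For example, `"/9j/4AAQ..."` (the JPEG prefix seen in the debug log) decodes to bytes starting with `FF D8 FF` and would be classified as an image, while ordinary prose would not.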

Based on the Cohere docs, I sent this sample POST to https://api.cohere.com/v1/embed:

{
    "model": "embed-english-v3.0",
    "input_type": "image", 
    "embedding_types": ["float"],
    "images": ["data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD//...."]
}
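For reference, the working request above can be reproduced in Python with only the standard library. This is a sketch under the assumptions stated in the comments (the helper names are mine, not Cohere's or LiteLLM's), and the actual call needs a `COHERE_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# Build the image-embedding payload documented by Cohere, as shown in
# the working request above. Helper names here are illustrative only.
def build_cohere_image_payload(b64_images, model="embed-english-v3.0"):
    return {
        "model": model,
        "input_type": "image",
        "embedding_types": ["float"],
        # Each entry is a data URI, e.g. "data:image/jpeg;base64,/9j/..."
        "images": b64_images,
    }

def embed_images(b64_images):
    req = urllib.request.Request(
        "https://api.cohere.com/v1/embed",
        data=json.dumps(build_cohere_image_payload(b64_images)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['COHERE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]["float"]
```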

and got a correct response like this:

{
    "id": "5e1c96a0-9755-4434-9bb0-bae910351928",
    "texts": [],
    "images": [
        {
            "width": 400,
            "height": 400,
            "format": "jpeg",
            "bit_depth": 24
        }
    ],
    "embeddings": {
        "float": [
            [
                -0.007247925,
                -0.041229248,
                -0.023223877,
                -0.08392334,
                ...
            ]
        ]
    },
    "meta": {
        "api_version": {
            "version": "1"
        },
        "billed_units": {
            "images": 1
        }
    },
    "response_type": "embeddings_by_type"
}

Can the LiteLLM API be updated to translate requests into this format?

Relevant log output

No response

Are you a ML Ops Team?

No

What LiteLLM version are you on ?

main-v1.55.12

Twitter / LinkedIn details

No response

@ladrians ladrians added the bug Something isn't working label Jan 5, 2025
Author

ladrians commented Jan 5, 2025

The issue seems to be more general. Now testing with vertex_ai/multimodalembedding@001, with something like this:

{
    "model":"vertex_ai/multimodalembedding@001",
    "input":["iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///..."
    ]
}

I notice it is handled as text too:

-d '{'instances': [{'text': 'iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///...'}]}'

Based on the docs, this snippet:

{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "bytesBase64Encoded": "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///..."
      }
    }
  ]
}

worked fine:

{
    "predictions": [
        {
            "textEmbedding": [
                0.00139445357,
                -0.00980720203,
                -0.00918080285,
                ...
            ],
            "imageEmbedding": [
                0.00754476804,
                0.0345284976,
                0.00910079665,
                ...
            ]
        }
    ]
}

Thanks in advance for any comments on how to use the multimodal embedding options. Regards,
