
[Bug]: cohere multimodal embedding not considering images #7571

Open
ladrians opened this issue Jan 5, 2025 · 1 comment
Labels
bug Something isn't working

Comments


ladrians commented Jan 5, 2025

What happened?

I am trying to test the Cohere multimodal embeddings. Based on the docs, this is how images should be sent:

{
  "model": "cohere/embed-english-v3.0",
  "input": ["<base64 encoded image>"]
}

With debug enabled, I can see the input is handled as text, not as an image:

curl -X POST \
https://api.cohere.ai/v1/embed \
....
-d '{'model': 'embed-english-v3.0', 'texts': ['/9j/4AAQSkZJRgABAQAAAQABAAD//....'], 'input_type': 'search_document'}'
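The debug output above shows the base64 image string being forwarded in Cohere's `texts` field. A minimal sketch of how the routing could be fixed, assuming a hypothetical helper (`is_base64_image` is not part of LiteLLM) that decides whether an input string is an encoded image and should go to the `images` field instead:

```python
import base64
import binascii

# Hypothetical helper: decide whether an embedding input string is a
# base64-encoded image (so it can be routed to Cohere's "images" field
# rather than "texts"). Checks for a data URI prefix, or decodes the
# string and looks for common image magic bytes.
def is_base64_image(value: str) -> bool:
    if value.startswith("data:image/"):
        return True
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return False
    # JPEG, PNG, and GIF magic numbers
    return raw.startswith((b"\xff\xd8\xff", b"\x89PNG", b"GIF8"))
```

For example, `"/9j/4AAQ..."` (the JPEG prefix seen in the debug log) decodes to bytes starting with `FF D8 FF` and would be classified as an image, while ordinary prose would not.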

Based on the Cohere docs, I sent this sample POST to https://api.cohere.com/v1/embed:

{
    "model": "embed-english-v3.0",
    "input_type": "image", 
    "embedding_types": ["float"],
    "images": ["data:image/jpeg;base64,/9j/4AAQSkZJRgABAQAAAQABAAD//...."]
}
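For reference, the working request above can be reproduced in Python with only the standard library. This is a sketch under the assumptions stated in the comments (the helper names are mine, not Cohere's or LiteLLM's), and the actual call needs a `COHERE_API_KEY` environment variable:

```python
import json
import os
import urllib.request

# Build the image-embedding payload documented by Cohere, as shown in
# the working request above. Helper names here are illustrative only.
def build_cohere_image_payload(b64_images, model="embed-english-v3.0"):
    return {
        "model": model,
        "input_type": "image",
        "embedding_types": ["float"],
        # Each entry is a data URI, e.g. "data:image/jpeg;base64,/9j/..."
        "images": b64_images,
    }

def embed_images(b64_images):
    req = urllib.request.Request(
        "https://api.cohere.com/v1/embed",
        data=json.dumps(build_cohere_image_payload(b64_images)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['COHERE_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]["float"]
```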

and got a correct response like this:

{
    "id": "5e1c96a0-9755-4434-9bb0-bae910351928",
    "texts": [],
    "images": [
        {
            "width": 400,
            "height": 400,
            "format": "jpeg",
            "bit_depth": 24
        }
    ],
    "embeddings": {
        "float": [
            [
                -0.007247925,
                -0.041229248,
                -0.023223877,
                -0.08392334,
                ...
            ]
        ]
    },
    "meta": {
        "api_version": {
            "version": "1"
        },
        "billed_units": {
            "images": 1
        }
    },
    "response_type": "embeddings_by_type"
}

Can the LiteLLM API be updated to translate requests into this format?

Relevant log output

No response

Are you a ML Ops Team?

No

What LiteLLM version are you on ?

main-v1.55.12

Twitter / LinkedIn details

No response

@ladrians ladrians added the bug Something isn't working label Jan 5, 2025
Author

ladrians commented Jan 5, 2025

The issue seems to be more general. Now testing with vertex_ai/multimodalembedding@001, with something like this:

{
    "model":"vertex_ai/multimodalembedding@001",
    "input":["iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///..."
    ]
}

I notice it is handled as text too:

-d '{'instances': [{'text': 'iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///...'}]}'

Based on the docs, this snippet:

{
  "instances": [
    {
      "text": "TEXT",
      "image": {
        "bytesBase64Encoded": "iVBORw0KGgoAAAANSUhEUgAAAGQAAABkBAMAAACCzIhnAAAAG1BMVEURAAD///..."
      }
    }
  ]
}

worked fine:

{
    "predictions": [
        {
            "textEmbedding": [
                0.00139445357,
                -0.00980720203,
                -0.00918080285,
                ...
            ],
            "imageEmbedding": [
                0.00754476804,
                0.0345284976,
                0.00910079665,
                ...
            ]
        }
    ]
}

Thanks in advance for any comments on how to use the multimodal embedding options. Regards,
