docs: update mllama vision api issue
danbev committed Jan 23, 2025
1 parent eacb455 commit f3d31fa
Showing 1 changed file with 15 additions and 3 deletions: notes/llama.cpp/llama-3-2-vision.md
vision encoder output Tile 3 first 10 values:
[8] = 0.368833
[9] = 0.020522
```
These look identical, so I don't think the issue is with the vision encoder
output.

One thing that is different is how the image patch embeddings are handled in the
new version. The actual embedding tensor is copied to the context like this:
```c++
struct ggml_tensor * embeddings = ggml_graph_get_tensor(gf, "mmproj");

ggml_backend_sched_reset(ctx.sched);
```
In the previous version the image patch embeddings were copied into a
`vector<float>` and returned.
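For contrast, the previous flow can be sketched roughly like this. This is a simplified stand-in, not the actual llama.cpp code: `FakeTensor` and `get_image_embeddings` are hypothetical names, and the copy here stands in for fetching the tensor's backing data into host memory before returning it by value:

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a tensor's host-visible backing buffer; in the
// real code the data would first be read out of the backend tensor.
struct FakeTensor {
    std::vector<float> data;  // n_patches * n_embd floats
};

// Old-style flow: copy the embedding data into a plain vector<float> that the
// caller owns, instead of handing back a live tensor tied to the graph.
static std::vector<float> get_image_embeddings(const FakeTensor & t) {
    std::vector<float> out(t.data.size());
    std::memcpy(out.data(), t.data.data(), t.data.size() * sizeof(float));
    return out;
}
```

The returned vector is independent of the compute graph, so it stays valid after the graph and scheduler are reset.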
`ngxson` opened this [discussion](https://github.com/danbev/learning-ai/discussions/8)
and pointed out that there might be an issue with using the embd_tensor the way
I'm currently using it:
```
ubatch does not support "cutting" the tensor in half if it does not fit into the
physical batch limit.
```
So let's try what he suggested: set the image patch embeddings on `batch.embd`
and see if we can get that to work.
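A minimal sketch of why that should help, using simplified stand-ins (the `Batch` struct and `split_into_ubatches` are hypothetical, loosely mirroring `llama_batch`, not the real llama.cpp types): embeddings passed as plain host floats can be cut at any token boundary with pointer arithmetic, which is exactly the "cutting" that a live backend tensor does not support.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for llama_batch: when embd is set, the tokens are
// described by embeddings (n_tokens * n_embd floats) rather than token ids.
struct Batch {
    int32_t       n_tokens;
    const float * embd;  // n_tokens * n_embd values
};

// A batch larger than the physical batch limit n_ubatch can be split into
// micro-batches simply by advancing the embd pointer per n_embd-sized row.
static std::vector<Batch> split_into_ubatches(const Batch & b, int32_t n_embd, int32_t n_ubatch) {
    std::vector<Batch> out;
    for (int32_t i = 0; i < b.n_tokens; i += n_ubatch) {
        const int32_t n = std::min(n_ubatch, b.n_tokens - i);
        out.push_back({n, b.embd + (size_t)i * n_embd});
    }
    return out;
}
```

For example, 6 image-patch "tokens" with a physical limit of 4 split cleanly into micro-batches of 4 and 2, each pointing into the same flat buffer.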
_work in progress_
