From f3d31fa0b2fa4569df0f171bfc100a0fc835460d Mon Sep 17 00:00:00 2001
From: Daniel Bevenius
Date: Thu, 23 Jan 2025 14:37:33 +0100
Subject: [PATCH] docs: update mllama vision api issue

---
 notes/llama.cpp/llama-3-2-vision.md | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/notes/llama.cpp/llama-3-2-vision.md b/notes/llama.cpp/llama-3-2-vision.md
index 6c0e404..c86c364 100644
--- a/notes/llama.cpp/llama-3-2-vision.md
+++ b/notes/llama.cpp/llama-3-2-vision.md
@@ -728,9 +728,11 @@ vision encoder output Tile 3 first 10 values:
 [8] = 0.368833
 [9] = 0.020522
 ```
+These look identical, so I don't think the issue is with the vision encoder
+output.
 
-Things that are different are how the image patch embeddings are handled in the
-newst version. The actual embedding tensor are copied to the context like this:
+One thing that is different is how the image patch embeddings are handled in the
+new version. The actual embedding tensor is copied to the context like this:
 ```c++
     struct ggml_tensor * embeddings = ggml_graph_get_tensor(gf, "mmproj");
@@ -751,7 +753,17 @@ newst version. The actual embedding tensor are copied to the context like this:
     ggml_backend_sched_reset(ctx.sched);
 ```
 In the previous version they the image patch embeddings were copied into a
-vector and returned.
+`vector` and returned.
+
+`ngxson` opened this [discussion](https://github.com/danbev/learning-ai/discussions/8)
+and pointed out that there might be an issue with using the `embd_tensor` in the
+way I'm currently using it:
+```
+ubatch does not support "cutting" the tensor in half if it does not fit into the
+physical batch limit.
+```
+So let's try what he suggested and set the image patch embeddings on
+`batch.embd` and see if we can get that to work.
 
 _work in progress_
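
Something along the lines of the following sketch is what I have in mind for
that: allocate the batch with `llama_batch_init` and a non-zero `embd`
argument so that `batch.embd` is available, and copy the projected patch
embeddings into it. This is only a sketch against llama.cpp's public C API;
`decode_image_embeddings`, `image_embd`, `n_patches`, `n_embd` and `n_past`
are placeholder names and not identifiers from the patch above.
```c++
#include <cstring>   // memcpy
#include <cstdio>    // fprintf
#include "llama.h"

// Hypothetical helper: decode projected image patch embeddings through the
// language model by placing them on batch.embd (the float buffer that
// llama_batch_init allocates when its embd argument is non-zero).
static bool decode_image_embeddings(llama_context * ctx,
                                    const float * image_embd, // n_patches * n_embd floats
                                    int n_patches,
                                    int n_embd,
                                    llama_pos n_past) {
    llama_batch batch = llama_batch_init(n_patches, n_embd, 1);
    batch.n_tokens = n_patches;

    for (int i = 0; i < n_patches; i++) {
        // One patch embedding per "token" position in the batch.
        memcpy(batch.embd + i * n_embd, image_embd + i * n_embd, n_embd * sizeof(float));
        batch.pos[i]       = n_past + i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = 0;
        batch.logits[i]    = false;
    }

    const bool ok = llama_decode(ctx, batch) == 0;
    if (!ok) {
        fprintf(stderr, "llama_decode failed for image embedding batch\n");
    }
    llama_batch_free(batch);
    return ok;
}
```
The idea, if I understand the suggestion correctly, is that a plain float
buffer like `batch.embd` can be split into ubatches that fit the physical
batch size, which is what the `embd_tensor` approach could not do.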