From 73dd2d715ffe9af957cf2fcea5ab4b74a5eb6b2b Mon Sep 17 00:00:00 2001
From: Daniel Bevenius <daniel.bevenius@gmail.com>
Date: Wed, 22 Jan 2025 14:14:34 +0100
Subject: [PATCH] docs: add note about copying output after processing

---
 notes/llama.cpp/llama-3-2-vision.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/notes/llama.cpp/llama-3-2-vision.md b/notes/llama.cpp/llama-3-2-vision.md
index 0f4b61d..451f9d5 100644
--- a/notes/llama.cpp/llama-3-2-vision.md
+++ b/notes/llama.cpp/llama-3-2-vision.md
@@ -188,6 +188,27 @@ token = 271
 We can now see that the `<|image|>` token is correctly resolved to the correct
 token id 128256.
 
+So that worked when I inspected the tokens, which is great. But after
+processing, the logits in `res` are copied out to `logits_out`:
+```c++
+            if (n_outputs_new) {
+                GGML_ASSERT( n_outputs_prev + n_outputs_new <= n_outputs);
+                GGML_ASSERT((n_outputs_prev + n_outputs_new)*n_vocab <= (int64_t) lctx.logits_size);
+                ggml_backend_tensor_get_async(backend_res, res, logits_out, 0, n_outputs_new*n_vocab*sizeof(float));
+            }
+```
+```console
+(gdb) p res->ne
+$4 = {128256, 1, 1, 1}
+(gdb) p n_vocab
+$7 = 128257
+```
+`n_vocab` is one larger than `res->ne[0]`, so the above call reads out of bounds:
+```console
+/danbev/work/ai/new-vision-api/ggml/src/ggml-backend.cpp:245:
+GGML_ASSERT(offset + size <= ggml_nbytes(tensor) && "tensor read out of bounds") failed
+```
+
 ### Model conversion
 So we first need to convert the model to GGUF format which is done by the
 `convert_hf_to_gguf.py` script. This model consists of not just one model but