docs: update vision/clip.md

danbev · Jan 20, 2025 · 9c281cb
1 parent 79f1694
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/notes/vision/clip.md b/notes/vision/clip.md
@@ -122,6 +122,10 @@ textual representations."
 CLIP and VIT are not the same thing as I understand it. CLIP which stands for
 contrastive language-image pretraining can use a vision transformer to process
 the images but CLIP itself is the complete concept of the training process.
+At inference time, as in the work being done in llama.cpp, it is the ViT
+component of CLIP that is being used, so using clip as the prefix for
+methods/structs/tensors can be a little misleading. Would these be better off
+named something like `vit`, `vision_model`, `vision_layer`, etc.?
 
 ### CLIP image preprocessing