docs: add clip vs vit and clip image preprocessing
danbev committed Jan 20, 2025
1 parent 47c534f commit b1aead3
Showing 1 changed file with 11 additions and 0 deletions.
11 changes: 11 additions & 0 deletions notes/vision/clip.md
@@ -117,3 +117,14 @@ high-level representation. Each "tower" within a model like CLIP is specialized
overarching architecture to achieve a common goal, such as aligning visual and
textual representations."


### CLIP vs ViT
CLIP and ViT are not the same thing, as I understand it. CLIP, which stands for
Contrastive Language-Image Pre-training, can use a Vision Transformer (ViT) as
its image encoder, but CLIP itself is the overall training approach: an image
encoder and a text encoder trained together with a contrastive objective.
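
To make the distinction concrete, here is a minimal sketch of the symmetric
contrastive objective that CLIP trains with. It does not care what produced the
embeddings (a ViT, a ResNet, or anything else). The function names are
hypothetical, the embeddings are plain Python lists, and the fixed temperature
of 0.07 is an assumption for illustration (in CLIP the temperature is a learned
parameter).

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy(logits, target):
    # Numerically stable -log(softmax(logits)[target]).
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # image_embs[k] and text_embs[k] are assumed to be a matching pair.
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = similarity between image i and text j, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
              for i in imgs]
    # Image -> text direction: each row should pick out its own column.
    loss_i = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    # Text -> image direction: each column should pick out its own row.
    loss_t = sum(cross_entropy([logits[r][k] for r in range(n)], k)
                 for k in range(n)) / n
    return (loss_i + loss_t) / 2
```

The loss is small when each image embedding is closest to its own text
embedding and large when the pairs are mixed up, regardless of which encoder
architecture produced the vectors.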

### CLIP image preprocessing

* Resize/crop the images to 224x224 pixels (which is what was used in the paper)
* Convert to RGB format if needed
* Scale the pixel values to the range [0, 1], then normalize each channel with
  the mean and standard deviation used by CLIP
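
The steps above can be sketched in plain Python as follows. This is only an
illustration under a few assumptions: the image is already RGB and is given as
a list of rows of `(r, g, b)` tuples with values 0 to 255, the helper names are
hypothetical, and the resize uses nearest-neighbour sampling for brevity
(OpenAI's reference preprocessing uses bicubic interpolation). The mean and
standard deviation are the per-channel values from that reference
preprocessing.

```python
# Per-channel statistics from OpenAI's CLIP reference preprocessing.
CLIP_MEAN = (0.48145466, 0.4578275, 0.40821073)
CLIP_STD = (0.26862954, 0.26130258, 0.27577711)

def resize_shortest_side(img, size):
    # Nearest-neighbour resize so the shorter side equals `size`,
    # keeping the aspect ratio (simplification: not bicubic).
    h, w = len(img), len(img[0])
    scale = size / min(h, w)
    new_h = max(size, round(h * scale))
    new_w = max(size, round(w * scale))
    return [[img[min(h - 1, int(y * h / new_h))][min(w - 1, int(x * w / new_w))]
             for x in range(new_w)]
            for y in range(new_h)]

def center_crop(img, size):
    # Cut a size x size window out of the middle of the image.
    h, w = len(img), len(img[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in img[top:top + size]]

def normalize(img):
    # Scale 0..255 to [0, 1], then subtract the mean and divide by the std.
    return [[tuple((c / 255.0 - m) / s
                   for c, m, s in zip(px, CLIP_MEAN, CLIP_STD))
             for px in row]
            for row in img]

def preprocess(img, size=224):
    return normalize(center_crop(resize_shortest_side(img, size), size))
```

Note that after the final step the values are no longer in [0, 1]: a fully
saturated red channel, for example, ends up at roughly 1.93.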
