CLIP-SF and BLIP-SF weight: w1, w2, w3, and w4. #30

pandaupc · 2025-02-14T07:55:52Z

It seems that the weights w1, w2, w3, and w4 are never trained in CLIP-SF and BLIP-SF.
The encoding code in UniIR/src/models/uniir_clip/clip_scorefusion/clip_sf.py is as follows:
    def encode_text(self, text_tensor):
        return self.clip_model.encode_text(text_tensor)

    def encode_image(self, image_tensor):
        return self.clip_model.encode_image(image_tensor)

    def fuse_embeddings(self, img_emb, txt_emb):
        fused_emb = img_emb + txt_emb
        return fused_emb

    def encode_multimodal_input(self, txt_tensor, img_tensor, txt_mask, img_mask):
        """
        :param txt_tensor:
        :param img_tensor:
        :param txt_mask: expected shape: [batch_size, 1]
        :param img_mask: expected shape: [batch_size, 1]
        :return:
        """
        txt_emb = self.encode_text(txt_tensor) * txt_mask.unsqueeze(-1)
        img_emb = self.encode_image(img_tensor) * img_mask.unsqueeze(-1)
        return self.fuse_embeddings(txt_emb, img_emb) # shape: [batch_size,
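
To make the concern concrete, here is a minimal sketch in plain Python. It is not code from the repository: the helper names (`dot`, `add`, `fused_score`, `weighted_score`) are hypothetical, and which of w1 to w4 goes with which term is my assumption. It only shows that a dot product between two summed embeddings expands into four cross-modal terms that all carry a weight of 1, so summing the embeddings as `fuse_embeddings` does leaves no separate weights to learn.

```python
# Hypothetical illustration only; none of these helpers exist in UniIR.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    """Element-wise sum, mirroring `img_emb + txt_emb` in fuse_embeddings."""
    return [a + b for a, b in zip(u, v)]


def fused_score(q_img, q_txt, c_img, c_txt):
    """Similarity between a summed query and a summed candidate embedding."""
    return dot(add(q_img, q_txt), add(c_img, c_txt))


def weighted_score(q_img, q_txt, c_img, c_txt, w1=1.0, w2=1.0, w3=1.0, w4=1.0):
    """The same four similarity terms with explicit weights.

    The assignment of w1..w4 to the terms is an assumption for illustration.
    """
    return (
        w1 * dot(q_img, c_img)
        + w2 * dot(q_txt, c_txt)
        + w3 * dot(q_img, c_txt)
        + w4 * dot(q_txt, c_img)
    )


# Toy 2-d embeddings for one query and one candidate.
q_img, q_txt = [1, 2], [3, -1]
c_img, c_txt = [0, 1], [2, 2]

# With all weights equal to 1 the two scores coincide.
same = fused_score(q_img, q_txt, c_img, c_txt) == weighted_score(q_img, q_txt, c_img, c_txt)
print(same)
```

If the weights were meant to be learnable, I would expect to see them as parameters somewhere in the model, and I could not find them in the code quoted above.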
