InfoNCELoss seems different from ImageBind paper #10

alex6095 · 2023-11-11T06:20:54Z

> this is handled by the masking. Notice that we set the similarity score between a sample and itself to 0. Additionally we mask one half out and keep the other for computing the similarity between the images and other (in this case, text) modalities. The positive sample is the only one corresponding to the image `batch_size/2` samples away. The remaining text samples are negative samples

```python
# Mask out cosine similarity to itself
self_mask = torch.eye(
    cos_sim.shape[0], dtype=torch.bool, device=cos_sim.device)
cos_sim.masked_fill_(self_mask, -9e15)
```

Based on your explanation, let's say q_i is one image. Then its positive sample is the single text corresponding to that image, `batch_size/2` samples away. But the negative samples are not only the remaining text samples; the remaining image samples are included as well, aren't they? This code masks only the sample itself and leaves the other image samples in the denominator. I think this adds the terms q_i · q_j (i ≠ j) and k_i · k_j (i ≠ j), which seems different from the loss in the ImageBind paper.
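To make the difference concrete, here is a minimal sketch, not the repository's actual code. It assumes the batch is laid out as `[images; texts]`, so the positive for row `i` sits `N` rows away. The function name `info_nce_both_variants` and the default temperature are hypothetical, chosen only for illustration. Variant A masks just the diagonal, as in the snippet above. Variant B also masks every same-modality pair, so only cross-modal terms q_i · k_j remain in the denominator.

```python
import torch
import torch.nn.functional as F


def info_nce_both_variants(img, txt, temperature=0.07):
    """Hypothetical helper: returns (loss_a, loss_b) for paired embeddings.

    img, txt: (N, D) tensors; row i of img is paired with row i of txt.
    loss_a masks only self-similarity (same-modality negatives remain).
    loss_b additionally masks all same-modality pairs (cross-modal only).
    """
    n = img.shape[0]
    feats = F.normalize(torch.cat([img, txt], dim=0), dim=-1)  # (2N, D)
    cos_sim = feats @ feats.T / temperature                    # (2N, 2N)

    # Positive of row i is at column (i + N) mod 2N.
    pos_mask = torch.eye(2 * n, dtype=torch.bool).roll(shifts=n, dims=0)
    pos = cos_sim[pos_mask]                                    # (2N,)

    # Variant A: mask only the diagonal, as in the quoted snippet.
    self_mask = torch.eye(2 * n, dtype=torch.bool)
    logits_a = cos_sim.masked_fill(self_mask, float("-inf"))
    loss_a = (torch.logsumexp(logits_a, dim=-1) - pos).mean()

    # Variant B: mask image-image and text-text blocks (diagonal included).
    same_modality = torch.zeros(2 * n, 2 * n, dtype=torch.bool)
    same_modality[:n, :n] = True
    same_modality[n:, n:] = True
    logits_b = cos_sim.masked_fill(same_modality, float("-inf"))
    loss_b = (torch.logsumexp(logits_b, dim=-1) - pos).mean()

    return loss_a, loss_b


if __name__ == "__main__":
    torch.manual_seed(0)
    a, b = info_nce_both_variants(torch.randn(4, 8), torch.randn(4, 8))
    print(f"self-mask only: {a.item():.4f}  cross-modal only: {b.item():.4f}")
```

Because variant A's denominator contains every term of variant B plus the same-modality terms, `loss_a` can never be smaller than `loss_b` on the same batch. The two coincide only when there is a single pair in the batch.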

Originally posted by @alex6095 in #7 (comment)
