
normalize (sum to 1) attention score seems not right #16

Open · jihwanp opened this issue Apr 24, 2022 · 3 comments

@jihwanp commented Apr 24, 2022

Hi, thanks for sharing this nice work.

I noticed that you normalize the attention scores (so each row sums to 1), as described in the original attention rollout paper:

    I = torch.eye(attention_heads_fused.size(-1))
    a = (attention_heads_fused + 1.0*I)/2
    a = a / a.sum(dim=-1)

But when dividing by the row sums, keepdim=True should be applied so that each row actually sums to 1 after the normalization:

    a = a / a.sum(dim=-1, keepdim=True)

Maybe I'm wrong; please double-check this issue.
Thanks
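
As a quick illustration of the broadcasting behavior described above (a toy sketch with made-up shapes and values, not the repo's code):

    import torch

    torch.manual_seed(0)
    # Toy 2D "attention" matrix whose rows do NOT already sum to 1
    # (e.g. after some entries have been zeroed out).
    a = torch.rand(4, 4)

    # Without keepdim: a.sum(dim=-1) has shape (4,), which broadcasts
    # along the LAST axis, so column j gets divided by the sum of row j.
    wrong = a / a.sum(dim=-1)
    print(wrong.sum(dim=-1))  # rows generally do not sum to 1

    # With keepdim=True: the sums have shape (4, 1) and broadcast along
    # the rows, so each row is divided by its own sum.
    right = a / a.sum(dim=-1, keepdim=True)
    print(right.sum(dim=-1))  # all ones (up to floating point)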

@vivekh2000 commented May 16, 2024

@jacobgil, thanks for the code. I think the following line in the code is redundant:

    a = a / a.sum(dim=-1)

Reason: I have attached a screenshot of the original paper (page 3) below.

[screenshot: excerpt from page 3 of the attention rollout paper]

There, the authors state that the attention matrix W_attn is already normalized. The identity matrix I is also a normalized matrix (each of its rows sums to one), so multiplying (W_attn + I) by 0.5 again yields a normalized matrix.

Also, at line 10,

    result = torch.eye(attentions[0].size(-1))

result is initialized as an identity matrix, whereas at line 33,

    result = torch.matmul(a, result)

the matrix a is multiplied by result (an identity matrix), which should always just give back a. Further, the recursive multiplication described in the original paper does not appear to be implemented. Anyway, thanks for the nice implementation of the techniques.
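
A quick numeric check of the redundancy claim (a sketch with toy values, not the repo's code): if the fused attention rows already sum to 1, averaging with I keeps them summing to 1, and the extra division changes nothing:

    import torch

    torch.manual_seed(0)
    N = 5
    # Row-normalized attention, as produced by a softmax over the last dim.
    attn = torch.softmax(torch.randn(N, N), dim=-1)
    I = torch.eye(N)
    a = (attn + I) / 2

    # Each row of a already sums to 1 ...
    print(a.sum(dim=-1))  # all (approximately) ones
    # ... so dividing by the row sums (done correctly, with keepdim=True)
    # is a no-op.
    print(torch.allclose(a, a / a.sum(dim=-1, keepdim=True)))  # True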

@eneserdo commented Aug 2, 2024

@vivekh2000 I did not check the paper, but @jacobgil also implemented a discard_ratio, which may not be in the paper; discarding entries obviously breaks the normalization of the matrix, so it is necessary to re-normalize it. Also, I agree with @jihwanp: there should be keepdim=True.
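
To illustrate this point, here is a simplified sketch of a discard step (the repo's actual discard_ratio logic differs in details, e.g. it operates per head and spares the class token): zeroing out low entries breaks the row normalization, so re-normalizing afterwards is needed:

    import torch

    torch.manual_seed(0)
    N = 6
    attn = torch.softmax(torch.randn(N, N), dim=-1)
    a = (attn + torch.eye(N)) / 2  # rows sum to 1 at this point

    # Zero out the smallest 50% of entries, mimicking a discard step.
    flat = a.flatten()
    _, idx = flat.topk(int(flat.numel() * 0.5), largest=False)
    flat[idx] = 0
    a = flat.reshape(N, N)

    print(a.sum(dim=-1))  # rows no longer sum to 1 (in general)
    a = a / a.sum(dim=-1, keepdim=True)  # re-normalize, with the keepdim fix
    print(a.sum(dim=-1))  # back to ones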

@gbZachYin

keepdim=True should be correct
