Dear authors,
Regarding the experimental results in Section 4.2, I noticed that you compared models using SWA against models using the SelfExtend method on the passkey retrieval task. Although SWA limits the attention window between tokens, an LLM has many layers. Even if the last token does not attend to the passkey tokens in the first layer, the passkey information can still propagate to later tokens within the SWA window, and this propagation can continue layer by layer up to the final layer. Why, then, can't the passkey information reach the tokens that need to be generated? I am really curious about this question and look forward to your response!
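To make the question concrete, here is a minimal sketch of the layer-by-layer propagation argument. The numbers (window size, layer count) are illustrative assumptions, not the paper's actual settings:

```python
# Sketch: how far information can travel through stacked
# sliding-window attention (SWA) layers.
# Assumption for illustration: a causal window of `window_size` tokens
# and `num_layers` identical attention layers.

def max_reach(window_size: int, num_layers: int) -> int:
    # In one layer, a token can pull information from up to
    # (window_size - 1) positions earlier. Stacking layers lets that
    # information hop forward again, so the theoretical receptive
    # field grows linearly with depth.
    return (window_size - 1) * num_layers

# E.g. a 4096-token window over 32 layers could in principle reach
# a passkey placed over 100k tokens before the last token:
print(max_reach(4096, 32))  # 131040
```

If this linear growth holds, the passkey should be within the final token's theoretical receptive field in most long-context settings, which is exactly why the failure on passkey retrieval is puzzling.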
I also observed that the SelfExtend method maps passkey-related tokens that sit far from the last token to nearly the same relative distance from it. Why doesn't this cause the token order of the passkey to be reversed in the output?
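To illustrate the second question, here is a sketch of a SelfExtend-style grouped relative position, where distant tokens share coarse positions via floor division. The function name, group size, and neighbor window below are illustrative assumptions, not the authors' exact formulation:

```python
# Sketch of SelfExtend-style position mapping (illustrative, not the
# paper's exact implementation): nearby tokens keep exact relative
# positions, while distant tokens fall back to grouped positions.

def grouped_rel_pos(q_pos: int, k_pos: int, group: int, neighbor: int) -> int:
    if q_pos - k_pos < neighbor:
        # Within the neighbor window: exact relative position.
        return q_pos - k_pos
    # Outside the window: coarse positions via floor division,
    # shifted so the two regimes stay contiguous.
    shift = neighbor - neighbor // group
    return q_pos // group - k_pos // group + shift

# Two adjacent passkey tokens far from the query land in the same
# group, so their relative distances to the last token coincide:
q = 10000
print(grouped_rel_pos(q, 1200, 8, 512))
print(grouped_rel_pos(q, 1201, 8, 512))  # same value as above
```

This collapse of adjacent distant positions onto one grouped distance is what makes me wonder why the passkey's internal token order survives in the model's output.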