This repository has been archived by the owner on Aug 1, 2024. It is now read-only.
Is the truncation code in extract.py reasonable? #156
Unanswered
diamondgloves asked this question in Q&A
Replies: 1 comment
-
Thanks for opening issue #157. Indeed, the logic is incorrect.
-
When I use the ESM-1b model to extract representations of sequences with lengths over 1024, the tokens of a sequence are supposed to be 'bos + seq + eos', with a length of seq_len + 2. According to the code in extract.py, toks becomes 'bos + seq[:1021]', without eos, before it is fed to the model. Is this a reasonable input?
Furthermore, in ContactPredictionHead the contact map is cropped to a size of 1020*1020 with the aim of removing the bos and eos positions, but there is no eos in toks.
Should the input toks of the model instead be 'bos + seq[:1022] + eos', with a length of 1024 after truncation?
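To make the arithmetic concrete, here is a minimal sketch in plain Python that does not call the esm library. The `BOS`/`EOS` strings, the helper names, and the toy sequence are placeholders I made up for illustration, and `crop_first_and_last` only assumes that the contact head drops the first and last positions; it is not the actual ContactPredictionHead code.

```python
# Placeholder special tokens (hypothetical names, not the real alphabet entries).
BOS, EOS = "<bos>", "<eos>"


def truncate_as_described(seq):
    """Truncation as described in the question: 'bos + seq[:1021]', no eos."""
    return [BOS] + list(seq[:1021])


def truncate_as_proposed(seq):
    """Truncation proposed in the question: 'bos + seq[:1022] + eos'."""
    return [BOS] + list(seq[:1022]) + [EOS]


def crop_first_and_last(tokens):
    """Assumed crop: drop the first and last positions (meant to be bos and eos)."""
    return tokens[1:-1]


# Toy sequence longer than 1024 residues (20 amino-acid letters repeated).
seq = "ACDEFGHIKLMNPQRSTVWY" * 60  # length 1200

described = truncate_as_described(seq)
proposed = truncate_as_proposed(seq)

print(len(described), described[-1] == EOS)   # token count, whether eos is present
print(len(proposed), proposed[-1] == EOS)
print(len(crop_first_and_last(described)))    # positions left after the crop
print(len(crop_first_and_last(proposed)))
```

Under these assumptions, the crop on the described input removes a real residue from the end instead of an eos token, whereas the proposed input keeps every retained residue and loses only bos and eos.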