Is the truncation code in extract.py reasonable? #156

diamondgloves · 2022-01-13T03:31:52Z

When I use the ESM-1b model to extract representations of sequences longer than 1024, the tokens of a sequence are supposed to be 'bos + seq + eos', with length seq_len+2. According to the code in extract.py, toks becomes 'bos + seq[:1021]', without eos, before it is fed to the model. Is this a reasonable input?

if args.truncate:
    toks = toks[:, :1022]
out = model(toks, repr_layers=repr_layers, return_contacts=return_contacts)
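
To make the effect of that slice concrete, here is a minimal sketch on a plain Python list standing in for one row of toks. The token ids `BOS`, `EOS` and `RES` are hypothetical placeholders chosen for illustration, not values taken from the esm alphabet.

```python
# Hypothetical token ids, for illustration only (not read from the esm alphabet).
BOS, EOS, RES = 0, 2, 5

seq_len = 1500
toks = [BOS] + [RES] * seq_len + [EOS]  # 'bos + seq + eos', length seq_len + 2

# Same slice as toks[:, :1022], applied to a single row.
truncated = toks[:1022]

n_residues = sum(t == RES for t in truncated)
has_eos = EOS in truncated
print(len(truncated), n_residues, has_eos)  # 1022 1021 False
```

So the slice keeps bos and the first 1021 residues, and drops eos.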

Furthermore, in ContactPredictionHead the contact map is cropped to 1020*1020 in order to remove bos and eos, but there is no eos in toks.
Should the model input after truncation instead be 'bos + seq[:1022] + eos', with length 1024?
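
A rough sketch of the proposed truncation, again on a plain Python list for a single unpadded row. The helper names `truncate_keep_eos` and `contact_map_side`, the constant `MAX_TOKENS`, and the token ids are hypothetical and not part of extract.py or the esm API. The limit of 1024 tokens follows the proposal above, and `contact_map_side` assumes the head simply drops the first and last position, as described in this post.

```python
# Hypothetical token ids and helper names, for illustration only.
BOS, EOS, RES = 0, 2, 5
MAX_TOKENS = 1024  # assumed: bos + 1022 residues + eos, as proposed above


def truncate_keep_eos(toks, max_tokens=MAX_TOKENS):
    """Truncate one unpadded row [bos, *residues, eos], keeping eos as the last token."""
    if len(toks) <= max_tokens:
        return list(toks)
    # Keep the first max_tokens - 1 tokens, then re-append the final (eos) token.
    return list(toks[: max_tokens - 1]) + [toks[-1]]


def contact_map_side(n_tokens):
    """Side length of the contact map if the first and last positions are dropped."""
    return n_tokens - 2


toks = [BOS] + [RES] * 1500 + [EOS]

current = toks[:1022]            # existing behaviour: bos + 1021 residues, no eos
fixed = truncate_keep_eos(toks)  # proposed: bos + 1022 residues + eos

print(current.count(RES), contact_map_side(len(current)))  # 1021 1020
print(fixed.count(RES), contact_map_side(len(fixed)))      # 1022 1022
```

Under these assumptions, the existing slice yields a contact map one row and column smaller than the number of residues kept, while the proposed input yields a map whose side equals the residue count. A padded batch would need per-row handling, because the last token of a shorter row is padding rather than eos.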

tomsercu · 2022-03-25T17:50:02Z

Thanks for opening issue #157. Indeed, the logic is incorrect.
