How much pre-training data is needed to get a decent Long_P@L score for unsupervised contact prediction? #277
Unanswered
zhenyuhe00 asked this question in Q&A
Replies: 1 comment
-
How are you sampling those 500k sequences? Are they chosen from UR50 clusters or just arbitrarily chosen? If you trained on a much smaller set of less diverse sequences, that could be the reason for your worse performance.
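For illustration, here is a minimal sketch of the difference this reply is hinting at: taking one representative per UR50 (UniRef50) cluster keeps the subset diverse, whereas an arbitrary 500k subset may not. The clusters.tsv file and its accession-tab-cluster_id layout are assumptions made for this sketch, not a real UniRef distribution format.

```python
# Sketch: cluster-balanced vs. arbitrary subsampling of pre-training sequences.
# `clusters.tsv` (accession<TAB>uniref50_cluster_id) is a hypothetical input;
# real UniRef50 membership data would have to be exported into this shape.
import csv
import random

def sample_cluster_representatives(path, n=500_000, seed=0):
    """Pick one sequence per UR50 cluster, then subsample to n."""
    by_cluster = {}
    with open(path) as fh:
        for acc, cluster in csv.reader(fh, delimiter="\t"):
            by_cluster.setdefault(cluster, []).append(acc)
    rng = random.Random(seed)
    reps = [rng.choice(members) for members in by_cluster.values()]
    rng.shuffle(reps)
    return reps[:n]

def sample_arbitrary(path, n=500_000, seed=0):
    """Pick n sequences uniformly at random, ignoring cluster structure."""
    with open(path) as fh:
        accs = [acc for acc, _ in csv.reader(fh, delimiter="\t")]
    return random.Random(seed).sample(accs, n)
```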
-
Hi,
Congrats again on your great work!
I used the pre-trained BERT-base checkpoint esm1_t12_85M_UR50S that you released (pre-trained on over 20 million sequences) and tested its unsupervised contact prediction performance. Its Long_P@L is about 0.20–0.30.
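For context, a typical unsupervised evaluation against the released checkpoint can be set up roughly as below. This is a minimal sketch assuming the fair-esm package and its predict_contacts API; the example sequence and the all-zero true_contacts map are placeholders, and a real evaluation would use PDB-derived contact maps (e.g. Cβ–Cβ distance under 8 Å) with sequence separation of at least 24 for the long-range metric.

```python
# Minimal sketch: load the released checkpoint with fair-esm and compute
# long-range precision-at-L. The sequence and `true_contacts` are placeholders.
import torch
import esm

model, alphabet = esm.pretrained.esm1_t12_85M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSG"  # placeholder sequence
_, _, tokens = batch_converter([("query", seq)])

with torch.no_grad():
    pred = model.predict_contacts(tokens)[0]  # [L, L] contact probabilities

def long_range_p_at_l(pred, true, min_sep=24):
    """Precision of the top-L predicted pairs with sequence separation >= min_sep."""
    L = pred.shape[0]
    idx_i, idx_j = torch.triu_indices(L, L, offset=min_sep)  # long-range pairs only
    order = torch.argsort(pred[idx_i, idx_j], descending=True)[:L]  # top-L scores
    return true[idx_i, idx_j][order].float().mean().item()

true_contacts = torch.zeros(len(seq), len(seq))  # placeholder ground-truth map
print(long_range_p_at_l(pred, true_contacts))
```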
However, I also pre-trained a BERT-base with 85M parameters myself, and its Long_P@L on the same test set is less than 0.05. The differences between my BERT-base and your esm1_t12_85M_UR50S are that mine uses post-norm, a crop size of 384, and was pre-trained on 0.5 million sequences, whereas esm1_t12_85M_UR50S uses pre-norm, a crop size of 1024, and was pre-trained on over 20 million sequences.
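As an aside, the post-norm vs. pre-norm difference only concerns where LayerNorm sits inside each transformer block. A minimal PyTorch sketch of the two residual arrangements (the module layout is illustrative, not taken from the ESM codebase):

```python
import torch.nn as nn

class PostNormBlock(nn.Module):
    """Post-norm: LayerNorm is applied after each residual addition."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        x = self.norm1(x + self.attn(x, x, x)[0])  # residual add, then norm
        x = self.norm2(x + self.ff(x))             # residual add, then norm
        return x

class PreNormBlock(nn.Module):
    """Pre-norm: LayerNorm is applied before each sub-layer."""
    def __init__(self, d_model, n_heads, d_ff):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x):
        h = self.norm1(x)
        x = x + self.attn(h, h, h)[0]      # norm first, then residual add
        x = x + self.ff(self.norm2(x))     # norm first, then residual add
        return x
```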
I wonder why my BERT-base is so much worse than yours. Is it because of the different amount of pre-training data?
Thanks in advance!