This repository has been archived by the owner on Aug 1, 2024. It is now read-only.
Pooling strategy to obtain Protein Level Embeddings from esm-2 and esm-if #591
Unanswered
harshagrawal13 asked this question in Q&A
Replies: 0 comments
I am trying to train a Siamese neural network to create joint embeddings for protein sequences (embedded with esm2) and protein structures (embedded with esm_if). My aim is for the protein-level embedding of a sequence and the protein-level embedding of its structure to be similar.
I would like more insight into which pooling strategy to use for both esm_if and esm2 to go from residue-level representations (batch size × num residues × embedding size) to protein-level representations (batch size × embedding size).
I've tried mean pooling, but it seems to discard a lot of useful information. Is it wise to use the BOS token embedding instead, or is there another pooling strategy that works better?
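For concreteness, here is a minimal NumPy sketch of the pooling options in question, plus max pooling as one example of an alternative. It is only an illustration under stated assumptions: `masked_mean_pool`, `bos_pool`, and `masked_max_pool` are hypothetical helper names (not part of the esm package), the mask is assumed to mark real residues with 1 and padding or special tokens with 0, and position 0 is assumed to hold the BOS token, which should be checked against the actual tokenizer output for each model. The same operations carry over directly to torch tensors.

```python
import numpy as np

def masked_mean_pool(reps, mask):
    """Average residue embeddings over positions where mask == 1.

    reps: (B, L, D) residue-level representations.
    mask: (B, L), 1 for real residues, 0 for padding/special tokens (assumed layout).
    """
    m = mask.astype(reps.dtype)[..., None]        # (B, L, 1)
    summed = (reps * m).sum(axis=1)               # (B, D)
    counts = np.clip(m.sum(axis=1), 1.0, None)    # (B, 1), avoids division by zero
    return summed / counts

def bos_pool(reps):
    """Use the embedding at position 0, assumed here to be the BOS token."""
    return reps[:, 0, :]

def masked_max_pool(reps, mask):
    """Element-wise maximum over positions where mask == 1."""
    masked = np.where(mask[..., None].astype(bool), reps, -np.inf)
    return masked.max(axis=1)

# Toy batch: 2 proteins, up to 5 positions, embedding size 4.
rng = np.random.default_rng(0)
reps = rng.normal(size=(2, 5, 4))
mask = np.array([[1, 1, 1, 1, 1],
                 [1, 1, 1, 0, 0]])  # second protein has 2 padded positions

print(masked_mean_pool(reps, mask).shape)  # protein-level: (batch, embedding)
print(bos_pool(reps).shape)
print(masked_max_pool(reps, mask).shape)
```

All three reduce (batch size × num residues × embedding size) to (batch size × embedding size); they differ only in how the residue axis is collapsed, which is the choice I am asking about.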