MSA Server error with RNA sequence #206

Closed
yx0516 opened this issue Dec 3, 2024 · 3 comments · Fixed by #212

yx0516 commented Dec 3, 2024

Hi, when I run a prediction on an RNA sequence using the MSA server (use_msa_server=True), the following error occurs:

Exception: MMseqs2 API is giving errors. Please confirm your input is a valid protein sequence. If error persists, please try again an hour later.

The input FASTA is:

rna|A
GGGUCUCGCGGAACCGGUGAGUACACCGGAAUCCAGGAAACUGGAUUUGGGCGUGCCCCCGCGAGACC

Can the MSA server step skip RNA and DNA sequences?

wukevin (Contributor) commented Dec 3, 2024

I'm not able to reproduce this error; I ran chai fold --use-msa-server rna_example.fasta outdir using the following contents of rna_example.fasta:

>rna|6U8D_1
GGGUCUCGCGGAACCGGUGAGUACACCGGAAUCCAGGAAACUGGAUUUGGGCGUGCCCCCGCGAGACC
>protein|6U8D_2
EISEVQLVESGGGLVQPGGSLRLSCAASGFYISSYSIHWVRQAPGKGLEWVASIYPSYGYTSYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARRYRSYYSRYGFDYWGQGTLVTVSSASTKGPSVFPLAPSSKSTSGGTAALGCLVKDYFPEPVTVSWNSGALTSGVHTFPAVLQSSGLYSLSSVVTVPSSSLGTQTYICNVNHKPSNTKVDKKVEPKSCDKTHT
>protein|6U8D_3
SDIQMTQSPSSLSASVGDRVTITCRASQSVSSAVAWYQQKPGKAPKLLIYSASSLYSGVPSRFSGSRSGTDFTLTISSLQPEDFATYYCQQSSYYPSTFGQGTKVEIKRTVAAPSVFIFPPSDEQLKSGTASVVCLLNNFYPREAKVQWKVDNALQSGNSQESVTEQDSKDSTYSLSSTLTLSKADYEKHKVYACEVTHQGLSSPVTKSFNRGEC

Perhaps you are missing the leading > character in your input, so the parsing doesn't happen correctly? The logic already skips MSA generation for inputs that are not proteins:

chai-lab/chai_lab/chai1.py

Lines 316 to 330 in 2d2646b

protein_sequences = [
    chain.entity_data.sequence
    for chain in chains
    if chain.entity_data.entity_type == EntityType.PROTEIN
]
msa_dir = output_dir / "msas"
msa_dir.mkdir(parents=True, exist_ok=False)
generate_colabfold_msas(
    protein_seqs=protein_sequences,
    msa_dir=msa_dir,
    msa_server_url=msa_server_url,
)
msa_context, msa_profile_context = get_msa_contexts(
    chains, msa_directory=msa_dir
)

I did realize though, when testing this, that some of the errors we log are overly verbose and do not apply to RNAs and other non-protein entities; see #209
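
To rule out the missing-header hypothesis before running a prediction, a small standalone check along these lines may help. This is only a sketch: check_fasta_headers and KNOWN_PREFIXES are hypothetical names, not part of chai-lab, and the set of accepted entity-type prefixes is an assumption (only protein and rna appear in this thread).

```python
from typing import List

# Assumed entity-type prefixes; only "protein" and "rna" are confirmed by
# the examples in this thread. Extend as needed for your inputs.
KNOWN_PREFIXES = {"protein", "rna", "dna"}


def check_fasta_headers(text: str) -> List[str]:
    """Return human-readable problems found in the FASTA text (hypothetical helper)."""
    problems = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Header line: the entity type is the text before the first "|".
            prefix = line[1:].split("|", 1)[0].strip().lower()
            if prefix not in KNOWN_PREFIXES:
                problems.append(f"line {lineno}: unknown entity type {prefix!r}")
        elif "|" in line:
            # Sequence lines never contain "|", so this is likely a header
            # that lost its leading ">".
            problems.append(
                f"line {lineno}: looks like a header but is missing the leading '>'"
            )
    return problems
```

Run on the input as posted in the first comment (rna|A with no leading >), this reports a missing > on line 1; run on the rna_example.fasta contents above, it reports nothing.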

wukevin self-assigned this Dec 3, 2024

yx0516 (Author) commented Dec 4, 2024

@wukevin The above error happens when the input contains only RNA sequences and no proteins.
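
A minimal sketch of this failure mode, under stated assumptions: with RNA-only input the filtered protein list is empty, and submitting an empty query to the MSA server would plausibly produce the error above. The Chain dataclass, the simplified EntityType enum, the fetch_msas callback, and maybe_generate_msas are all hypothetical stand-ins rather than chai-lab's API, and this guard is not necessarily how the linked PR resolved the issue.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class EntityType(Enum):
    # Simplified stand-in for the real enum; values are assumptions.
    PROTEIN = "protein"
    RNA = "rna"


@dataclass
class Chain:
    # Hypothetical flattened chain record for illustration only.
    entity_type: EntityType
    sequence: str


def maybe_generate_msas(
    chains: List[Chain], fetch_msas: Callable[[List[str]], None]
) -> bool:
    """Call fetch_msas only when there is at least one protein chain.

    Returns True if the MSA server was queried, False if it was skipped.
    """
    protein_sequences = [
        c.sequence for c in chains if c.entity_type == EntityType.PROTEIN
    ]
    if not protein_sequences:
        # RNA/DNA-only input: nothing to align, so skip the server call.
        return False
    fetch_msas(protein_sequences)
    return True
```

With only an RNA chain, the callback is never invoked; as soon as one protein chain is present, only the protein sequences are passed on.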

wukevin (Contributor) commented Dec 4, 2024

Thanks for helping us narrow this down; we've resolved it in the mentioned PR. Please let us know if any other issues persist.
