Protein FASTA file not found, exiting - happens when --name has a particular format #1089
Comments
Seems like it must be related to tbl2asn and how it parses the isolate field. You can try to run that command separately and see if you get any errors.
The --name argument sets how the genes will be named. It is typically XXX_NNNN, where XXX is an NCBI-assigned prefix and NNNN is a sequential number that funannotate assigns when numbering the genes. I don't think using '-' in the name is a good idea. You can pass in something simple eg As you may be predicting many of the same species, it would be good to pass the strain info as --strain STRAINID in the predict and annotate steps as well. if you go to the I would look at size and content of the files in
My suggestion to not put '-' in the locus prefix still stands. Write a simple function to remove the dashes and see. It looks like your 'failed' runs are succeeding, in that you have protein files. I would look in the annotate_misc folder to try to make sense of what is being generated or used for the failed isolate. I still provide the --species and --strain/--isolate fields when I run annotate myself, so you may want to include those as well.
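For the dash-removal suggestion, here is a minimal Python sketch. The function name `sanitize_locus_prefix` is hypothetical and not part of funannotate. It assumes the sample name is currently passed unchanged as --name, and that simply deleting every '-' is acceptable for your naming scheme.

```python
def sanitize_locus_prefix(sample_name: str) -> str:
    """Return sample_name with every '-' removed.

    Hypothetical helper: the result is meant to be passed as --name,
    while the original sample name can still go to --isolate/--strain.
    """
    return sample_name.replace("-", "")


if __name__ == "__main__":
    # Sample names taken from the report in this issue.
    for sample in ["Al-Tr-19-2", "Al-Tr-26", "3C-1"]:
        print(sample, "->", sanitize_locus_prefix(sample))
```

Note that removing dashes can make two different sample names collapse to the same prefix (for example, Al-Tr-1-92 and Al-Tr-19-2 would both become AlTr192), so it is worth checking the cleaned names for duplicates before running a batch.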
I would also pass the isolate/strain as '--isolate "$ISOLATE"' in your command line for predict, so that it becomes part of the final filenames. It seems like you aren't doing that in the predict step, as the files are all named only genus_species.
Are you using the latest release?
1.8.17 conda
Describe the bug
The annotate module exits with a "Protein FASTA file not found, exiting" error for some of my assemblies. The thing all these have in common is that the assembly name, which I use as the --name flag, has this format: Al-Tr-xx, where xx is a number.
Sample names that worked fine:
3C-1
Al-16-NDUS
AL-84
Sample names that failed:
Al-Tr-19-2
Al-Tr-26
There appears to be something about the --name format that trips the annotate module up. Any idea what it could be?
Cheers.
What command did you issue?
Logfiles
OS/Install Information