I would like to run module A for a eukaryotic organism (Mus musculus). So far the analysis has been running for a month, and I think it will take a lot longer.
I've been looking at both scripts (putative_orfs.sh and putative_orfs_eukaryotes.sh) and I cannot see any differences between them. Is it possible that there's an error, and that the eukaryote script is not separating the chromosomes in the FASTA file, which is why it is taking so long? Would you have any recommendations?
Thanks so much!
Marta
The problem might be the ORF detection when you apply it to a complete eukaryotic genome, because there are simply too many START/STOP codons. E. coli (5 Mbp genome size) has ~1.2 million potential START codons, and for the mouse genome I would estimate ~600 million START codons. This will likely cause the script to use a lot of RAM and become very slow.
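To get a feeling for the numbers, here is a rough sketch of how one could count candidate START codons and scale the E. coli figure up to mouse. It is not the pipeline's code. It counts only ATG on both strands (which codons the pipeline treats as potential STARTs is an assumption here), and the mouse genome size of about 2.7 Gbp is an approximate value I am plugging in. The function names are made up for illustration.

```python
# Translation table for complementing DNA (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_codon(seq, codon="ATG"):
    """Count all (overlapping, any frame) occurrences of codon in seq."""
    seq = seq.upper()
    count = 0
    pos = seq.find(codon)
    while pos != -1:
        count += 1
        pos = seq.find(codon, pos + 1)
    return count


def count_starts_both_strands(seq, codon="ATG"):
    """Count candidate START codons on the forward and reverse strand."""
    return count_codon(seq, codon) + count_codon(reverse_complement(seq), codon)


def scale_estimate(known_count, known_size_bp, target_size_bp):
    """Linearly scale a codon count from one genome size to another."""
    return known_count * target_size_bp / known_size_bp


# Assumption: mouse genome of roughly 2.7 Gbp, same START density as E. coli.
mouse_estimate = scale_estimate(1.2e6, 5e6, 2.7e9)
print(f"estimated START codons in mouse: {mouse_estimate:.2e}")
```

With these inputs the linear scaling lands in the same range as the ~600 million mentioned above. The real density will differ with GC content and with the set of START codons considered.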
Did you use the whole genome as input?
Please consider the following thoughts. For eukaryotes the smORF analysis might be a bit more complicated, because you might need to run the pipeline multiple times, depending on what you want to find. When you use the genome, you will detect small ORFs that span exon-intron boundaries. These are very likely to be false positives if the intron is spliced out. You could think about using only mRNA sequences as input, which would also reduce the search space for possible ORFs a lot. However, you would then not be able to detect new ORFs located on short (m)RNAs that have not been detected yet.
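To illustrate the mRNA idea, here is a minimal sketch of an ORF scan over a single transcript sequence. It is not what the pipeline does internally. It assumes the input is the sense strand written as DNA (T instead of U), considers only ATG as START, scans the three forward frames, and reports one ORF per STOP (beginning at the first ATG in that frame). The length limits `min_codons` and `max_codons` are hypothetical parameters, not the pipeline's thresholds.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=10, max_codons=100):
    """Yield (start, end, orf_sequence) for ATG...STOP ORFs on the forward strand.

    start is 0-based, end is exclusive and includes the STOP codon.
    The codon count used for the length filter excludes the STOP codon.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None  # position of the first ATG since the last STOP
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    yield start, i + 3, seq[start:i + 3]
                start = None  # look for the next ORF in this frame


# Tiny made-up example: one ORF of 3 codons plus STOP in frame 2.
example = "CC" + "ATG" + "AAA" * 2 + "TAA" + "G"
for start, end, orf in find_orfs(example, min_codons=2):
    print(start, end, orf)
```

Because transcripts are short and only one strand is scanned, the number of candidates per sequence stays small compared with a whole chromosome.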
Currently, I am trying to write a more efficient ORF detection procedure, but I am not sure how long this will take (or whether it will actually be faster in the end). I will also check the code regarding the multi-chromosome issue.
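Until that is checked, one possible workaround is to feed the chromosomes one at a time. The sketch below is only an illustration of reading a multi-record FASTA file record by record, so that only one sequence is held in memory at once. It is not taken from putative_orfs_eukaryotes.sh, and the name `iter_fasta` as well as the file name in the usage comment are hypothetical.

```python
def iter_fasta(lines):
    """Yield (name, sequence) for each record in an iterable of FASTA lines.

    The name is the first whitespace-separated token after '>'.
    Only one record is kept in memory at a time.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Usage sketch (hypothetical file name):
# with open("genome.fa") as handle:
#     for name, seq in iter_fasta(handle):
#         print(name, len(seq))
```

Each chromosome could then be written to its own file and passed to the pipeline separately, which would at least show whether a single chromosome finishes in reasonable time.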