Module A eukaryote script is identical to prokaryote script #6

martaolmu · 2024-02-02T14:35:43Z

Hi!

I would like to run Module A for a eukaryotic organism (Mus musculus). So far the analysis has been running for a month (and I think it will take a lot longer).
I've been looking at both scripts (putative_orfs.sh and putative_orfs_eukaryotes.sh) and I cannot see any differences. Is it possible that there's an error and the eukaryote script is not separating the chromosomes in the fasta file and that's why it is taking so long? Would you have any recommendations?

Thanks so much!

Marta

AlexanderBartholomaeus · 2024-02-02T20:49:21Z

Hi Marta,

the problem might be the ORF detection when you try to apply it to a complete eukaryotic genome. This is because there are simply too many START/STOP codons. There are ~1.2 million potential START codons in E. coli (5 million bp genome size), and for the mouse genome I would estimate ~600 million START codons. This will likely cause the script to use a lot of RAM and become very slow.
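As a rough check on that estimate, the E. coli density can be scaled linearly to the mouse genome. The sketch below is only illustrative: the ~2.7 Gbp mouse genome size, the default set of start codons, and the function names are assumptions and are not taken from the pipeline.

```python
def count_start_codons(seq, starts=("ATG", "GTG", "TTG")):
    """Count start-codon occurrences on both strands of a DNA string.

    The default codon set is an assumption; pass the set the pipeline uses.
    """
    complement = str.maketrans("ACGT", "TGCA")
    seq = seq.upper()
    reverse_strand = seq.translate(complement)[::-1]
    total = 0
    for strand in (seq, reverse_strand):
        # Slide a 3-base window over every position (all three frames).
        for i in range(len(strand) - 2):
            if strand[i:i + 3] in starts:
                total += 1
    return total


def scale_estimate(count_ref, size_ref, size_target):
    """Scale a codon count linearly with genome size."""
    return count_ref / size_ref * size_target


# ~1.2 million START codons in a 5 Mbp genome, scaled to an
# assumed mouse genome size of ~2.7 Gbp.
mouse_estimate = scale_estimate(1.2e6, 5e6, 2.7e9)
print(f"{mouse_estimate:.2e}")
```

Linear scaling ignores differences in base composition between the two genomes, so the result is an order-of-magnitude figure only.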

Did you use the whole genome as input?

Please consider the following thoughts: For eukaryotes the smORF analysis might be a bit more complicated because you might have to run the pipeline multiple times, depending on what you want to find. When you use the genome, you will detect small ORFs that span exon-intron boundaries. These are very likely to be false positives if the intron is spliced out. You could think about using only mRNA sequences as input. This would also reduce the search space for possible ORFs a lot. However, you would not be able to detect new ORFs that lie on short (m)RNAs that have not yet been detected.
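One way to keep each run small, whether the input is a set of mRNA sequences or single chromosomes, is to split a multi-record FASTA into one file per record and run the pipeline on each file separately. This is a minimal sketch, assuming plain-text (uncompressed) FASTA input; `parse_fasta` and `split_fasta` are hypothetical helpers and not part of the pipeline's scripts.

```python
import os


def parse_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir):
    """Write each FASTA record to its own file and return the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with open(path) as handle:
        for header, seq in parse_fasta(handle):
            # Use the first word of the header as the file name.
            record_id = (header.split() or ["record"])[0].replace("/", "_")
            out_path = os.path.join(out_dir, record_id + ".fa")
            with open(out_path, "w") as out:
                out.write(">" + header + "\n" + seq + "\n")
            written.append(out_path)
    return written
```

Each record is held in memory while it is written, which is fine for transcripts but means a full mouse chromosome needs a few hundred megabytes of RAM.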

Currently, I am trying to write a more efficient ORF detection procedure, but I am not sure how long this will take (or whether it will actually be faster). I will also check the code regarding the multi-chromosome issue.

Best,
Alex

martaolmu · 2024-02-05T14:28:00Z

Hi Alex,

yes, I did use the whole genome as input. I will consider using only mRNA sequences as input, as you suggest.
Thanks so much for the info and your time!

Marta
