Hi!
I need to assemble more than 30 yeast genomes that were sequenced
with PacBio Sequel IIe (12.1 Mb each, at roughly 40x coverage).
The reference genome (Saccharomyces cerevisiae S288C) has 16
chromosomes, but I get a much higher number of contigs (70 - 100)
after assembling with Canu (see below for the full commands). Could
this be caused by repeated sequences that the algorithm
cannot resolve? If so, should I adjust the parameters to refine the
assembly? Alternatively, is there any parameter (or set of parameters)
that I can tweak to get a contig count closer to the number of chromosomes?
These are the commands that I used to get the full assembly:
You will almost never get as many contigs as there are chromosomes. There are always extrachromosomal sequences (e.g. the mitochondrial genome) that can appear in multiple contigs because of their high copy number, and there may be low-frequency variation in the sampled population and/or repeats that are larger than HiFi reads can resolve. Are these all haploid as well? A diploid genome would at least double the number of contigs. I recommend instead comparing the lengths of the longest contigs with the chromosome lengths and checking how many pieces you have per chromosome.
There isn't really a parameter that resolves repeats if the reads are too short or don't contain enough information to resolve them. You could use the defaults, which would just drop the error rate from 1.5% to 1%, but I doubt it would make much difference. Can you post the full asm.report from your run so we can get an idea of the assembly statistics?
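As a rough illustration of the contig-length check suggested above, here is a minimal Python sketch that summarizes a contigs FASTA file. It is not part of Canu; the function names (`contig_lengths`, `n50`, `contigs_to_cover`) are hypothetical, and the 12.1 Mb genome size and the 90% threshold are assumptions taken from the question or chosen for illustration. With the `-p`/`-d` values used in this thread, the contigs would normally be in `canu_assembly/canu_assemble.contigs.fasta`.

```python
import gzip
import sys
from typing import Iterable, List, Optional


def contig_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every FASTA record, longest first."""
    lengths = []
    current = None  # bases counted for the record being read
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return sorted(lengths, reverse=True)


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def contigs_to_cover(lengths: List[int], genome_size: int,
                     fraction: float = 0.9) -> Optional[int]:
    """Number of longest contigs needed to reach fraction * genome_size bases.

    Returns None if the contigs never reach that total.
    """
    target = fraction * genome_size
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count
    return None


def open_fasta(path: str):
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    GENOME_SIZE = 12_100_000  # assumed from the 12.1 Mb figure in the question
    with open_fasta(sys.argv[1]) as handle:
        lengths = contig_lengths(handle)
    print("contigs:", len(lengths))
    print("total bases:", sum(lengths))
    print("N50:", n50(lengths))
    print("20 longest contigs:", lengths[:20])
    print("contigs covering 90% of genome:", contigs_to_cover(lengths, GENOME_SIZE))
```

If the 16 to 20 longest contigs already account for most of the expected genome size, the remaining short contigs are more likely to be mitochondrial copies, repeats, or haplotype variants than a sign of a badly fragmented assembly.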
These are the commands that I used to get the full assembly:
canu -p canu_assemble -d canu_assembly genomeSize=12m maxThreads=14 correctedErrorRate=0.015 minReadLength=1000 minOverlapLength=500 saveOverlaps=true -pacbio-hifi rawdata_file.fasta.gz
canu version 2.2
OS Ubuntu 22.04.4 LTS