2 chromosomal fusions from verkko 2.2 assembly not supported by the data #294
Comments
I am not sure what would cause this kind of fusion unless at least some of the reads support it. Can you share the noseq.gfa, paths.tsv, colors.tsv, and scfmap files from this asm? |
Hello, I have added the files here: https://drive.google.com/drive/folders/1-RPypEsl85qqm9B8Gy3ILG_oaD4aq5S5?usp=drive_link |
The assembly graph is extremely fragmented, much more than I would expect given the coverage. Is this a regular clonal sample? There are over 12k nodes whereas normally we see 2-3k and there is one component connecting most of the chromosomes, also something we don't normally see. I think the joins in question are from Hi-C scaffolding getting confused by the graph structure but the fundamental issue is the very low quality of the graph. |
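For anyone wanting to check their own graph against the numbers mentioned above (node count, and whether one component ties most chromosomes together), here is a minimal sketch. It is not part of verkko; the function name `gfa_component_sizes` is made up for illustration, and it assumes a GFA1-style file where nodes are `S` lines and edges are `L` lines.

```python
from collections import defaultdict


def gfa_component_sizes(lines):
    """Return (node_count, component_sizes_descending) for GFA1 text lines.

    Hypothetical helper: only S (segment) and L (link) records are read,
    all other record types are ignored.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) >= 4:
            a, b = fields[1], fields[3]
            parent.setdefault(a, a)
            parent.setdefault(b, b)
            union(a, b)

    sizes = defaultdict(int)
    for node in parent:
        sizes[find(node)] += 1
    return len(parent), sorted(sizes.values(), reverse=True)
```

Usage would be something like `with open(path_to_noseq_gfa) as fh: n, sizes = gfa_component_sizes(fh)`, where the path is whatever your run produced. A first entry in `sizes` that dwarfs the rest points to the kind of single large component described above.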
Hello, this sample was prepared from PBMCs purified from whole blood and we assume it to be a regular clonal sample. Specifically, it is the normal control for a tumor-normal pair (but we have not yet generated sequence data for the tumor). We are investigating whether there was anything unusual about this sample during the sequencing preps. Is it possible to see which data type is contributing to the connected component in the graph (e.g. HiFi, ONT, and/or Hi-C)? |
The base graph is built with HiFi and then resolved with ONT. I'd suspect low coverage or short ONT data. What's your N50 of the ONT data and total coverage vs coverage >100kb? |
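If it helps, the two numbers asked for here can be computed from a list of read lengths. This is only a rough sketch: `read_n50` and `coverage` are illustrative names, the genome size is something you supply yourself (a haploid human genome of roughly 3.1 Gbp is an assumption, not something stated in this thread), and "over 100kb" is taken to mean reads of at least 100,000 bp.

```python
def read_n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty input


def coverage(lengths, genome_size, min_length=0):
    """Fold coverage contributed by reads of at least min_length bases."""
    return sum(l for l in lengths if l >= min_length) / genome_size


if __name__ == "__main__":
    # Toy numbers only, not real data.
    toy = [10, 20, 30, 40]
    print(read_n50(toy))
    print(coverage(toy, genome_size=50))
    print(coverage(toy, genome_size=50, min_length=30))
```

For real data, the lengths would come from the ONT FASTQ/BAM with whatever reader you prefer, and `coverage(lengths, genome_size, min_length=100_000)` gives the coverage in reads of 100 kb or longer.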
Hello, the total ONT coverage is 35x, the coverage over 100kb is 8x. The N50 is around 60kbp. It looks like the ONT data was from the ligation sequencing kit SQK-LSK114 and not the ultra long kit. |
Ah, that is likely the reason for the complex graph. The ligation prep isn't going to have a long UL tail, and 8x coverage over 100 kb (so 4x/haplotype) is likely not enough. The best solution is probably more ONT UL data. @Dmitry-Antipov is there a scaffold log that would let us know if this join is from Hi-C scaffolding or something else in the graph that you want to take a look at? |
Yes, I'm very curious. I understand that scaffolding with a fragmented graph is a difficult problem, but I would hope that it wouldn't put two pieces together that shouldn't be put together. Thanks for taking a look. |
Hi, it seems that in your graph, for some reason, telomeres are often connected to the middle of other chromosomes, which we do not normally see. Can you also share 8-hicPipeline/unitigs.telo? |
Hello, I have added unitigs.telo to the folder. |
So those fusions are similar. We can explore further why this happens, but it may be tricky. The last file contains all node sequences, but we need only the graph structure. I'd suggest replacing all the sequences (the third element in lines starting with S) with * to simplify sharing and to avoid potential privacy problems. Possibly the simplest solution, though, is just to generate more ONT data and reassemble. |
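The suggestion above (replace the third field of every S line with *) could be done with a small script along these lines. This is a sketch, not a verkko tool: `strip_gfa_sequences` is a made-up name, and appending an `LN:i:` tag to keep the node length is my own addition (the GFA1 format allows it when the sequence is `*`), so drop that part if you only want the plain replacement.

```python
import sys


def strip_gfa_sequences(lines):
    """Yield GFA lines with each segment sequence replaced by '*'.

    Hypothetical helper. Adds LN:i:<length> when the S line has no LN tag,
    so node lengths survive; all other lines pass through unchanged.
    """
    for line in lines:
        if line.startswith("S\t"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and fields[2] != "*":
                seq_len = len(fields[2])
                fields[2] = "*"
                if not any(f.startswith("LN:i:") for f in fields[3:]):
                    fields.append(f"LN:i:{seq_len}")
            yield "\t".join(fields) + "\n"
        else:
            yield line


if __name__ == "__main__":
    # Reads a GFA on stdin and writes the sequence-free version to stdout.
    sys.stdout.writelines(strip_gfa_sequences(sys.stdin))
```

Saved under any name you like, it would be run with the GFA redirected to stdin and the output redirected to a new file, which can then be shared without the sequences.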
Hello,
I have a recent verkko 2.2 assembly with two chromosomal fusions (chr5-chr17, chr8-chr11). However, from what I can tell, these translocations are not supported by the long reads or the Hi-C data. Would you be able to take a look at the assembly? Let me know what files you need. This sample has 40x HiFi, 35x ONT-UL, and 25x Hi-C data. One of the fusions looks to be a telomere-telomere fusion, and the other looks to be a centromere-telomere fusion. The breakpoints in question are approximately haplotype2-0000066:84375000 and haplotype1-0000025:133700000.