Trio mode:phasing error #312
Comments
I suspect there is an issue with your markers or parents; it's possible Merqury picked a bad threshold. Is this a true trio? Merqury provides visual outputs that show what the parent marker counts look like. Can you post those here? Can you also share the noseq.gfa and colors.csv files? |
Hi,
visual output file: assembly.homopolymer-compressed.noseq.gfa: assembly.colors: |
Something is very wrong with the P19 haplotype k-mers. Note how the blue line is flat and there is always a large read-only component, that is, k-mers which are in the offspring but in neither parent. That shouldn't happen. To me it looks like P19 isn't related to the sample you're assembling. If it is in fact a parent, then something must be going wrong with the k-mer counting for that sample. This observation is consistent with the graph, where all the nodes only have markers from P48 (your hap2), which is why the whole genome is assigned to that sample. If you have Hi-C for this sample, I'd recommend using that instead of a trio. @arangrhie, do you agree with my assessment? Any logs to confirm there isn't another issue? @cheninouc can you post the full Meryl log for P19, the output of |
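To make the "read-only" reasoning concrete, here is a toy sketch using plain Python sets. It is not how Merqury or meryl compute markers (real pipelines use canonical k-mers, counts, and thresholds), and the function name and the 3-mers below are made up for illustration. It only shows why an unrelated parent contributes no inherited markers and leaves a large set of child k-mers unexplained.

```python
def classify_child_kmers(child, parent_a, parent_b):
    """Split child k-mers by which parent(s) could have contributed them.

    Simplified illustration only: treats each sample as a plain set of k-mers.
    """
    only_a = (child & parent_a) - parent_b      # inherited markers specific to parent A
    only_b = (child & parent_b) - parent_a      # inherited markers specific to parent B
    shared = child & parent_a & parent_b        # present in both parents, uninformative
    unexplained = child - parent_a - parent_b   # "read-only": in child, in neither parent
    return only_a, only_b, shared, unexplained


# Hypothetical toy data: one related parent, one unrelated sample.
child = {"AAC", "ACG", "CGT", "GTA", "TAC", "TTG"}
related_parent = {"AAC", "ACG", "CGT", "GGG"}
unrelated_sample = {"CCC", "GGA", "TTT"}

only_a, only_b, shared, unexplained = classify_child_kmers(
    child, related_parent, unrelated_sample
)
print(len(only_a), len(only_b), len(shared), len(unexplained))
```

With a true trio, the unexplained set should be small (mostly sequencing errors), and both parent-specific sets should be populated.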
Meryl log for P19:
Also, I tried 2 other assemblies and they all had the same problem:
|
Running another assembly is not going to help. The issue is that the markers are bad, so no assembly will be able to sort the haplotypes, since the trio information isn't valid. It seems that the initial P19 count didn't fail, though I'm suspicious of the high number of k-mers with a count of 1 or 2, and of the fact that the first k-mer starts with a G rather than an A (they should be sorted alphabetically). For comparison, here's the output for a 35-fold human DB:
This implies that the P19 coverage is so low that the counts are mostly noise. What is the input data type for P48 and P19, Illumina? |
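One way to put a number on "a high number of 1 and 2 count k-mers" is to compute the share of distinct k-mers at or below a small count from a k-mer histogram. The sketch below assumes a plain two-column text histogram (k-mer count, then number of distinct k-mers with that count); check that assumption against your own file. The helper name and the example numbers are hypothetical and are not taken from the P19 log.

```python
def low_count_fraction(hist_text, max_count=2):
    """Fraction of distinct k-mers whose count is <= max_count.

    Assumes each data line is "<count> <number of distinct k-mers>";
    lines that do not match (headers, blanks) are skipped.
    """
    low = 0
    total = 0
    for line in hist_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        count, n_kmers = int(fields[0]), int(fields[1])
        total += n_kmers
        if count <= max_count:
            low += n_kmers
    return low / total if total else 0.0


# Made-up histogram, for illustration only.
example_hist = """\
1 900
2 50
3 30
4 20
"""
print(f"{low_count_fraction(example_hist):.2%} of distinct k-mers have count <= 2")
```

A well-covered sample still has many count-1 k-mers from sequencing errors, so this fraction is only a rough indicator. It is most useful when compared against a dataset known to be good.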
Hi
|
It looks like both are below 30x: P48 looks like about 18x in GenomeScope and P19 is lower, probably closer to 12x, which is definitely borderline. It also looks like almost all P19 k-mers are lost when they are intersected with the offspring. You can see that P48_compress.k30.meryl is 5 GB, similar to P19_compress.k30.meryl at 5.7 GB. The inherited databases, however, are very different in size: for P48, child_compress.k30_and_P48_compress.k30.only.meryl is close to 1 GB, while child_compress.k30_and_P19_compress.k30.only.meryl is only 2 MB. So I would conclude that adding P19 coverage may help, but I'm not sure how much; it really doesn't look like P19 is a parent. I don't think verkko or meryl is doing anything wrong given this data. How confident are you in the trio? |
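The size comparison above can be written out as simple arithmetic. This is a back-of-the-envelope sketch: on-disk database size is only a rough proxy for the number of k-mers, the sizes are the approximate ones quoted in this comment, and the function name is made up for illustration.

```python
GB = 1e9
MB = 1e6


def inherited_fraction(parent_db_bytes, inherited_db_bytes):
    """Ratio of the inherited (child-intersected, parent-specific) DB size
    to the full parent DB size. A rough proxy only, not a k-mer count."""
    return inherited_db_bytes / parent_db_bytes


# Approximate sizes quoted in the thread: (parent DB, inherited-only DB).
sizes = {
    "P48": (5.0 * GB, 1.0 * GB),
    "P19": (5.7 * GB, 2.0 * MB),
}

for name, (parent, inherited) in sizes.items():
    print(f"{name}: inherited/parent = {inherited_fraction(parent, inherited):.4%}")
```

The two ratios differ by roughly three orders of magnitude, which is much more than the modest coverage difference between the two parents would be expected to explain.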
Hi,
The species I am studying is diploid, with a genome size of about 0.6 Gb. I used the following procedure for haplotype-resolved genome assembly:
the results:
The sizes of haplotype 1 and haplotype 2 are inconsistent.
Did one of these steps go wrong?
Thanks for your help!
Best regards,
Chenguang