Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The genotypes do not reflect the allele coverage frequency #2352

Open
pailloufat-stack opened this issue Jan 9, 2025 · 0 comments
Open

The genotypes do not reflect the allele coverage frequency #2352

pailloufat-stack opened this issue Jan 9, 2025 · 0 comments

Comments

@pailloufat-stack
Copy link

pailloufat-stack commented Jan 9, 2025

Dear,

I have 384 bam aligned samples from GBS technology. I ran the Thermofish Variant Caller (TVC) on them in order to get the genotypes of my variants. That seemed to work well but I wanted to use bcftools in order to compare both vcf files.

Here are 4 variants called with TVC at a specific locus for one single sample :

h2tg000001l     385471  A       G       0/1
h2tg000001l     385474  A       G       0/1
h2tg000001l     385508  G       A       0/1
h2tg000001l     385517  A       G       1/1

I ran bcftools on that sample with :

bcftools mpileup \
 --threads 128 \
 -q 10 -Q 30 --max-depth 1000 \
 --skip-indels \
 -f reference.fasta sample1.bam \
 --annotate FORMAT/AD,FORMAT/DP,INFO/AD | \
 bcftools call \
 --threads 128 \
 -mv -Oz -o sample1.vcf.gz

I usually use -q 10 -Q 30 to take in account only the robust variants, and --max-depth 1000 because some loci have high coverage. I got :

h2tg000001l   385471    A    G      1/1
h2tg000001l   385517    A    G      1/1

The differencies between TVC and bcftools :

  • 4 variants detected by TVC, 2 by bcftools
  • 2 are common. 1 out of the 2 has a different genotype

Here are the alignements at the locus :

image

There are the four potential variants (in the coverage track), and the black line representing the MAPQ10 threshold. bcftools removes the <MAPQ10 (-q 10) and then consider only the variants 1 and 4, as homozygous. I tried to trick the parameters of TVC to remove the <MAPQ10 alignments but nothing happens, and I got the same genotypes.

The variant 1 for example is given as 0/1 for TVC and 1/1 for bcftools. I did a test with -q 0 -Q 0 to consider all the reads. bcftools read the allele depth as 50%/50% for the variant 1 (then 0/1), but still outputed it as 1/1. Does bcftools do not consider the low mapq alignments even if I set up -q 10 ?

Also, in your opinion, do the soft-clipped reads could correspond to orthologs/paralogs that bcftools do not consider? Because we can see they could carry potentiel alleles at the variants 2 and 3.

Best

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant