Does QC3 work on mpileup/VarScan vcf files? #4

Open
rrt8 opened this issue Mar 7, 2016 · 1 comment
rrt8 commented Mar 7, 2016

When I ran QC3 on my exome VCF files generated through mpileup/VarScan, the consistency table showed very high values (>0.9) for all sample pairs, which, according to your documentation, means that all samples are contaminated with each other.

Run command :

samtools mpileup -l illumina.bed -f genome.fa mySample_sorted_dedup.exome.bam | VarScan mpileup2cns --variants 1 --output-vcf 1 -strand-filter 0 --min-avg-qual 25 > mySample_sorted_dedup.exome.vcf

Perhaps this has something to do with the VCF format. Here is an example VCF for one sample. Can you tell me whether QC3 works with this format and, if not, what modifications I should make?

##fileformat=VCFv4.1
##source=VarScan2
##INFO=<ID=ADP,Number=1,Type=Integer,Description="Average per-sample depth of bases with Phred score >= 25">
##INFO=<ID=WT,Number=1,Type=Integer,Description="Number of samples called reference (wild-type)">
##INFO=<ID=HET,Number=1,Type=Integer,Description="Number of samples called heterozygous-variant">
##INFO=<ID=HOM,Number=1,Type=Integer,Description="Number of samples called homozygous-variant">
##INFO=<ID=NC,Number=1,Type=Integer,Description="Number of samples not called">
##FILTER=<ID=str10,Description="Less than 10% or more than 90% of variant supporting reads on one strand">
##FILTER=<ID=indelError,Description="Likely artifact due to indel reads at this position">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Raw Read Depth as reported by SAMtools">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Quality Read Depth of bases with Phred score >= 25">
##FORMAT=<ID=RD,Number=1,Type=Integer,Description="Depth of reference-supporting bases (reads1)">
##FORMAT=<ID=AD,Number=1,Type=Integer,Description="Depth of variant-supporting bases (reads2)">
##FORMAT=<ID=FREQ,Number=1,Type=String,Description="Variant allele frequency">
##FORMAT=<ID=PVAL,Number=1,Type=String,Description="P-value from Fisher's Exact Test">
##FORMAT=<ID=RBQ,Number=1,Type=Integer,Description="Average quality of reference-supporting bases (qual1)">
##FORMAT=<ID=ABQ,Number=1,Type=Integer,Description="Average quality of variant-supporting bases (qual2)">
##FORMAT=<ID=RDF,Number=1,Type=Integer,Description="Depth of reference-supporting bases on forward strand (reads1plus)">
##FORMAT=<ID=RDR,Number=1,Type=Integer,Description="Depth of reference-supporting bases on reverse strand (reads1minus)">
##FORMAT=<ID=ADF,Number=1,Type=Integer,Description="Depth of variant-supporting bases on forward strand (reads2plus)">
##FORMAT=<ID=ADR,Number=1,Type=Integer,Description="Depth of variant-supporting bases on reverse strand (reads2minus)">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT mySample1
chrM 8702 . G A . PASS ADP=49;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:65:49:0:48:97.96%:1.554E-28:0:35:0:0:21:27
chrM 9378 . G A . PASS ADP=68;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:78:68:0:68:100%:1.6809E-40:0:35:0:0:38:30
chrM 9541 . C T . PASS ADP=53;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:61:53:0:53:100%:1.5943E-31:0:35:0:0:5:48
chrM 10399 . G A . PASS ADP=72;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:84:72:0:72:100%:6.7558E-43:0:35:0:0:39:33
chrM 10820 . G A . PASS ADP=72;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:91:72:0:71:98.61%:2.6835E-42:0:35:0:0:37:34
chrM 10874 . C T . PASS ADP=86;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:255:95:86:0:85:98.84%:1.0935E-50:0:35:0:0:31:54

Thank you!
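
Editorial note: to see exactly what the VarScan sample column contains, the FORMAT keys can be paired with the sample values. The following is a minimal Python sketch, not part of QC3; the function name `parse_sample_fields` is hypothetical, and it splits on any whitespace because the record pasted above shows spaces (a real VCF is tab-delimited, which `split()` also handles as long as no field contains spaces).

```python
def parse_sample_fields(vcf_line, sample_index=0):
    """Map each FORMAT key to the matching value for one sample.

    Hypothetical helper for illustration only. Columns 0-7 are the fixed
    VCF fields, column 8 is FORMAT, and samples start at column 9.
    """
    columns = vcf_line.strip().split()
    format_keys = columns[8].split(":")
    sample_values = columns[9 + sample_index].split(":")
    return dict(zip(format_keys, sample_values))


# First data record from the example above.
line = (
    "chrM 8702 . G A . PASS ADP=49;WT=0;HET=0;HOM=1;NC=0 "
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR "
    "1/1:255:65:49:0:48:97.96%:1.554E-28:0:35:0:0:21:27"
)
fields = parse_sample_fields(line)
print(fields["GT"], fields["RD"], fields["AD"])  # 1/1 0 48
```

In this output, `AD` is a single number (variant-supporting depth) and the reference depth is stored separately in `RD`, as the header declares. GATK-produced VCFs instead report `AD` as a comma-separated list of reference and alternate depths, which is one visible difference between the two formats.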

Owner

slzhao commented Mar 8, 2016

Hello,

QC3 was designed for GATK VCF files and does not support VarScan VCF files. I
understand that more and more people will want to use other types of VCF
files with it, so I plan to add support for more VCF types in the future.
However, I don't have enough time to do it right now.
If you would like to modify QC3 so that it supports VarScan VCF files, that
would be great.
Thank you!

Best,
Shilin
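
Editorial note: as an alternative to modifying QC3, the VarScan records could be rewritten so that `AD` follows the GATK convention of "reference depth,alternate depth". The sketch below is a guess at one useful conversion, not a confirmed fix: this thread does not say which FORMAT fields QC3 actually reads, so whether this change alone resolves the high consistency values is an assumption. The function name `varscan_to_gatk_style_ad` is hypothetical, and the input is assumed to be tab-delimited.

```python
def varscan_to_gatk_style_ad(vcf_line):
    """Rewrite a VarScan record so AD becomes 'RD,AD' (ref,alt depths).

    Hypothetical helper; header lines and records lacking RD or AD are
    returned unchanged. Expects a tab-delimited line without a newline.
    """
    if vcf_line.startswith("#"):
        return vcf_line
    columns = vcf_line.split("\t")
    keys = columns[8].split(":")
    if "RD" not in keys or "AD" not in keys:
        return vcf_line
    rd_idx, ad_idx = keys.index("RD"), keys.index("AD")
    for i in range(9, len(columns)):
        values = columns[i].split(":")
        if len(values) != len(keys):
            continue  # skip truncated or missing sample entries
        values[ad_idx] = f"{values[rd_idx]},{values[ad_idx]}"
        columns[i] = ":".join(values)
    return "\t".join(columns)


record = "\t".join([
    "chrM", "8702", ".", "G", "A", ".", "PASS",
    "ADP=49;WT=0;HET=0;HOM=1;NC=0",
    "GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR",
    "1/1:255:65:49:0:48:97.96%:1.554E-28:0:35:0:0:21:27",
])
converted = varscan_to_gatk_style_ad(record)
print(converted.split("\t")[9])
```

A complete conversion would also need to update the `##FORMAT=<ID=AD,...>` header line, since it declares `Number=1` and describes only variant-supporting depth. Other differences between VarScan and GATK output (for example the missing QUAL value, shown as `.` above) are not handled here.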

