The Human Decoy Sequences (hs37d5), prepared according to [README],
contain IUPAC ambiguity codes in the reference.
In particular, they include some S bases (C or G).
The VCF 4.3 spec requires a caller to write a concrete base in REF (the first alphabetically among the possibilities), not the IUPAC ambiguity code:
REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive).
[...]
If the reference sequence contains IUPAC ambiguity codes not allowed by this specification (such as R = A/G), the ambiguous reference base must be reduced to a concrete base by using the one that is first alphabetically (thus R as a reference base is converted to A in VCF.)
For example, I should not be seeing a call such as:
hs37d5 33184489 . S C . PASS DP=33;SS=1;SSC=0;GPV=1.3852e-19;SPV=1 GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:10:0:10:100%:0,0,9,1 1/1:.:23:0:23:100%:0,0,23,0
Here the S should have been interpreted as C. Therefore there should have been no call.
This bug causes false positives and, more importantly, causes downstream tools such as igvtools to fail.
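To make the expected behaviour concrete, here is a minimal Python sketch of the reduction rule quoted from the spec. It is not VarScan2 code; the names `IUPAC_FIRST_BASE`, `reduce_ref`, and `is_spurious_call` are hypothetical and only illustrate how an ambiguous REF could be collapsed to its first alphabetical base before deciding whether there is anything to call.

```python
# Illustrative sketch only: maps each IUPAC ambiguity code to the
# alphabetically first base it can stand for, as the VCF spec describes.
IUPAC_FIRST_BASE = {
    "R": "A",  # A or G
    "Y": "C",  # C or T
    "S": "C",  # C or G
    "W": "A",  # A or T
    "K": "G",  # G or T
    "M": "A",  # A or C
    "B": "C",  # C, G or T
    "D": "A",  # A, G or T
    "H": "A",  # A, C or T
    "V": "A",  # A, C or G
}


def reduce_ref(ref: str) -> str:
    """Replace ambiguity codes in REF; A, C, G, T and N pass through unchanged."""
    return "".join(IUPAC_FIRST_BASE.get(base.upper(), base) for base in ref)


def is_spurious_call(ref: str, alt: str) -> bool:
    """True if every ALT allele equals the reduced REF, i.e. nothing to report."""
    reduced = reduce_ref(ref).upper()
    return all(allele.upper() == reduced for allele in alt.split(","))


if __name__ == "__main__":
    print(reduce_ref("S"))             # reduced reference base
    print(is_spurious_call("S", "C"))  # the record from this report
```

Under this rule, the record above (REF `S`, ALT `C`) reduces to REF `C` with ALT `C`, so it would not be a variant at all.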
Just weighing in as another guy bit by the same dog (VarScan2). Errors were encountered in Picard FixVcfHeader, which I have to use to fix other non-compliant VarScan2-generated VCFs.
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 161245: unparsable vcf record with allele R, for input source: file:///home/dnanexus/inputs/input4444483750468838029/LRF2-10_ST@1212020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 128832: unparsable vcf record with allele Y, for input source: file:///home/dnanexus/inputs/input1153832023365751001/MP17-00247-NT_K@03102020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf
Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 101179: unparsable vcf record with allele S, for input source: file:///home/dnanexus/inputs/input8947911205436031285/MG18-2516@10212020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf
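For anyone trying to locate the offending records before handing a file to Picard, a small scanner along these lines may help. This is a rough sketch, not part of VarScan2 or htsjdk: `find_noncompliant_refs` is a hypothetical helper, and it assumes the ambiguity code sits in the REF column (column 4) of a tab-separated VCF. It does not inspect ALT.

```python
import re
from typing import Iterable, Iterator, Tuple

# REF may only contain A, C, G, T or N (case insensitive) per the VCF spec.
VALID_REF = re.compile(r"^[ACGTNacgtn]+$")


def find_noncompliant_refs(
    lines: Iterable[str],
) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (line_number, chrom, pos, ref) for records whose REF is not valid."""
    for line_number, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record; ignored in this sketch
        chrom, pos, ref = fields[0], fields[1], fields[3]
        if not VALID_REF.match(ref):
            yield line_number, chrom, pos, ref


if __name__ == "__main__":
    import sys

    # Usage (hypothetical script name): python find_bad_refs.py input.vcf
    with open(sys.argv[1]) as handle:
        for hit in find_noncompliant_refs(handle):
            print(*hit, sep="\t")
```

The reported line numbers count header lines too, so they should be comparable to the "approximately line number" values in the exceptions above.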