Ambiguous allele (IUPAC) not properly handled according to VCF specs 4.3 #43

Open
huguesfontenelle opened this issue Apr 5, 2019 · 2 comments

Comments

@huguesfontenelle

The Human Decoy Sequences reference (hs37d5), prepared according to [README],
contains ambiguous IUPAC reference bases.
In particular, it has some S bases (C or G).

The VCF 4.3 specification requires a caller to output the alphabetically first concrete base rather than the IUPAC ambiguity code:

REF - reference base(s): Each base must be one of A,C,G,T,N (case insensitive).
[...]
If the reference sequence contains IUPAC ambiguity codes not allowed by this specification (such as R = A/G), the ambiguous reference base must be reduced to a concrete base by using the one that is first alphabetically (thus R as a reference base is converted to A in VCF.)

For example, I should not be seeing a call such as:

hs37d5  33184489        .       S       C       .       PASS    DP=33;SS=1;SSC=0;GPV=1.3852e-19;SPV=1   GT:GQ:DP:RD:AD:FREQ:DP4 1/1:.:10:0:10:100%:0,0,9,1 1/1:.:23:0:23:100%:0,0,23,0

Here the S should have been interpreted as C. Since the ALT allele is also C, there should have been no call.

This bug causes false positives and, more importantly, causes downstream tools such as igvtools to fail.
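
For illustration, the reduction rule quoted above could be sketched in Python as follows. This is only a minimal sketch, not VarScan2's code or a proposed patch: the names `IUPAC_FIRST_BASE`, `reduce_ref` and `fix_record` are hypothetical, and it assumes tab-separated VCF data lines with REF in column 4 and ALT in column 5.

```python
# Alphabetically first concrete base for each IUPAC ambiguity code.
# N is allowed by VCF 4.3 and is therefore left untouched.
IUPAC_FIRST_BASE = {
    "R": "A", "Y": "C", "S": "C", "W": "A", "K": "G", "M": "A",
    "B": "C", "D": "A", "H": "A", "V": "A",
}


def reduce_ref(ref):
    """Replace ambiguity codes in a REF string, preserving letter case."""
    out = []
    for base in ref:
        concrete = IUPAC_FIRST_BASE.get(base.upper())
        if concrete is None:
            out.append(base)  # already A, C, G, T or N
        else:
            out.append(concrete if base.isupper() else concrete.lower())
    return "".join(out)


def fix_record(line):
    """Hypothetical helper: reduce REF in one VCF data line.

    Returns None when every ALT allele equals the reduced REF, i.e. when
    no variant remains. Multi-allelic records where only some ALT alleles
    match REF are returned as-is apart from REF and need separate handling.
    """
    fields = line.rstrip("\n").split("\t")
    fields[3] = reduce_ref(fields[3])
    alts = fields[4].split(",")
    if all(alt.upper() == fields[3].upper() for alt in alts):
        return None
    return "\t".join(fields)
```

Under these assumptions, the record shown above (REF S, ALT C) reduces to REF C and is dropped, while a record with REF R and ALT G would be kept with REF rewritten to A.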

@NevenaaNikolic

NevenaaNikolic commented Oct 6, 2020

I am experiencing the same issue. How can this problem be worked around, and is it going to be fixed at some point in the future?

@myourshaw

myourshaw commented Feb 25, 2022

Just weighing in as another guy bit by the same dog (VarScan2). Errors were encountered in Picard FixVcfHeader, which I have to use to fix other non-compliant VarScan2-generated VCFs.

Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 161245: unparsable vcf record with allele R, for input source: file:///home/dnanexus/inputs/input4444483750468838029/LRF2-10_ST@1212020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf

Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 128832: unparsable vcf record with allele Y, for input source: file:///home/dnanexus/inputs/input1153832023365751001/MP17-00247-NT_K@03102020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf

Exception in thread "main" htsjdk.tribble.TribbleException: The provided VCF file is malformed at approximately line number 101179: unparsable vcf record with allele S, for input source: file:///home/dnanexus/inputs/input8947911205436031285/MG18-2516@10212020JH_ST.b37.map.dedup.sample.varscan2.fpfilter.fail.add_contigs.vcf
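
To locate such records before handing a file to a strict parser, a scan along these lines might help. It is a rough sketch under assumptions: `find_ambiguous_ref` is a hypothetical name, the input is an uncompressed, tab-separated VCF, and only the REF column is checked (the errors above do not say whether the offending allele is in REF or ALT).

```python
import re

# Bases the VCF 4.3 spec allows in REF: A, C, G, T, N (case insensitive).
VALID_REF = re.compile(r"^[ACGTNacgtn]+$")


def find_ambiguous_ref(lines):
    """Yield (line_number, REF) for data lines whose REF has other characters."""
    for number, line in enumerate(lines, start=1):
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 3 and not VALID_REF.match(fields[3]):
            yield number, fields[3]
```

It accepts any iterable of lines, so an open file handle can be passed directly, for example `for n, ref in find_ambiguous_ref(open("calls.vcf")): print(n, ref)`, where `calls.vcf` is a placeholder file name.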
