You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For below data example with two haplotypes, each with a different overlapping indel variant, that are represented in the unphased input vcf file as three variants (a one 1 bp deletion immediately followed by a 8 bp deletion for haplotype "earlyIndel" and a 5 bp deletion for haplotype "lateDeletion") HapCUT2 (with --indel 1) leaves the the second variant unphased. Is this expected?
Fully reproducible example (using current master v1.3.1 0eb7075 and current develop of htslib: 1.18 99415e2a)
Extracting haplotype informative reads from bamfiles reads.bam minQV 13 minMQ 20 maxIS 1000
VCF file unphased.vcf has 3 variants
adding chrom chr19 to index
vcffile unphased.vcf chromosomes 1 hetvariants 3 variants 3
detected 0 variants with two non-reference alleles, these variants will not be phased
reading fasta index file chr19.fa.fai ... fasta file chr19.fa has 1 chromosomes/contigs
found match for reference contig chr19 in VCF file index
contig chr19 length 61431566
reading reference sequence file chr19.fa with 1 contigs
read reference sequence file in 0.37 sec
reading sorted bamfile reads.bam
processing reads mapped to chrom "chr19"
final cleanup of fragment list: 8 current chrom 0 prev 0
[2023:10:26 11:04:40] input fragment file: fragment_file
[2023:10:26 11:04:40] input variantfile (VCF format):unphased.vcf
[2023:10:26 11:04:40] haplotypes will be output to file: haplotype_output_file
[2023:10:26 11:04:40] solution convergence cutoff: 5
[2023:10:26 11:04:40] read 3 variants from unphased.vcf file
[2023:10:26 11:04:40] read fragment file and variant file: fragments 7 variants 3
mean number of variants per read is 2.00
[2023:10:26 11:04:40] building read-variant graph for phasing
[2023:10:26 11:04:40] no of non-trivial connected components 1 max-Degree 7 connected variants 2 coverage-per-variant 7.000000
[2023:10:26 11:04:40] comp 0 first 0 last 2 phased 2 fragments 7
[2023:10:26 11:04:40] fragments 7 snps 3 component(blocks) 1
[2023:10:26 11:04:40] starting Max-Likelihood-Cut based haplotype assembly algorithm
[2023:10:26 11:04:40] HIC ITER 0
[2023:10:26 11:04:40] PHASING ITER 0
[2023:10:26 11:04:40] component 0 length 3 phased 2 0...2
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 1
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 2
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 3
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 4
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 5
[2023:10:26 11:04:40] component 0 offset 0 length 3 phased 2 LL -0.0 cutvalue 13.909938 bestLL -0.03
[2023:10:26 11:04:40] PHASING ITER 6
[2023:10:26 11:04:40] starting to post-process phased haplotypes to further improve accuracy
[2023:10:26 11:04:40] starting to output phased haplotypes
[2023:10:26 11:04:40] OUTPUTTING PRUNED HAPLOTYPE ASSEMBLY TO FILE haplotype_output_file
[2023:10:26 11:04:40] N50 haplotype length is 0.01 kilobases
[2023:10:26 11:04:40] OUTPUTTING PHASED VCF TO FILE haplotype_output_file.phased.VCF
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##ALT=<ID=NON_REF,Description="Represents any possible alternative allele not already represented at this location by REF and ALT">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##INFO=<ID=hapcut2,Number=1,Type=Integer,Description="phased by HapCUT2 or not">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##contig=<ID=chr19,length=61431566>
##FORMAT=<ID=PS,Number=1,Type=Integer,Description="ID of Phase Set for Variant">
##FORMAT=<ID=PQ,Number=1,Type=Integer,Description="Phred QV indicating probability that this variant is incorrectly phased relative to the haplotype">
##FORMAT=<ID=PD,Number=1,Type=Integer,Description="phased Read Depth">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT CGAAGAGGTAGGTGCGAG-1
chr19 55910646 . AC A 7447.85 PASS AC=1 GT:AD:DP:PQ:PD:PS 1|0:47,35:82:100:7:55910646
chr19 55910648 . AAATCCCCC A 12412.8 PASS AC=1 GT:AD:DP:PS 0/1:47,35:82:.
chr19 55910653 . CCCCAT C 13372.2 PASS AC=1 GT:AD:DP:PQ:PD:PS 0|1:0,47:82:100:7:55910646
If I add a second * allele to the third variant to denote the overlapping deletion (as suggested by VCFv4.3) , and add --triallelic 1 I get the same results. Nor do --new_format 1 and --realign_variants 1 change the results.
The text was updated successfully, but these errors were encountered:
For below data example with two haplotypes, each with a different overlapping indel variant, that are represented in the unphased input vcf file as three variants (a one 1 bp deletion immediately followed by a 8 bp deletion for haplotype "earlyIndel" and a 5 bp deletion for haplotype "lateDeletion") HapCUT2 (with --indel 1) leaves the the second variant unphased. Is this expected?
Fully reproducible example (using current master v1.3.1 0eb7075 and current develop of htslib: 1.18
99415e2a
)If I add a second
*
allele to the third variant to denote the overlapping deletion (as suggested by VCFv4.3) , and add--triallelic 1
I get the same results. Nor do--new_format 1
and--realign_variants 1
change the results.The text was updated successfully, but these errors were encountered: