Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

All avg_dxy in pixy_dxy.txt is NA #114

Open
ala98412 opened this issue Jul 19, 2024 · 4 comments
Open

All avg_dxy in pixy_dxy.txt is NA #114

ala98412 opened this issue Jul 19, 2024 · 4 comments
Labels
help wanted Extra attention is needed

Comments

@ala98412
Copy link

Hi there,

I am using pixy to calculate dxy.
However, all of the avg_dxy is NA. I believe that I may prepared my file incorrectly.
Please forgive me if my question is too naive.

This is how I prepared my merge VCF:
bcftools mpileup -f ../ref.genome.fa -b ./BAM.list -r HiC_scaffold_39 | bcftools call -m -Oz -f GQ -o merged.vcf.gz

Base on the mannual, I believe I generated an all site VCF file.
All bam files had been sorted, and I only use HiC_scaffold_39 scaffold (length=173364) for testing.

Next, I indexing my vcf.gz file by tabix:
tabix merged.Zp2.Oe2.vcf.gz

And here is my population file:

Zp.Hy926	Zp
Zp.RF310d	Zp
Oe.HS062501	Oe
Oe.HS062507	Oe

and my command of using pixy:

VCF=merged.vcf.gz

pixy --stats pi fst dxy \
--vcf $VCF \
--populations popmap.txt \
--window_size 10000 \
--n_cores 4

After running the above command, I got the follow output.
It seems like I have some invalid lines, but I don't know how to solve it.

[pixy] pixy 1.2.10.beta2
[pixy] See documentation at https://pixy.readthedocs.io/en/latest/
/home/why/.local/lib/python3.8/site-packages/allel/io/vcf_read.py:1732: UserWarning: invalid INFO header: '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">\n'
  warnings.warn('invalid INFO header: %r' % header)

[pixy] Validating VCF and input parameters...
[pixy] Checking write access...OK
[pixy] Checking CPU configuration...OK
[pixy] Checking for invariant sites...OK
[pixy] Checking chromosome data...OK
[pixy] Checking intervals/sites...OK
[pixy] Checking sample data...OK
[pixy] All initial checks past!

[pixy] Preparing for calculation of summary statistics: pi, fst, dxy
[pixy] Using Weir and Cockerham (1984)'s estimator of FST.
[pixy] Data set contains 2 population(s), 1 chromosome(s), and 4 sample(s)
[pixy] Window size: 10000 bp

[pixy] Started calculations at 17:15:03 on 2024-07-19
[pixy] Using 4 out of 88 available CPU cores

[pixy] Processing chromosome/contig HiC_scaffold_39...
[pixy] Calculating statistics for region HiC_scaffold_39:1-173364...
/home/why/.local/lib/python3.8/site-packages/allel/io/vcf_read.py:1732: UserWarning: invalid INFO header: '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">\n'
  warnings.warn('invalid INFO header: %r' % header)
/home/why/.local/lib/python3.8/site-packages/allel/io/vcf_read.py:1732: UserWarning: invalid INFO header: '##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias for filtering splice-site artefacts in RNA-seq data (bigger is better)",Version="3">\n'
  warnings.warn('invalid INFO header: %r' % header)
/home/why/.local/lib/python3.8/site-packages/allel/io/vcf_read.py:1248: UserWarning: 'DP' FORMAT header not found
  warnings.warn('%r FORMAT header not found' % name)
/home/why/.local/lib/python3.8/site-packages/allel/io/vcf_read.py:1248: UserWarning: 'DP' FORMAT header not found
  warnings.warn('%r FORMAT header not found' % name)

[pixy] WARNING: pixy failed to find any valid gentoype data to calculate the following summary statistics: fst. No output file will be created for these statistics.

[pixy] All calculations complete at 17:15:14 on 2024-07-19
[pixy] Time elapsed: 00:00:10
[pixy] Output files written to: /home/why/Juihung/Candidia_barbatus_newAssembly2/gIMble/1_mapping/pixy.Zp2.Oe2/

[pixy] If you use pixy in your research, please cite the following paper:
[pixy] Korunes, KL and K Samuk. pixy: Unbiased estimation of nucleotide diversity and divergence in the presence of missing data. Mol Ecol Resour. 2021 Jan 16. doi: 10.1111/1755-0998.13326.

Here is avg_dxy file, pixy didn't find any difference:

pop1	pop2	chromosome	window_pos_1	window_pos_2	avg_dxy	no_sites	count_diffs	count_comparisons	count_missing
Zp	Oe	HiC_scaffold_39	1	10000	NA	0	0	0	159440
Zp	Oe	HiC_scaffold_39	10001	20000	NA	0	0	0	157280
Zp	Oe	HiC_scaffold_39	20001	30000	NA	0	0	0	153040
Zp	Oe	HiC_scaffold_39	30001	40000	NA	0	0	0	114432
Zp	Oe	HiC_scaffold_39	40001	50000	NA	0	0	0	157168
Zp	Oe	HiC_scaffold_39	50001	60000	NA	0	0	0	155040
Zp	Oe	HiC_scaffold_39	60001	70000	NA	0	0	0	156320
Zp	Oe	HiC_scaffold_39	70001	80000	NA	0	0	0	132656
Zp	Oe	HiC_scaffold_39	80001	90000	NA	0	0	0	135696
Zp	Oe	HiC_scaffold_39	90001	100000	NA	0	0	0	139952
Zp	Oe	HiC_scaffold_39	100001	110000	NA	0	0	0	146944
Zp	Oe	HiC_scaffold_39	110001	120000	NA	0	0	0	155536
Zp	Oe	HiC_scaffold_39	120001	130000	NA	0	0	0	157152
Zp	Oe	HiC_scaffold_39	130001	140000	NA	0	0	0	157872
Zp	Oe	HiC_scaffold_39	140001	150000	NA	0	0	0	159280
Zp	Oe	HiC_scaffold_39	150001	160000	NA	0	0	0	158800
Zp	Oe	HiC_scaffold_39	160001	170000	NA	0	0	0	159856
Zp	Oe	HiC_scaffold_39	170001	180000	NA	0	0	0	53840

This is my VCF file looks like, I can't find any problems.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Zp.Hy926        Zp.RF310d       Oe.HS062501     Oe.HS062507
HiC_scaffold_39 14      .       G       .       57.449  .       DP=1;FS=0;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 15      .       G       .       37.4495 .       DP=1;FS=0;MQ0F=0;AN=2;DP4=1,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 16      .       A       .       66.4491 .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 17      .       A       .       83.4493 .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 18      .       T       .       88.4494 .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 19      .       A       .       96.4495 .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 20      .       T       .       96.4495 .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 21      .       T       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 22      .       G       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 23      .       A       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 24      .       C       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 25      .       A       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 26      .       C       .       100.45  .       DP=2;FS=0;MQ0F=0;AN=2;DP4=2,0,0,0;MQ=60 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 27      .       T       .       121.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 28      .       T       .       121.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 29      .       A       .       125.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 30      .       G       .       114.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 31      .       T       .       121.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 32      .       C       .       127.45  .       DP=3;FS=0;MQ0F=0;AN=2;DP4=3,0,0,0;MQ=58 GT      0/0     ./.     ./.     ./.
HiC_scaffold_39 33      .       G       .       162.45  .       DP=4;FS=0;MQ0F=0;AN=4;DP4=4,0,0,0;MQ=53 GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 34      .       T       .       164.45  .       DP=4;FS=0;MQ0F=0;AN=4;DP4=4,0,0,0;MQ=53 GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 35      .       A       T       5.10497 .       DP=4;SGB=-0.576181;RPBZ=-1.34164;MQBZ=-1.41421;BQBZ=-1;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=4;DP4=3,0,1,0;MQ=53   GT:PL:GQ        0/0:0,9,100:6   0/1:37,3,0:3    ./.:0,0,0:0     ./.:0,0,0:0
HiC_scaffold_39 36      .       T       .       162.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 37      .       T       .       164.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 38      .       T       .       165.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 39      .       C       .       165.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 40      .       T       .       166.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 41      .       G       .       167.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 42      .       A       .       166.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 43      .       G       .       165.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 44      .       A       .       157.45  .       DP=5;FS=0;MQ0F=0.2;AN=4;DP4=5,0,0,0;MQ=42       GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 45      .       A       .       185.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 46      .       A       .       174.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 47      .       T       .       175.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 48      .       G       .       184.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 49      .       G       .       188.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 50      .       T       .       185.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 51      .       G       .       188.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 52      .       C       .       190.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 53      .       A       .       189.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 54      .       C       .       188.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 55      .       G       .       185.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 56      .       C       .       187.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 57      .       T       .       180.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
HiC_scaffold_39 58      .       C       .       186.451 .       DP=6;FS=0;MQ0F=0.166667;AN=4;DP4=6,0,0,0;MQ=45  GT      0/0     0/0     ./.     ./.
.
.
.

Through bcftools view, there's definitely have some differents among populations in my VCF file.

bcftools view -v snps merged.vcf.gz

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  Zp.Hy926        Zp.RF310d       Oe.HS062501     Oe.HS062507
HiC_scaffold_39 35      .       A       T       5.10497 .       DP=4;SGB=-0.576181;RPBZ=-1.34164;MQBZ=-1.41421;BQBZ=-1;SCBZ=0;FS=0;MQ0F=0;AC=1;AN=4;DP4=3,0,1,0;MQ=53   GT:PL:GQ        0/0:0,9,100:6        0/1:37,3,0:3    ./.:0,0,0:0     ./.:0,0,0:0
HiC_scaffold_39 79      .       C       T       4.30848 .       DP=7;VDB=0.06;SGB=0.592737;RPBZ=1.75862;MQBZ=-1.42044;BQBZ=-1.78885;SCBZ=0;FS=0;MQ0F=0.285714;AC=1;AN=6;DP4=5,0,2,0;MQ=39   GT:PL:GQ 0/0:0,12,125:12 0/1:39,6,0:3    0/0:0,3,4:4     ./.:0,0,0:0
HiC_scaffold_39 97      .       C       A       3.64641 .       DP=9;VDB=0.06;SGB=0.604408;RPBZ=2.01201;MQBZ=-1.20302;BQBZ=0;SCBZ=0;FS=0;MQ0F=0.222222;AC=1;AN=6;DP4=6,0,2,0;MQ=35      GT:PL:GQ     0/0:0,12,123:13 0/1:34,0,3:10   0/0:0,3,4:5     ./.:0,0,0:0
HiC_scaffold_39 241     .       A       T       12.6703 .       DP=16;VDB=0.602287;SGB=1.36429;RPBZ=1.79762;MQBZ=-2.34787;MQSBZ=0.409159;BQBZ=-2.7136;SCBZ=1.29099;FS=0;MQ0F=0.1875;AC=2;AN=4;DP4=8,2,6,0;MQ=22      GT:PL:GQ        0/0:0,20,193:17 1/1:49,10,0:7   ./.:0,0,0:0     ./.:0,0,0:0
HiC_scaffold_39 529     .       A       T       212.476 .       DP=130;VDB=0.535239;SGB=8.40135;RPBZ=-2.61446;MQBZ=0.460176;MQSBZ=-1.82205;BQBZ=-0.960623;SCBZ=1.70192;FS=0;MQ0F=0.123077;AC=2;AN=4;DP4=39,54,21,10;MQ=27    GT:PL:GQ        0/1:153,0,255:127       0/1:93,0,204:91 ./.:0,0,0:0     ./.:0,0,0:0
HiC_scaffold_39 834     .       G       A       94.3667 .       DP=56;VDB=0.376789;SGB=13.3052;RPBZ=-1.5409;MQBZ=-1.97755;MQSBZ=-2.75216;BQBZ=0.569236;SCBZ=0.716599;FS=0;MQ0F=0.535714;AC=2;AN=6;DP4=6,13,2,35;MQ=12        GT:PL:GQ        0/1:126,0,148:127       0/0:0,4,14:3    0/1:4,4,4:3     ./.:0,0,0:0
HiC_scaffold_39 2201    .       C       T       33.4763 .       DP=105;VDB=5.52055e-21;SGB=4.59646;RPBZ=-5.33988;MQBZ=0.229605;MQSBZ=-2.97653;BQBZ=-3.95906;SCBZ=6.933;FS=0;MQ0F=0.552381;AC=4;AN=8;DP4=42,20,42,0;MQ=7      GT:PL:GQ        0/0:0,126,164:123       0/0:0,54,36:36  1/1:36,42,0:33  1/1:47,61,0:46
HiC_scaffold_39 2211    .       A       T       6.92262 .       DP=123;VDB=8.3557e-32;SGB=7.87794;RPBZ=-6.19739;MQBZ=0.40488;MQSBZ=-2.69057;BQBZ=-4.68256;SCBZ=7.38852;FS=0;MQ0F=0.560976;AC=4;AN=8;DP4=44,19,58,1;MQ=6      GT:PL:GQ        0/0:0,123,173:121       0/0:0,66,33:36  1/1:19,39,0:15  1/1:37,54,0:33
HiC_scaffold_39 2214    .       A       T       23.5047 .       DP=125;VDB=1.03366e-30;SGB=7.87794;RPBZ=-5.28781;MQBZ=0.563643;MQSBZ=-2.76914;BQBZ=-3.18128;SCBZ=7.28152;FS=0;MQ0F=0.568;AC=4;AN=8;DP4=44,21,59,0;MQ=6       GT:PL:GQ        0/0:0,129,170:125       0/0:0,66,35:34  1/1:31,78,0:31  1/1:42,96,0:42
HiC_scaffold_39 2221    .       C       A       5.46074 .       DP=127;VDB=3.11627e-13;SGB=11.1054;RPBZ=-2.32015;MQBZ=1.67446;MQSBZ=-2.87698;BQBZ=-3.52741;SCBZ=7.9179;FS=0;MQ0F=0.566929;AC=4;AN=8;DP4=65,22,38,0;MQ=6      GT:PL:GQ        0/0:0,129,169:127       0/0:0,69,38:42  1/1:16,7,0:3    1/1:36,16,0:10
HiC_scaffold_39 2248    .       A       T       5.46066 .       DP=134;VDB=9.2308e-10;SGB=11.1054;RPBZ=1.15098;MQBZ=1.41239;MQSBZ=-2.88167;BQBZ=-4.29282;SCBZ=5.41971;FS=0;MQ0F=0.537313;AC=4;AN=8;DP4=65,24,38,0;MQ=6       GT:PL:GQ        0/0:0,138,193:127       0/0:0,66,41:45  1/1:16,7,0:3    1/1:36,16,0:10
HiC_scaffold_39 2264    .       A       T       7.54839 .       DP=136;VDB=1.44474e-12;SGB=-33.057;RPBZ=2.33816;MQBZ=0.168418;MQSBZ=-2.20264;BQBZ=-4.04072;SCBZ=6.0248;FS=0;MQ0F=0.529412;AC=4;AN=8;DP4=54,30,46,0;MQ=7      GT:PL:GQ        0/0:0,148,253:127       0/0:0,66,36:38  1/1:17,13,0:8   1/1:39,43,0:34
HiC_scaffold_39 2271    .       C       A       25.485  .       DP=134;VDB=2.02455e-25;SGB=35.7041;RPBZ=1.31921;MQBZ=0.23219;MQSBZ=-2.51211;BQBZ=-4.91621;SCBZ=5.51941;FS=0;MQ0F=0.529851;AC=4;AN=8;DP4=42,30,56,0;MQ=7      GT:PL:GQ        0/0:0,144,255:127       0/0:0,72,45:44  1/1:32,75,0:32  1/1:43,93,0:43
HiC_scaffold_39 2274    .       T       A       9.34278 .       DP=134;VDB=9.09822e-15;SGB=13.0369;RPBZ=2.68541;MQBZ=0.199531;MQSBZ=-2.65811;BQBZ=-3.46102;SCBZ=5.97953;FS=0;MQ0F=0.529851;AC=4;AN=8;DP4=54,31,43,0;MQ=7     GT:PL:GQ        0/0:0,144,255:127       0/0:0,72,47:50  1/1:18,11,0:6   1/1:40,41,0:33

Thank you for your time and support.
I look forward to a solution to this problem.

Best,
Jui-Hung

@ala98412 ala98412 added the help wanted Extra attention is needed label Jul 19, 2024
@ksamuk
Copy link
Owner

ksamuk commented Jul 19, 2024

Rolling back to pixy 1.2.7 will solve this issue, a fix is in the works. No calculations differ between those versions.

@hrazif
Copy link

hrazif commented Aug 15, 2024

I wonder if this has been fixed yet. I have the same issue, but with average pi.

@anita-wray
Copy link

anita-wray commented Sep 12, 2024

I am also running into this issue with both dxy and pi.
Any suggestions yet?

@Coral4
Copy link

Coral4 commented Jan 21, 2025

Hi I am also running into this issue with both dxy and pi. Just following up on this thread to ask if there is a solution to this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

5 participants