Candidate PD-risk gene SLC25A46 Evaluation using AMP-PD data

by Jonggeol Jeff Kim and Hui Liu

Part of the effort by Hui Liu and Jonggeol Jeffrey Kim to confirm candidate PD-risk gene SLC25A46 based on this paper: https://www.ncbi.nlm.nih.gov/pubmed/32259769

Software used:

PLINK v2.00a2LM/1.9
BCFTools
Annovar
RVTESTS

Steps:

Subset and convert data
Annotation
Allele frequency + count
Burden analysis
Score test
Multiple test correction
Compound Heterozygous Fisher Test

1. Subset and convert data

The data has already been cleaned (e.g. genotype quality, ancestry etc.) and subsetted into a PLINK binary format.

Subsetting log can be read here:

# kernel: bash
cat SLC25A46_WGS_AMP_PD.log

PLINK v1.90b6.16 64-bit (19 Feb 2020)
Options in effect:
  --bfile sample_filtered_geno05_euro
  --chr 5
  --from-bp 110738145
  --make-bed
  --out SLC25A46_WGS_AMP_PD
  --to-bp 110765157
...
243 out of 28195229 variants loaded from .bim file.
2927 people (0 males, 0 females, 2927 ambiguous) loaded from .fam.
...
Total genotyping rate is 0.999947.
243 variants and 2927 people pass filters and QC.
Note: No phenotypes present.
...

Filter data

Filter out participants with sex information missing, as well as variants 0 allele count.

Pull out columns 1, 2, and 5 (FID, IID, SEX) without header.

module load plink/2.3-alpha
plink2 --bfile SLC25A46_WGS_AMP_PD \
       --update-sex data/AMP_PD_sex.txt \
       --remove-nosex \
       --mac 1 \
       --make-bed \
       --out data/SLC25A46.filtered

[+] Loading plink  2.3-alpha 

PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to data/SLC25A46.filtered.log.
Options in effect:
  --bfile /data/LNG/saraB/AMP_PD/genesfortrainees_2ndpart/SLC25A46_WGS_AMP_PD
  --mac 1
  --make-bed
  --out data/SLC25A46.filtered
  --remove-nosex
  --update-sex data/AMP_PD_sex.txt

...
2927 samples (0 females, 0 males, 2927 ambiguous; 2927 founders) loaded from
/data/LNG/saraB/AMP_PD/genesfortrainees_2ndpart/SLC25A46_WGS_AMP_PD.fam.
243 variants loaded from
/data/LNG/saraB/AMP_PD/genesfortrainees_2ndpart/SLC25A46_WGS_AMP_PD.bim.
Note: No phenotype data present.
--update-sex: 2697 samples updated.
230 samples removed due to sex filter(s).
2697 samples (1188 females, 1509 males; 2697 founders) remaining after main
filters.
Calculating allele frequencies... done.
59 variants removed due to allele frequency threshold(s)
(--maf/--max-maf/--mac/--max-mac).
184 variants remaining after main filters.
...

Create VCFs

module load plink
# use plink 1.9 for this
plink --bfile data/SLC25A46.filtered \
      --recode vcf-fid bgz \
      --out data/vcf/SLC25A46.filtered.fid

[-] Unloading plink  2.3-alpha 
[+] Loading plink  1.9.0-beta4.4  on cn0927 

The following have been reloaded with a version change:
  1) plink/2.3-alpha => plink/1.9.0-beta4.4

PLINK v1.90b4.4 64-bit (21 May 2017)           www.cog-genomics.org/plink/1.9/
(C) 2005-2017 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to data/vcf/SLC25A46.filtered.fid.log.
Options in effect:
  --bfile data/SLC25A46.filtered
  --out data/vcf/SLC25A46.filtered.fid
  --recode vcf-fid bgz

...
184 variants loaded from .bim file.
2697 people (1509 males, 1188 females) loaded from .fam.
...
Total genotyping rate is 0.99994.
184 variants and 2697 people pass filters and QC.
Note: No phenotypes present.
...

cd data/vcf

module load samtools
# tabulate
tabix -f -p vcf SLC25A46.filtered.fid.vcf.gz

cd ../..

[+] Loading samtools 1.10  ...

2. Determine Allele Frequency + Count

# bring back plink2
module load plink/2.3-alpha

plink2 --bfile data/SLC25A46.filtered \
       --freq \
       --out result/freq/SLC25A46.filtered.af
plink2 --bfile data/SLC25A46.filtered \
       --geno-counts \
       --out result/freq/SLC25A46.filtered.ac

[+] Loading plink  2.3-alpha 
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result/freq/SLC25A46.filtered.af.log.
Options in effect:
  --bfile data/SLC25A46.filtered
  --freq
  --out result/freq/SLC25A46.filtered.af

...
2697 samples (1188 females, 1509 males; 2697 founders) loaded from
data/SLC25A46.filtered.fam.
184 variants loaded from data/SLC25A46.filtered.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
--freq: Allele frequencies (founders only) written to222232324252526262727282829293030313232333334343535363637383839394040414142424344444545464647474848495050515152525353545455555657575858595960606161626363646465656666676768696970707171727273737475757676777778787979808081828283838484858586868788888989909091919292939494959596969797989899%
result/freq/SLC25A46.filtered.af.afreq .
...
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020)   www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to result/freq/SLC25A46.filtered.ac.log.
Options in effect:
  --bfile data/SLC25A46.filtered
  --geno-counts
  --out result/freq/SLC25A46.filtered.ac

...
2697 samples (1188 females, 1509 males; 2697 founders) loaded from
data/SLC25A46.filtered.fam.
184 variants loaded from data/SLC25A46.filtered.bim.
Note: No phenotype data present.
Calculating allele frequencies... done.
--geno-counts: Genotype counts written to1717181919202021212222232324252526262727282829293030313232333334343535363637383839394040414142424344444545464647474848495050515152525353545455555657575858595960606161626363646465656666676768696970707171727273737475757676777778787979808081828283838484858586868788888989909091919292939494959596969797989899%
result/freq/SLC25A46.filtered.ac.gcount .
...

3. Annotation

table_annovar.pl data/vcf/SLC25A46.filtered.fid.vcf.gz \
                 $ANNOVAR_DATA/hg38 --thread 20 -buildver hg38 \
                 -out result/anno/SLC25A46.filtered.fid.annovar \
                 -arg '-splicing 15',,, \
                 -remove -protocol refGene,ljb26_all,gnomad211_genome,clinvar_20190305 \
                 -operation g,f,f,f -nastring . -vcfinput -polish

Trim results

# remove unnecessary VCF file
rm result/anno/SLC25A46.filtered.fid.annovar.hg38_multianno.vcf

# truncate unnecessay genotype information
cd result/anno

head -1 SLC25A46.filtered.fid.annovar.hg38_multianno.txt > header.txt
colct="$(wc -w header.txt| cut -f1 -d' ')"
cut -f1-$colct SLC25A46.filtered.fid.annovar.hg38_multianno.txt > SLC25A46.filtered.fid.annovar.hg38_multianno.trimmed.txt
rm header.txt

cd ../..

4. Gene-Level Burden Analyses

# makes rvtests more manageable
grep SLC25A46 /data/LNG/KIM/burden_resources/refFlat_hg38.txt > data/SLC25A46.hg38.genefile

module load rvtests

rvtest --inVcf data/vcf/SLC25A46.filtered.fid.vcf.gz \
       --pheno data/covariateInfo_AMPPD.txt \
       --covar data/covariateInfo_AMPPD.txt \
       --covar-name SEX,AGE,EDUCATION,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,STUDY_BioFIND,STUDY_PDBP,STUDY_PPMI \
       --kernel skat,skato \
       --burden cmc,zeggini,mb,fp,cmcWald \
       --geneFile data/SLC25A46.hg38.genefile \
       --freqUpper 0.03 \
       --out result/burden/SLC25A46.filtered.burden.maf03


rvtest --inVcf data/vcf/SLC25A46.filtered.fid.vcf.gz \
       --pheno data/covariateInfo_AMPPD.txt \
       --covar data/covariateInfo_AMPPD.txt \
       --covar-name SEX,AGE,EDUCATION,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,STUDY_BioFIND,STUDY_PDBP,STUDY_PPMI \
       --kernel skat,skato \
       --burden cmc,zeggini,mb,fp,cmcWald \
       --geneFile data/SLC25A46.hg38.genefile \
       --freqUpper 0.01 \
       --out result/burden/SLC25A46.filtered.burden.maf01

[+] Loading rvtests  2.1.0  on cn0929 
Thank you for using rvtests (version: 20190205, git: c86e589efef15382603300dc7f4c3394c82d69b8)
  For documentations, refer to http://zhanxw.github.io/rvtests/
  For questions and comments, plase send to Xiaowei Zhan <zhanxw@umich.edu>
  For bugs and feature requests, please submit at: https://github.com/zhanxw/rvtests/issues
...
[INFO]	Program version: 20190205
[INFO]	Analysis started at: Thu Sep  3 18:55:49 2020
[INFO]	Loaded [ 2697 ] samples from genotype files
[INFO]	Loaded [ 2697 ] sample phenotypes
[INFO]	Begin to read covariate file
[INFO]	Loaded 1509 male, 1188 female and 0 sex-unknown samples from data/covariateInfo_AMPPD.txt
[INFO]	Loaded 1647 cases, 1050 controls, and 0 missing phenotypes
[WARN]	-- Enabling binary phenotype mode -- 
[INFO]	Analysis begins with [ 2697 ] samples...
[INFO]	MadsonBrowning test significance will be evaluated using 10000 permutations at alpha = 0.05
[INFO]	SKAT test significance will be evaluated using 10000 permutations at alpha = 0.05 weight = Beta[beta1 = 1.00, beta2 = 25.00]
[INFO]	SKAT-O test significance will be evaluated using weight = Beta[beta1 = 1.00, beta2 = 25.00]
[INFO]	Loaded [ 1 ] genes.
[INFO]	Impute missing genotype to mean (by default)
[INFO]	Set upper minor allele frequency limit to 0.03
[INFO]	Analysis started
[INFO]	Analyzed [ 132 ] variants from [ 1 ] genes/regions
[INFO]	Analysis ends at: Thu Sep  3 18:55:53 2020
[INFO]	Analysis took 4 seconds
RVTESTS finished successfully
...
[INFO]	Program version: 20190205
[INFO]	Analysis started at: Thu Sep  3 18:55:53 2020
[INFO]	Loaded [ 2697 ] samples from genotype files
[INFO]	Loaded [ 2697 ] sample phenotypes
[INFO]	Begin to read covariate file
[INFO]	Loaded 1509 male, 1188 female and 0 sex-unknown samples from data/covariateInfo_AMPPD.txt
[INFO]	Loaded 1647 cases, 1050 controls, and 0 missing phenotypes
[WARN]	-- Enabling binary phenotype mode -- 
[INFO]	Analysis begins with [ 2697 ] samples...
[INFO]	MadsonBrowning test significance will be evaluated using 10000 permutations at alpha = 0.05
[INFO]	SKAT test significance will be evaluated using 10000 permutations at alpha = 0.05 weight = Beta[beta1 = 1.00, beta2 = 25.00]
[INFO]	SKAT-O test significance will be evaluated using weight = Beta[beta1 = 1.00, beta2 = 25.00]
[INFO]	Loaded [ 1 ] genes.
[INFO]	Impute missing genotype to mean (by default)
[INFO]	Set upper minor allele frequency limit to 0.01
[INFO]	Analysis started
[INFO]	Analyzed [ 115 ] variants from [ 1 ] genes/regions
[INFO]	Analysis ends at: Thu Sep  3 18:55:56 2020
[INFO]	Analysis took 3 seconds
RVTESTS finished successfully

5. Score Test

rvtest --inVcf data/vcf/SLC25A46.filtered.fid.vcf.gz \
       --pheno data/covariateInfo_AMPPD.txt \
       --covar data/covariateInfo_AMPPD.txt \
       --covar-name SEX,AGE,EDUCATION,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,STUDY_BioFIND,STUDY_PDBP,STUDY_PPMI \
       --single score \
       --geneFile data/SLC25A46.hg38.genefile \
       --out result/score/SLC25A46.filtered.score

Thank you for using rvtests (version: 20190205, git: c86e589efef15382603300dc7f4c3394c82d69b8)
  For documentations, refer to http://zhanxw.github.io/rvtests/
  For questions and comments, plase send to Xiaowei Zhan <zhanxw@umich.edu>
  For bugs and feature requests, please submit at: https://github.com/zhanxw/rvtests/issues
...
[INFO]	Program version: 20190205
[INFO]	Analysis started at: Fri Sep  4 17:27:10 2020
[INFO]	Loaded [ 2697 ] samples from genotype files
[INFO]	Loaded [ 2697 ] sample phenotypes
[INFO]	Begin to read covariate file
[INFO]	Loaded 1509 male, 1188 female and 0 sex-unknown samples from data/covariateInfo_AMPPD.txt
[INFO]	Loaded 1647 cases, 1050 controls, and 0 missing phenotypes
[WARN]	-- Enabling binary phenotype mode -- 
[INFO]	Analysis begins with [ 2697 ] samples...
[INFO]	Loaded [ 1 ] genes.
[INFO]	Impute missing genotype to mean (by default)
[INFO]	Analysis started
[INFO]	Analyzed [ 718 ] variants from [ 1 ] genes/regions
[INFO]	Analysis ends at: Fri Sep  4 17:27:14 2020
[INFO]	Analysis took 4 seconds
RVTESTS finished successfully

6. Add multiple-test corrections to score results

We will use False Discovery Rate (FDR)

# kernel: python 3.7
import os
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.stats as sm
from patsy import dmatrices

os.chdir("/data/LNG/KIM/burdenAnalysis/SLC25A46-replication/")

score = pd.read_csv("result/score/SLC25A46.filtered.score.SingleScore.assoc", sep = "\t")

score

	Gene	CHROM	POS	REF	ALT	N_INFORMATIVE	AF	U	V	STAT	DIRECTION	EFFECT	SE	PVALUE
0	SLC25A46	5	110738249	T	C	2697	0.000927	1.069550	0.908378	1.259320	+	1.177430	1.049220	0.261780
1	SLC25A46	5	110738407	C	T	2697	0.000371	-0.045776	0.388462	0.005394	-	-0.117838	1.604450	0.941452
2	SLC25A46	5	110738538	A	C	2697	0.002039	0.600238	2.072550	0.173837	+	0.289614	0.694621	0.676724
3	SLC25A46	5	110738643	G	A	2697	0.000185	-0.109522	0.097021	0.123633	-	-1.128850	3.210460	0.725127
4	SLC25A46	5	110738696	T	C	2697	0.021505	-0.053915	20.393400	0.000143	-	-0.002644	0.221440	0.990474
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
713	SLC25A46	5	110764313	T	C	2697	0.001112	-1.321120	1.396020	1.250240	-	-0.946349	0.846359	0.263506
714	SLC25A46	5	110764363	C	T	2697	0.117723	-3.186630	119.149000	0.085226	-	-0.026745	0.091612	0.770336
715	SLC25A46	5	110764618	G	A	2697	0.099926	-12.843500	81.840200	2.015590	-	-0.156934	0.110539	0.155691
716	SLC25A46	5	110764674	G	GA	2697	0.000927	0.547030	0.464448	0.644296	+	1.177810	1.467340	0.422159
717	SLC25A46	5	110764969	G	T	2697	0.243789	-14.176400	139.849000	1.437040	-	-0.101369	0.084561	0.230618

718 rows × 14 columns

score = score.drop_duplicates()

score

	Gene	CHROM	POS	REF	ALT	N_INFORMATIVE	AF	U	V	STAT	DIRECTION	EFFECT	SE	PVALUE
0	SLC25A46	5	110738249	T	C	2697	0.000927	1.069550	0.908378	1.259320	+	1.177430	1.049220	0.261780
1	SLC25A46	5	110738407	C	T	2697	0.000371	-0.045776	0.388462	0.005394	-	-0.117838	1.604450	0.941452
2	SLC25A46	5	110738538	A	C	2697	0.002039	0.600238	2.072550	0.173837	+	0.289614	0.694621	0.676724
3	SLC25A46	5	110738643	G	A	2697	0.000185	-0.109522	0.097021	0.123633	-	-1.128850	3.210460	0.725127
4	SLC25A46	5	110738696	T	C	2697	0.021505	-0.053915	20.393400	0.000143	-	-0.002644	0.221440	0.990474
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
179	SLC25A46	5	110764313	T	C	2697	0.001112	-1.321120	1.396020	1.250240	-	-0.946349	0.846359	0.263506
180	SLC25A46	5	110764363	C	T	2697	0.117723	-3.186630	119.149000	0.085226	-	-0.026745	0.091612	0.770336
181	SLC25A46	5	110764618	G	A	2697	0.099926	-12.843500	81.840200	2.015590	-	-0.156934	0.110539	0.155691
182	SLC25A46	5	110764674	G	GA	2697	0.000927	0.547030	0.464448	0.644296	+	1.177810	1.467340	0.422159
183	SLC25A46	5	110764969	G	T	2697	0.243789	-14.176400	139.849000	1.437040	-	-0.101369	0.084561	0.230618

184 rows × 14 columns

score['FDR-Corrected'] = sm.multitest.multipletests(score['PVALUE'])[1]
# alpha = 0.05

/usr/local/Anaconda/envs/py3.7/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  """Entry point for launching an IPython kernel.

score['FDR-Corrected']

0      1.0
1      1.0
2      1.0
3      1.0
4      1.0
      ... 
179    1.0
180    1.0
181    1.0
182    1.0
183    1.0
Name: FDR-Corrected, Length: 184, dtype: float64

score.head()

	Gene	CHROM	POS	REF	ALT	N_INFORMATIVE	AF	U	V	STAT	DIRECTION	EFFECT	SE	PVALUE	FDR-Corrected
0	SLC25A46	5	110738249	T	C	2697	0.000927	1.069550	0.908378	1.259320	+	1.177430	1.049220	0.261780	1.0
1	SLC25A46	5	110738407	C	T	2697	0.000371	-0.045776	0.388462	0.005394	-	-0.117838	1.604450	0.941452	1.0
2	SLC25A46	5	110738538	A	C	2697	0.002039	0.600238	2.072550	0.173837	+	0.289614	0.694621	0.676724	1.0
3	SLC25A46	5	110738643	G	A	2697	0.000185	-0.109522	0.097021	0.123633	-	-1.128850	3.210460	0.725127	1.0
4	SLC25A46	5	110738696	T	C	2697	0.021505	-0.053915	20.393400	0.000143	-	-0.002644	0.221440	0.990474	1.0

score.to_csv("result/score/SLC25A46.filtered.score.SingleScore.FDR.dup_cleaned.assoc", sep="\t", index=False)

Let's also filter out the non-synonymous variants of interest. They are:

5:110739267
5:110739354
5:110746300
5:110756712
5:110761217
5:110761271
5:110761292
5:110761301
5:110761662

anno = pd.read_csv("result/anno/SLC25A46.filtered.fid.annovar.hg38_multianno.trimmed.txt", sep = "\t")
anno.head()

	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	...	non_topmed_AF_popmax	non_neuro_AF_popmax	non_cancer_AF_popmax	controls_AF_popmax	CLNALLELEID	CLNDN	CLNDISDB	CLNREVSTAT	CLNSIG	Otherinfo
0	5	110738249	110738249	T	C	splicing	SLC25A46	NM_001303250:exon1:c.10+2T>C	.	.	...	0.0006	0.0005	.	0.0008	.	.	.	.	.	0.000927
1	5	110738407	110738407	C	T	intronic	SLC25A46	.	.	.	...	0.0267	0.0269	.	0.0292	.	.	.	.	.	0.000371
2	5	110738538	110738538	A	C	intronic	SLC25A46	.	.	.	...	0.0037	0.0036	.	0.0081	.	.	.	.	.	0.002039
3	5	110738643	110738643	G	A	intronic	SLC25A46	.	.	.	...	.	.	.	.	.	.	.	.	.	0.000185
4	5	110738696	110738696	T	C	intronic	SLC25A46	.	.	.	...	0.0302	0.0264	.	0.0285	.	.	.	.	.	0.021510

5 rows × 58 columns

is_nonsyn = anno['ExonicFunc.refGene']=="nonsynonymous SNV"
anno_nonsyn = anno[is_nonsyn]
anno_nonsyn

	Chr	Start	End	Ref	Alt	Func.refGene	Gene.refGene	GeneDetail.refGene	ExonicFunc.refGene	AAChange.refGene	...	non_topmed_AF_popmax	non_neuro_AF_popmax	non_cancer_AF_popmax	controls_AF_popmax	CLNALLELEID	CLNDN	CLNDISDB	CLNREVSTAT	CLNSIG	Otherinfo
10	5	110739267	110739267	C	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303249:exon1:c.C148A:p.P50T,SLC...	...	0.0012	0.0018	.	0.0004	520610	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Uncertain_significance	0.000742
11	5	110739354	110739354	G	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303249:exon1:c.G235A:p.E79K,SLC...	...	0.0014	0.0015	.	0.0011	520315	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Likely_benign	0.002225
61	5	110746300	110746300	C	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303249:exon4:c.C416A:p.T139N,SL...	...	0.0011	0.0010	.	0.0011	453904	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Likely_benign	0.001112
129	5	110756712	110756712	G	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303249:exon7:c.G631A:p.V211M,SL...	...	0.0035	0.0030	.	0.0035	453906	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Likely_benign	0.003337
156	5	110761217	110761217	G	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303250:exon8:c.G419A:p.R140Q,SL...	...	.	.	.	.	.	.	.	.	.	0.000185
158	5	110761271	110761271	G	A	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303250:exon8:c.G473A:p.G158D,SL...	...	0.0004	0.0002	.	0.0004	359143	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	no_assertion_criteria_provided	Pathogenic	0.000185
159	5	110761292	110761292	A	G	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303250:exon8:c.A494G:p.K165R,SL...	...	0.0041	0.0037	.	0.0053	454455	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Likely_benign	0.002225
160	5	110761301	110761301	T	G	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303250:exon8:c.T503G:p.L168R,SL...	...	.	7.348e-05	.	.	454461	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Uncertain_significance	0.000185
161	5	110761662	110761662	G	T	exonic	SLC25A46	.	nonsynonymous SNV	SLC25A46:NM_001303249:exon8:c.G894T:p.E298D,SL...	...	0.1177	0.1124	.	0.1157	454465	NEUROPATHY,_HEREDITARY_MOTOR_AND_SENSORY,_TYPE...	MedGen:C4225302,OMIM:616505	criteria_provided,_single_submitter	Benign	0.000371

9 rows × 58 columns

nonsym_POS = anno_nonsyn.rename(columns={"Start": "POS"})[['POS']]
score_nonsyn = pd.merge(nonsym_POS, score, on="POS")
score_nonsyn

	POS	Gene	CHROM	REF	ALT	N_INFORMATIVE	AF	U	V	STAT	DIRECTION	EFFECT	SE	PVALUE	FDR-Corrected
0	110739267	SLC25A46	5	C	A	2697	0.000742	0.823836	0.599113	1.132850	+	1.375090	1.291950	0.287168	1.0
1	110739354	SLC25A46	5	G	A	2697	0.002225	-0.172238	1.856260	0.015982	-	-0.092788	0.733974	0.899401	1.0
2	110746300	SLC25A46	5	C	A	2697	0.001112	0.048595	0.935050	0.002526	+	0.051971	1.034150	0.959920	1.0
3	110756712	SLC25A46	5	G	A	2697	0.003337	2.870400	3.130230	2.632130	+	0.916992	0.565213	0.104721	1.0
4	110761217	SLC25A46	5	G	A	2697	0.000185	-0.048355	0.045875	0.050969	-	-1.054060	4.668870	0.821385	1.0
5	110761271	SLC25A46	5	G	A	2697	0.000185	0.259504	0.191297	0.352031	+	1.356550	2.286360	0.552966	1.0
6	110761292	SLC25A46	5	A	G	2697	0.002225	-2.064850	2.313330	1.843060	-	-0.892586	0.657478	0.174593	1.0
7	110761301	SLC25A46	5	T	G	2697	0.000185	-0.141477	0.120936	0.165506	-	-1.169850	2.875560	0.684137	1.0
8	110761662	SLC25A46	5	G	T	2697	0.000371	-0.008878	0.130762	0.000603	-	-0.067895	2.765410	0.980413	1.0

# reorder column
score_nonsyn_cols = list(score_nonsyn.columns)
score_nonsyn_new_cols = ["Gene", "CHROM", "POS"] + score_nonsyn_cols[3:]
score_nonsyn[score_nonsyn_new_cols]

	Gene	CHROM	POS	REF	ALT	N_INFORMATIVE	AF	U	V	STAT	DIRECTION	EFFECT	SE	PVALUE	FDR-Corrected
0	SLC25A46	5	110739267	C	A	2697	0.000742	0.823836	0.599113	1.132850	+	1.375090	1.291950	0.287168	1.0
1	SLC25A46	5	110739354	G	A	2697	0.002225	-0.172238	1.856260	0.015982	-	-0.092788	0.733974	0.899401	1.0
2	SLC25A46	5	110746300	C	A	2697	0.001112	0.048595	0.935050	0.002526	+	0.051971	1.034150	0.959920	1.0
3	SLC25A46	5	110756712	G	A	2697	0.003337	2.870400	3.130230	2.632130	+	0.916992	0.565213	0.104721	1.0
4	SLC25A46	5	110761217	G	A	2697	0.000185	-0.048355	0.045875	0.050969	-	-1.054060	4.668870	0.821385	1.0
5	SLC25A46	5	110761271	G	A	2697	0.000185	0.259504	0.191297	0.352031	+	1.356550	2.286360	0.552966	1.0
6	SLC25A46	5	110761292	A	G	2697	0.002225	-2.064850	2.313330	1.843060	-	-0.892586	0.657478	0.174593	1.0
7	SLC25A46	5	110761301	T	G	2697	0.000185	-0.141477	0.120936	0.165506	-	-1.169850	2.875560	0.684137	1.0
8	SLC25A46	5	110761662	G	T	2697	0.000371	-0.008878	0.130762	0.000603	-	-0.067895	2.765410	0.980413	1.0

score_nonsyn[score_nonsyn_new_cols].to_csv("result/score/SLC25A46.filtered.score.SingleScore.FDR.dup_cleaned.nonsyn.assoc", sep="\t", index=False)

7. Compound Heterozygous Fisher Test

1 control and 1 case was found with p.E79K/p.V211M compound heterozygosity (no homozygous found in either groups).

Generate 2x2 matrix

# kernel: R 3.6
data <- matrix(c(1,1049,1,1046), ncol=2)
data

     [,1] [,2]
[1,]    1    1
[2,] 1049 1046

# run fisher test
fisher.test(data)

	Fisher's Exact Test for Count Data

data:  data
p-value = 1
alternative hypothesis: true odds ratio is not equal to 1
95 percent confidence interval:
  0.01269833 78.30123556
sample estimates:
odds ratio 
 0.9971414

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SLC25A46.md

SLC25A46.md

Candidate PD-risk gene SLC25A46 Evaluation using AMP-PD data

1. Subset and convert data

Filter data

Pull out columns 1, 2, and 5 (FID, IID, SEX) without header.

Create VCFs

2. Determine Allele Frequency + Count

3. Annotation

Trim results

4. Gene-Level Burden Analyses

5. Score Test

6. Add multiple-test corrections to score results

7. Compound Heterozygous Fisher Test

Files

SLC25A46.md

Latest commit

History

SLC25A46.md

File metadata and controls

Candidate PD-risk gene SLC25A46 Evaluation using AMP-PD data

1. Subset and convert data

Filter data

Pull out columns 1, 2, and 5 (FID, IID, SEX) without header.

Create VCFs

2. Determine Allele Frequency + Count

3. Annotation

Trim results

4. Gene-Level Burden Analyses

5. Score Test

6. Add multiple-test corrections to score results

7. Compound Heterozygous Fisher Test