GitHub - BaderLab/PNC_PathwayAnalysis: Pathway analysis pipeline for PNC data (GWAS + GSEA)

Instructions on how to run the Pathway Analysis pipeline for PNC data
Nov 9, 2018
By Shirley Hui

A) PERMUTE PHENOTYPES

Permute phenotypes

File: ROOT_DIR/permutePhenos.R

This file permutes phenotype status in the phenoFile (see code).
Subjects with status -9 are not included in the permutation

Run:

Rscript perumtePhenos.R

Note: Set variables inside script before running

Output:

Each permutation is written out to a file with the same format as phenoFile
Note: pemutations are written to files to ensure reproducibility

B) PERFORM GWAS (genome-wide associations)

Make gwas batch and submit scripts

File: ROOT_DIR/make_plink_assoc_bin_jobfiles.sh

This script make a txt file that contains the call to run gwas (logistic regression + PC covariates) to compute snp-phenotype associations using plink.

Run:

./make_plink_assoc_bin_jobfiles.sh 0 <num_perms>

Example: ./make_plink_assoc_bin_jobfiles.sh volt_svt 0 100 1-5

Output:

a file called ROOT_DIR/jobfiles/plink_assoc__0-<num_perms>-covar-.txt that contains the call to run one permutation of gwas (one call per line)
a file called ROOT_DIR/run_gwas_bin__0-<num_perms>-covar-.sh that calls the txt file above via parallel

Compute gene-phenotype associations

Run ROOT_DIR/run_gwas_bin__0-<num_perms>-covar-.sh script from previous
The script runs GWAS (runGWAS.R) 0 to <num_perms> times for pheno of interest using logistic regression and 0-n PC covariates supplied by the user

Output:

pvalues for each SNP outputted to a file for each permuation (0 to <num_perms>)

C) PERFORM PATHWAY GSEA

The following assumes the pipeline is running on a high

Make gsea batch and submit scripts

File: ROOT_DIR/code/src/MakeJobs.java

This script will make a bunch of .sh files that contain the 8 parallel jobs that will run when it is called.
If num perm = 100, 13 .sh files will be made. If num perm= 1000, 125 .sh files will be made. Note code doesn't really support num perms not equal to 100 or 1000.
If you make any code changes, to compile script, run the ./compile.sh at the prompt

Run:

module load java java -cp ../out MakeJobs <num_perms> bin

Example: java -cp ../out MakeJobs volt_svt 100 bin 1-5

Output:

a bunch of job files located in ROOT_DIR/job_scripts/gsea/run-gsea---bin-covars.sh
a master submit script located in ROOT_DIR/submit-gsea--bin-covars.sh

Run gsea

Run submit script from previous
You will get a bunch of warnings (which is ok) about how the submitted job time is the minimum of 15 mins.
This will create several directories (one for each permutation). In each directory there will be two output files (*gsea.txt - gsea output, *genes.txt - detailed output)

Move gsea output files into one directory

File: ROOT_DIR/moveFiles.sh

This will move all output files into ROOT_DIR/gsea/bin/gsea-_covars where:
- gsea.txt files get moved to obs directory,for perm0 or perm directory, for all other permutations
- genes.txt files get moved to genes directory for all perms

Run:

moveFiles.sh bin

Example: moveFiles.sh volt_svt bin 1-5 100

Delete empty directories and run txt and log files:

rm -r Gsea-bin-covars* rm run_gsea_-<batch_num>-bin-covars*

Normalize ES scores

File: ROOT_DIR/mygseaCalcNES.py

This script will normalize the enrichment scores from step 2 above.

Run:

module load python/2.7.14-anaconda5.1.0 python mygseaCalcNES.py bin

Example: python mygseaCalcNES.py volt_svt bin 1-5 100

Output:

This will create a bunch of nes*.txt in the ROOT_DIR/nes/ directory

Compute FDR score

File: ROOT_DIR/compareFDR.R

This script will compute FDR scores using NES from step 4)

Run:

module load r/3.4.3-anaconda5.1.0 Rscript computeFDR.R bin

Example: Rscript computeFDR.R volt_svt bin 1-5 8

Output: - This will create a file fdr--bin-covars-.txt in the ROOT_DIR/fdr directory

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
code/src		code/src
.gitignore		.gitignore
Human_GOBP_AllPathways_no_GO_iea_May_01_2018_symbol.gmt		Human_GOBP_AllPathways_no_GO_iea_May_01_2018_symbol.gmt
README.md		README.md
computeFDR.R		computeFDR.R
make_plink_assoc_bin_jobfiles.sh		make_plink_assoc_bin_jobfiles.sh
moveFiles.sh		moveFiles.sh
mygseaCalcESCentrality-perm-pvalues-out.py		mygseaCalcESCentrality-perm-pvalues-out.py
mygseaCalcNES.py		mygseaCalcNES.py
permutePhenos.R		permutePhenos.R
pnc_snp_info_hg19.txt.out		pnc_snp_info_hg19.txt.out
reader.py		reader.py
reader.pyc		reader.pyc
runGWAS.R		runGWAS.R
test.sh		test.sh
util.py		util.py
util.pyc		util.pyc

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Releases

Packages

Languages

BaderLab/PNC_PathwayAnalysis

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages