Inputs
Here we describe the files required for QTL mapping. Each file contains compulsory fields with compulsory field names, and optional fields (marked with []).
A tab separated text file
Column names:
feature_id
chromosome
start
end
ensembl_gene_id*
feature_strand*
[gene_name]
[superior_feature_id]
*optional for plotting
[] not currently used but taken along.
Example:
feature_id chromosome start end ensembl_gene_id gene_name feature_strand
H3BR00 16 28477974 28503333 ENSG00000261832 CLN3 -
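As a minimal sketch of how such an annotation file could be checked before an analysis, the snippet below reads the tab-separated text and verifies that the compulsory columns are present. The function name `read_annotation` and the constant `REQUIRED_COLUMNS` are hypothetical helpers for illustration and are not part of the pipeline.

```python
import csv
import io

# Compulsory column names as listed above; starred and bracketed columns are optional.
REQUIRED_COLUMNS = ("feature_id", "chromosome", "start", "end")


def read_annotation(handle):
    """Read a tab-separated annotation file and check the compulsory columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError("missing compulsory columns: " + ", ".join(missing))
    rows = []
    for row in reader:
        # Positions are stored as integers so they can be compared later.
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        rows.append(row)
    return rows


# The example record from above, written with explicit tabs.
example = (
    "feature_id\tchromosome\tstart\tend\tensembl_gene_id\tgene_name\tfeature_strand\n"
    "H3BR00\t16\t28477974\t28503333\tENSG00000261832\tCLN3\t-\n"
)
annotation = read_annotation(io.StringIO(example))
print(annotation[0]["feature_id"], annotation[0]["chromosome"])
```

For a real file, pass an open file handle instead of the `io.StringIO` object.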
A tab separated text file
The first column contains `feature_id`
The first line contains `sample_id`
Example:
feature_id sample1 sample2 sample3 sample4 sample5
H3BR00 83.8 2198.4 2035.8 2678.2 5266.1
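The layout above (features in rows, samples in columns) can be read with a few lines of Python. This is only a sketch: `read_phenotypes` is a hypothetical helper, and it assumes every value after the first column is numeric.

```python
import csv
import io


def read_phenotypes(handle):
    """Return (sample_ids, {feature_id: {sample_id: value}}) from a phenotype file."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    sample_ids = header[1:]  # the first header cell is the feature_id label
    phenotypes = {}
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        feature_id, values = fields[0], fields[1:]
        if len(values) != len(sample_ids):
            raise ValueError("row %s does not match the header length" % feature_id)
        phenotypes[feature_id] = dict(zip(sample_ids, (float(v) for v in values)))
    return sample_ids, phenotypes


# The example from above, written with explicit tabs.
example = (
    "feature_id\tsample1\tsample2\tsample3\tsample4\tsample5\n"
    "H3BR00\t83.8\t2198.4\t2035.8\t2678.2\t5266.1\n"
)
sample_ids, phenotypes = read_phenotypes(io.StringIO(example))
print(sample_ids, phenotypes["H3BR00"]["sample1"])
```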
- Binary Plink files.
Genotype-Harmonizer can convert a large number of genotype formats into binary Plink; the output option is: -O PLINK_BED. See the Genotype-Harmonizer documentation for more information.
Containing `sample_id` from the Genotype file and `sample_id` from the Phenotype file:
A tab separated text file
The first column contains `sample_id`
The first line contains `covariates`
Example:
sample_id covariate1 covariate2 covariate3
sample1 1 218 0
sample2 1 -32.4 1
sample3 1 0.4 1
sample4 1 28.4 0
A tab separated text file
The first column contains `sample_id`
The first line contains `sample_id`
Example:
sample_id sample1 sample2 sample3 sample4
sample1 1 0.2 0.002 -0.3
sample2 0.2 1.08 0.55 0.1
sample3 0.002 0.55 1 0
sample4 -0.3 0.1 0 1
Using Plink2 this can be easily calculated. Follow the steps below:
Ideally, start with non-imputed genotypes. (If these are not available, apply a stringent QC filter on call rate: "--inputProb 0.6" "-cr 1.0", using Genotype Harmonizer to keep only the highest-quality variants.) (NB: these steps use Plink, but they can also be done with, for instance, Genotype Harmonizer.)
Remove SNPs that have a low minor allele frequency (MAF) or that are out of Hardy-Weinberg equilibrium (HWE):
/tools/plink2 --bfile {raw_genotype} --maf 0.05 --hwe 1e-6 --make-bed --out {raw_genotype_filtered}
Prune variants (window of 250 variants, window shift of 50 variants, independence at an R2 threshold of 0.2):
plink2 --bfile {raw_genotype_filtered} --indep-pairwise 250 50 0.2 --bad-ld --out {out_pruning_info}
Make king IBD matrix:
plink2 --bfile {raw_genotype_filtered} --extract {out_pruning_info}.prune.in --make-king square --out {king_ibd_out}
After running this command, the output *.king and *.king.id files can be turned into the kinship matrix for the QTL analysis. Make sure you multiply the KING values by 2 to bring them into the usual 0-1 range. The kinship matrix needs the IDs from *.king.id as both row and column labels.
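The conversion described above can be sketched in plain Python. The helper names are hypothetical, and the sketch assumes that the *.king.id file has a header line starting with `#` and the individual ID in its last column, and that the *.king file is a whitespace-delimited square matrix without a header. The toy values are made up for illustration.

```python
import io


def read_king_ids(handle):
    """Return sample IDs from a *.king.id file.

    Assumption: header lines start with '#', and the ID to use is in the last column.
    """
    ids = []
    for line in handle:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        ids.append(fields[-1])
    return ids


def king_to_kinship(king_handle, ids):
    """Read the square KING matrix and multiply every value by 2."""
    matrix = []
    for line in king_handle:
        if not line.strip():
            continue
        row = [2.0 * float(value) for value in line.split()]
        if len(row) != len(ids):
            raise ValueError("matrix row length does not match the number of IDs")
        matrix.append(row)
    if len(matrix) != len(ids):
        raise ValueError("matrix row count does not match the number of IDs")
    return matrix


def write_kinship(ids, matrix, out):
    """Write the kinship matrix with the IDs as row and column labels."""
    out.write("\t".join(["sample_id"] + ids) + "\n")
    for sample_id, row in zip(ids, matrix):
        out.write("\t".join([sample_id] + [format(v, "g") for v in row]) + "\n")


# Toy input (made-up values) standing in for {king_ibd_out}.king.id and {king_ibd_out}.king.
ids = read_king_ids(io.StringIO("#IID\nsample1\nsample2\n"))
kinship = king_to_kinship(io.StringIO("0.5\t0.1\n0.1\t0.5\n"), ids)
buffer = io.StringIO()
write_kinship(ids, kinship, buffer)
kinship_text = buffer.getvalue()
print(kinship_text)
```

With real data, replace the `io.StringIO` objects with open file handles for the two Plink2 output files and an output file of your choice.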
A tab separated text file
Column names: genotype_individual_id phenotype_sample_id
Example:
name_genotype_sample1 name_phenotype_sample1
name_genotype_sample2 name_phenotype_sample2.replica1
name_genotype_sample2 name_phenotype_sample2.replica2
name_genotype_sample3 name_phenotype_sample3
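Because one genotype individual can map to several phenotype samples (replicates), it can help to group the mapping by genotype ID. The sketch below uses a hypothetical helper, `read_sample_mapping`, and assumes the file has two tab-separated columns and no header line, as in the example above.

```python
import csv
import io
from collections import defaultdict


def read_sample_mapping(handle):
    """Return {genotype_individual_id: [phenotype_sample_id, ...]}."""
    mapping = defaultdict(list)
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip empty lines
        if len(fields) != 2:
            raise ValueError("expected two tab-separated columns, got %d" % len(fields))
        genotype_id, phenotype_id = fields
        mapping[genotype_id].append(phenotype_id)
    return dict(mapping)


# Illustrative IDs following the example above, written with explicit tabs.
example = (
    "name_genotype_sample1\tname_phenotype_sample1\n"
    "name_genotype_sample2\tname_phenotype_sample2.replica1\n"
    "name_genotype_sample2\tname_phenotype_sample2.replica2\n"
    "name_genotype_sample3\tname_phenotype_sample3\n"
)
mapping = read_sample_mapping(io.StringIO(example))
for genotype_id, phenotype_ids in mapping.items():
    print(genotype_id, len(phenotype_ids))
```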
To filter down to a specific set of variants, use the 'variant_filter' option when running your analysis. The file you supply should have a header line with the name 'snp_id', followed by one variant / SNP per row.
To filter down to a specific set of features, use the 'feature_filter' option when running your analysis. The file you supply should have a header line with the name 'feature', followed by one feature id per row.
To filter down to specific combinations of SNPs and features, use the 'feature_variant_filter' option when running your analysis. The file you supply should be a tab-separated file that starts with the header 'snp_id feature'; each subsequent line specifies a SNP id (column one) and a feature id (column two) that should be tested together.
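A file in this layout can be written as follows. This is a sketch only: `write_feature_variant_filter` is a hypothetical helper, and the SNP ids `snp_a` and `snp_b` are placeholders rather than real variant names.

```python
import csv
import io


def write_feature_variant_filter(pairs, out):
    """Write (snp_id, feature_id) pairs as a tab-separated filter file."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["snp_id", "feature"])  # header expected by the filter option
    for snp_id, feature_id in pairs:
        writer.writerow([snp_id, feature_id])


# Placeholder SNP ids paired with the example feature id used earlier.
pairs = [("snp_a", "H3BR00"), ("snp_b", "H3BR00")]
buffer = io.StringIO()
write_feature_variant_filter(pairs, buffer)
filter_text = buffer.getvalue()
print(filter_text)
```

To write to disk, pass an open file handle (opened with `newline=""`) instead of the `io.StringIO` buffer.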
To regress out SNP effects, either to increase power for trans analyses or to look for secondary eQTLs, you can supply a file specifying which SNPs to correct for per feature. To do so, use the 'feature_variant_covariate' flag. The file layout is the same as for the Combined feature variant filter: a tab-separated file that starts with the header 'snp_id feature'; each subsequent line specifies a SNP id (column one) and the feature id (column two) for which that SNP should be corrected.