genotypooler

Description

This project implements a simulation of SNP genotype pooling with a simple shifted transversal design. The pooling design uses blocks of 4×4 samples, with 8 pools (4 row pools and 4 column pools) and a design weight of 2, i.e. each sample is placed in exactly 2 pools. The encoding and decoding steps of the pooling procedure can be represented as follows:

[Figure: encoding and decoding of the genotypes in one pooling block, panels (a) and (b)]

Here {0, 1, 2, -1} are the allelic dosages of the true genotypes at one SNP for the samples in (a). They stand for homozygous reference, heterozygous, homozygous alternate, and missing genotype, respectively.
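The encoding and decoding steps can be sketched in plain Python. This is a minimal illustration, not the project's implementation, and it assumes the usual rules for this kind of design: a pool reads 0 if all of its samples are homozygous reference, 2 if all are homozygous alternate, and 1 otherwise; a sample is decoded as 0 if one of its pools reads 0, as 2 if one of its pools reads 2, and as missing (-1) when both of its pools read 1. The function names are hypothetical, and missing input genotypes are not handled.

```python
from typing import List, Sequence, Tuple

Block = List[List[int]]  # 4x4 grid of allelic dosages (0, 1 or 2) at one SNP


def pool_genotype(dosages: Sequence[int]) -> int:
    """Genotype read from one pool: 0 if only REF alleles, 2 if only ALT, else 1."""
    if all(d == 0 for d in dosages):
        return 0
    if all(d == 2 for d in dosages):
        return 2
    return 1


def encode(block: Block) -> Tuple[List[int], List[int]]:
    """Pool the block: one pool per row and one per column (design weight 2)."""
    row_pools = [pool_genotype(row) for row in block]
    col_pools = [pool_genotype(col) for col in zip(*block)]
    return row_pools, col_pools


def decode(row_pools: Sequence[int], col_pools: Sequence[int]) -> Block:
    """Decode each sample from the 2 pools it belongs to; -1 marks ambiguity."""
    decoded = []
    for r in row_pools:
        line = []
        for c in col_pools:
            if r == 0 or c == 0:
                line.append(0)   # one pool has no ALT allele: sample is hom. REF
            elif r == 2 or c == 2:
                line.append(2)   # one pool has no REF allele: sample is hom. ALT
            else:
                line.append(-1)  # both pools read 1: genotype cannot be resolved
        decoded.append(line)
    return decoded


block = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 2, 0],
    [0, 0, 0, 0],
]
rows, cols = encode(block)
print(rows, cols)          # [0, 1, 1, 0] [0, 1, 1, 0]
print(decode(rows, cols))  # [[0, 0, 0, 0], [0, -1, -1, 0], [0, -1, -1, 0], [0, 0, 0, 0]]
```

In this made-up block, the four samples at the intersections of the row and column pools that read 1 are all decoded as missing, although only two of them actually carry an alternate allele.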

Based on a Marginal Likelihood Maximization method, we implemented a refined version of decoding, where the missing true genotypes are converted into posterior genotype probabilities that depend on the position of the sample in the block layout, lambda, and on the pooling pattern, psi. In panel (b) of the figure above, lambda = (0, 2, 1, 0), i.e. the allelic dosages of the ambiguous samples after pooling, and psi = ((2, 2, 0), (2, 2, 0)) is the pooling pattern, i.e. 2 row pools have genotype 0, 2 have genotype 1 and none has genotype 2, and likewise for the column pools.
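The pooling pattern psi only depends on the genotypes read from the pools. Assuming the row and column pool genotypes of a block are available as two lists (a hypothetical input format), psi can be computed by counting the pools per genotype, as in this sketch:

```python
from collections import Counter
from typing import Sequence, Tuple

Counts = Tuple[int, int, int]  # numbers of pools with genotype 0, 1 and 2


def genotype_counts(pools: Sequence[int]) -> Counts:
    """Count how many pools read genotype 0, 1 and 2."""
    counts = Counter(pools)
    return counts[0], counts[1], counts[2]


def pooling_pattern(row_pools: Sequence[int], col_pools: Sequence[int]) -> Tuple[Counts, Counts]:
    """Pooling pattern psi: genotype counts of the row pools, then of the column pools."""
    return genotype_counts(row_pools), genotype_counts(col_pools)


print(pooling_pattern([0, 1, 1, 0], [0, 1, 1, 0]))  # ((2, 2, 0), (2, 2, 0))
```

In a 4×4 block each triple sums to 4, since there are 4 row pools and 4 column pools. The mapping from (lambda, psi) to posterior probabilities is the Marginal Likelihood Maximization step and is not reproduced here.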

Set up

A Python 3.6 environment with the packages listed in requirements.txt. For example, on a Linux-based OS, run the following from the genotypooler folder:

(If venv for Python 3.6 is not installed, run: apt install libpython3.6-dev python3.6-venv)

/usr/bin/python3.6 -m venv venv3.6

source venv3.6/bin/activate

pip install --upgrade pip

pip install -r requirements.txt

(See the official venv documentation.)

bcftools installed on the OS (see the official page).
tabix installed on the OS.
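Before running the examples, the prerequisites above can be checked from Python. This is a hypothetical helper, not shipped with genotypooler: it only uses shutil.which to look the tools up on PATH and warns when the interpreter is not Python 3.6.

```python
import shutil
import sys
from typing import List, Sequence


def missing_tools(tools: Sequence[str]) -> List[str]:
    """Return the command-line tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def check_setup() -> List[str]:
    """Collect human-readable problems with the current environment."""
    problems = []
    if sys.version_info[:2] != (3, 6):
        problems.append(
            f"expected Python 3.6, running {sys.version_info.major}.{sys.version_info.minor}"
        )
    for tool in missing_tools(["bcftools", "tabix"]):
        problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup():
        print("WARNING:", problem)
```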

Usage

Some data and scripts are provided as use cases in /examples. In particular, the following files can be found:

adaptive_gls.csv: posterior genotype probabilities of pooled individuals, computed by Marginal Likelihood Maximization with heterozygote degeneracy.
ALL.chr20.snps.gt.vcf.gz and its index .csi: a subset of 1000 diallelic SNPs on chromosome 20 for 2504 unrelated individuals from the 1000 Genomes Project phase 3.
TEST.chr20.snps.gt.vcf.gz and its index .csi: a subset of 100 diallelic SNPs on chromosome 20 for 240 unrelated individuals from the 1000 Genomes Project phase 3.
pooling-ex.py: a minimalistic command-line program for simulating SNP genotype pooling from VCF files.
pooling-imputing-ex.ipynb: a pipeline showing pooling simulation, imputation of the pooled data with Beagle, and visualization of the imputation quality.
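To look at the example VCF files in the same {0, 1, 2, -1} dosage encoding as in the description, the GT field can be parsed with the standard library only. This is a minimal sketch, not part of the project's tools, assuming diallelic sites, a GT key in the FORMAT column, and plain or bgzipped VCF text (Python's gzip module can read bgzip files); the function names are hypothetical.

```python
import gzip
import io
from typing import Dict, Iterable, List


def gt_to_dosage(gt: str) -> int:
    """Alternate-allele dosage of a diploid GT string: 0, 1, 2, or -1 if missing."""
    alleles = gt.replace("|", "/").split("/")
    if any(a == "." for a in alleles):
        return -1
    return sum(1 for a in alleles if a != "0")


def read_dosages(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map 'CHROM:POS' of each variant to the dosages of all samples."""
    dosages = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        gt_index = fields[8].split(":").index("GT")
        key = f"{fields[0]}:{fields[1]}"
        dosages[key] = [gt_to_dosage(s.split(":")[gt_index]) for s in fields[9:]]
    return dosages


def read_vcf_dosages(path: str) -> Dict[str, List[int]]:
    """Read a bgzipped VCF such as TEST.chr20.snps.gt.vcf.gz."""
    with gzip.open(path, "rt") as handle:
        return read_dosages(handle)


# A made-up one-variant VCF with three samples, for illustration only.
example = """##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3
20\t100\t.\tG\tA\t.\tPASS\t.\tGT\t0|0\t0|1\t./.
"""
print(read_dosages(io.StringIO(example)))  # {'20:100': [0, 1, -1]}
```

For a real file, read_vcf_dosages("TEST.chr20.snps.gt.vcf.gz") would return one list of 240 dosages per SNP, under the assumptions above.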

Larger data files can be found in /data. They can be used in the same way as the files created in /examples after executing pooling-ex.py. However, their processing needs to be run in parallel on chunked data:

From /data, run bash ../bin/bcfchunkpara.sh IMP.chr20.snps.gt.vcf.gz ./tmp 1000. You should get 53 chunks (numbered 0 to 52) in the tmp folder.
From /runtools, run the script parallel_pooling.py with python3 parallel_pooling.py ../data/IMP.chr20.snps.gt.vcf.gz ../data/IMP.chr20.pooled.snps.gl.vcf.gz 4, where the last argument is the number of cores to use (here 4). This should output the pooled file /data/tmp/IMP.chr20.pooled.snps.gl.vcf.gz. You can then move this file wherever you want and delete the /data/tmp folder.
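Assuming the last argument of bcfchunkpara.sh (1000) is the number of SNPs per chunk, getting 53 chunks means the input holds between 52,001 and 53,000 SNPs. The index arithmetic behind such a split can be sketched as follows; chunk_ranges is a hypothetical helper, not the script's actual code.

```python
from typing import List, Tuple


def chunk_ranges(n_records: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split record indices 0..n_records-1 into consecutive [start, stop) ranges."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_records))
            for start in range(0, n_records, chunk_size)]


print(chunk_ranges(2500, 1000))        # [(0, 1000), (1000, 2000), (2000, 2500)]
print(len(chunk_ranges(52001, 1000)))  # 53
print(len(chunk_ranges(53000, 1000)))  # 53
```

Each range can be processed independently, for instance by a pool of worker processes (multiprocessing.Pool.map), which is what allows a chunked run to use several cores.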

References

DNA Sudoku pooling designs
Beagle 4.1 articles for phasing and imputation
Beagle 4.1 documentation and binaries
The 1000 Genomes Project and its VCF phase 3 data release
Our paper in BMC Bioinformatics: "A joint use of pooling and imputation for genotyping SNPs"

camcl/genotypooler
