add pipeline for demographic inference and simulation
quanc1989 committed May 13, 2021
1 parent 4a047ca commit efb7f03
Showing 9 changed files with 1,346 additions and 2 deletions.
19 changes: 17 additions & 2 deletions README.md
@@ -29,6 +29,8 @@ This repository includes data and scripts to analyze structural variations of 25
- [SVTK](https://github.com/talkowski-lab/svtk)
- [ANNOTSV](https://github.com/lgmgeo/AnnotSV)
- [Paragraph](https://github.com/Illumina/paragraph)
- [vcftools](http://vcftools.sourceforge.net/man_latest.html)
- [bcftools](http://samtools.github.io/bcftools)
- python3
- numpy

@@ -52,11 +54,24 @@ In the bash file ```pipeline.sv-calling.sh```, we use sample data to demonstrate


-------
## Sample and Variant quality control

### Script: pipeline.sv-calling.sh
## Pipeline for Demographic inference and simulation

### Script: pipeline.demographic_inference.sh

### Requirements
- [easySFS](https://github.com/isaacovercast/easySFS)
- [dadi](https://dadi.readthedocs.io/en/latest/)
- [msprime](https://github.com/tskit-dev/tutorials)


### Summary
In the bash file ```pipeline.demographic_inference.sh```, we use sample data to demonstrate the complete process of demographic inference and simulation for the YRI, TIB, and HAN populations.

1. Firstly, the site frequency spectrum (SFS) is constructed from the example VCF with easySFS and projected down to a smaller sample size (```--proj 50,40,40```).

2. Then bootstrap datasets are generated from the resulting data dictionary with ```run_generate_sfs_segments.py```.

3. Demographic inference is performed on the unfolded SFS with ```run_inference.py``` under the ```split_symmig_all``` model.

4. Finally, 1,000 simulation replicates are run for each of chromosomes 1 to 22 with ```run_simulate_pipeline.py```.
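
The script ```pipeline.demographic_inference.sh``` passes ```--proj 50,40,40``` to easySFS, which projects the frequency spectrum down to a smaller sample size. As a rough illustration of what down-projection means for one population, the sketch below uses the standard hypergeometric formula. It assumes that sizes count sampled chromosomes; the function names `project_site` and `project_sfs` are hypothetical and are not part of easySFS or dadi.

```python
from math import comb


def project_site(n, i, m):
    """Probabilities of seeing j = 0..m derived alleles when m chromosomes
    are drawn without replacement from n chromosomes carrying i derived alleles."""
    if not 0 <= i <= n:
        raise ValueError("i must lie between 0 and n")
    if not 0 < m <= n:
        raise ValueError("m must lie between 1 and n")
    total = comb(n, m)
    # Hypergeometric weights; math.comb returns 0 when j > i or m - j > n - i.
    return [comb(i, j) * comb(n - i, m - j) / total for j in range(m + 1)]


def project_sfs(sfs, m):
    """Project a 1D unfolded SFS (entry i = number of sites with i derived
    alleles among n = len(sfs) - 1 chromosomes) down to m chromosomes."""
    n = len(sfs) - 1
    projected = [0.0] * (m + 1)
    for i, n_sites in enumerate(sfs):
        for j, weight in enumerate(project_site(n, i, m)):
            projected[j] += n_sites * weight
    return projected


if __name__ == "__main__":
    # Ten singleton sites observed in 4 chromosomes, projected down to 2.
    print(project_sfs([0, 10, 0, 0, 0], 2))
```

Projection keeps the total number of sites unchanged and only redistributes them across the smaller set of frequency classes, which is why it is a common way to handle missing genotypes before inference.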


-------
## Visualization for characteristics of SVs
272 changes: 272 additions & 0 deletions example/samples.YRIandTIBandHAN.clust
@@ -0,0 +1,272 @@
NA18508 YRI
NA18510 YRI
NA18522 YRI
NA18488 YRI
NA18868 YRI
NA18870 YRI
NA18499 YRI
NA18856 YRI
NA18502 YRI
NA18507 YRI
NA18519 YRI
NA18933 YRI
NA18907 YRI
NA18867 YRI
NA18874 YRI
NA19102 YRI
NA19107 YRI
NA19114 YRI
NA19119 YRI
NA19121 YRI
NA18879 YRI
NA18881 YRI
NA19099 YRI
NA19239 YRI
NA19138 YRI
NA19118 YRI
NA19137 YRI
NA19210 YRI
NA19093 YRI
NA19098 YRI
NA19190 YRI
NA19113 YRI
NA19152 YRI
NA19171 YRI
NA19222 YRI
NA19238 YRI
NA19257 YRI
NA19207 YRI
NA19214 YRI
NA19144 YRI
NA19149 YRI
NA19175 YRI
NA18505 YRI
NA18517 YRI
NA18853 YRI
NA18858 YRI
NA18916 YRI
NA18923 YRI
NA18865 YRI
NA18877 YRI
NA18909 YRI
NA19096 YRI
NA19159 YRI
NA19116 YRI
NA19130 YRI
NA19197 YRI
NA19200 YRI
NA19236 YRI
NA19248 YRI
NA19147 YRI
NA19185 YRI
NA18511 YRI
NA18516 YRI
NA18523 YRI
NA18489 YRI
NA18504 YRI
NA18934 YRI
NA18871 YRI
NA18876 YRI
NA18908 YRI
NA18910 YRI
NA18864 YRI
NA18915 YRI
NA19095 YRI
NA19184 YRI
NA19189 YRI
NA19153 YRI
NA19160 YRI
NA19172 YRI
NA19223 YRI
NA19235 YRI
NA19141 YRI
NA19146 YRI
NA19108 YRI
NA19247 YRI
NA19204 YRI
NA19209 YRI
NA18486 YRI
NA18498 YRI
NA18501 YRI
NA18520 YRI
NA18912 YRI
NA18917 YRI
NA18861 YRI
NA18873 YRI
NA18924 YRI
NA18878 YRI
NA19198 YRI
NA19201 YRI
NA19206 YRI
NA19117 YRI
NA19129 YRI
NA19131 YRI
NA19143 YRI
NA19092 YRI
NA19213 YRI
NA19225 YRI
NA19256 YRI
SAMC006354 TIB
SAMC006367 TIB
SAMC006368 TIB
SAMC006369 TIB
SAMC006370 TIB
SAMC006371 TIB
SAMC006374 TIB
SAMC006375 TIB
SAMC006376 TIB
SAMC006377 TIB
SAMC006378 TIB
SAMC006379 TIB
SAMC006380 TIB
SAMC006381 TIB
SAMC006382 TIB
SAMC006385 TIB
SAMC006386 TIB
SAMC006387 TIB
SAMC006389 TIB
WGC025285D TIB
WGC025288D TIB
WGC025289D TIB
WGC025290D TIB
WGC025291D TIB
WGC025292D TIB
WGC025294D TIB
WGC025295D TIB
WGC025296D TIB
WGC025297D TIB
WGC025298D TIB
WGC025299D TIB
WGC025300D TIB
WGC025301D TIB
WGC025302D TIB
WGC025304D TIB
WGC025307D TIB
SAMC006353 TIB
SAMC006355 TIB
SAMC006356 TIB
SAMC006357 TIB
SAMC006358 TIB
SAMC006359 TIB
SAMC006360 TIB
SAMC006361 TIB
SAMC006362 TIB
SAMC006363 TIB
SAMC006364 TIB
SAMC006365 TIB
SAMC006366 TIB
SAMC006372 TIB
SAMC006373 TIB
SAMC006383 TIB
SAMC006384 TIB
SAMC006388 TIB
WGC025265D TIB
WGC025267D TIB
WGC025268D TIB
WGC025269D TIB
WGC025270D TIB
WGC025271D TIB
WGC025274D TIB
WGC025275D TIB
WGC025276D TIB
WGC025277D TIB
WGC025278D TIB
WGC025279D TIB
WGC025280D TIB
WGC025281D TIB
WGC025282D TIB
WGC025283D TIB
WGC025284D TIB
WGC025286D TIB
WGC025287D TIB
WGC025308D TIB
WGC025309D TIB
WGC025310D TIB
WGC025311D TIB
WGC025312D TIB
WGC025313D TIB
WGC025314D TIB
WGC025215D HAN
WGC025216D HAN
WGC025222D HAN
WGC025223D HAN
WGC025224D HAN
WGC025225D HAN
WGC025226D HAN
WGC025229D HAN
WGC025230D HAN
WGC025231D HAN
WGC025232D HAN
WGC025234D HAN
WGC025235D HAN
WGC025241D HAN
WGC025242D HAN
WGC025243D HAN
WGC025244D HAN
WGC025245D HAN
WGC025246D HAN
WGC025247D HAN
WGC025248D HAN
WGC025249D HAN
WGC025250D HAN
WGC025251D HAN
WGC025252D HAN
WGC025255D HAN
WGC025256D HAN
WGC025257D HAN
WGC025258D HAN
WGC025259D HAN
WGC025260D HAN
WGC025261D HAN
WGC025262D HAN
WGC025263D HAN
WGC026449D HAN
WGC026450D HAN
WGC026451D HAN
WGC026453D HAN
WGC026454D HAN
WGC026456D HAN
WGC026457D HAN
WGC026460D HAN
WGC029399D HAN
WGC029400D HAN
WGC029402D HAN
WGC029411D HAN
SAMC006391 HAN
SAMC006392 HAN
SAMC006393 HAN
SAMC006394 HAN
SAMC006395 HAN
SAMC006396 HAN
SAMC006397 HAN
SAMC006398 HAN
SAMC006399 HAN
SAMC006400 HAN
SAMC006401 HAN
SAMC006402 HAN
SAMC006403 HAN
SAMC006404 HAN
SAMC006405 HAN
SAMC006406 HAN
SAMC006407 HAN
SAMC006408 HAN
SAMC006409 HAN
SAMC006410 HAN
SAMC006411 HAN
SAMC006412 HAN
SAMC006413 HAN
SAMC006414 HAN
SAMC006415 HAN
SAMC006416 HAN
SAMC006417 HAN
SAMC006418 HAN
SAMC006419 HAN
SAMC006421 HAN
SAMC006422 HAN
SAMC006423 HAN
SAMC006424 HAN
SAMC006425 HAN
SAMC006426 HAN
SAMC006427 HAN
SAMC006428 HAN
SAMC006429 HAN
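
The file above assigns each sample ID to one of three population labels (YRI, TIB, HAN), one whitespace-separated pair per line. A minimal sketch for tallying samples per population before choosing projection sizes could look like the following; the function name `count_populations` is hypothetical and the short inline example stands in for the real file.

```python
from collections import Counter


def count_populations(lines):
    """Count samples per population label from 'sample population' lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 'sample population', got: {line!r}")
        _sample, population = fields
        counts[population] += 1
    return counts


# A few lines copied from the file, used here only as a small example.
example = """NA18508 YRI
NA18510 YRI
SAMC006354 TIB
WGC025215D HAN
"""

if __name__ == "__main__":
    print(count_populations(example.splitlines()))
```

To check the real file, pass an open file handle instead, for example `count_populations(open("example/samples.YRIandTIBandHAN.clust"))`. Counting the 272 lines listed above by hand suggests 108 YRI, 80 TIB, and 84 HAN samples.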
Binary file modified pipeline-sv-calling.graffle
Binary file not shown.
Binary file modified pipeline-sv-calling.png
56 changes: 56 additions & 0 deletions pipeline.demographic_inference.sh
@@ -0,0 +1,56 @@
# pipeline for demographic inference and simulation

path_software_easySFS=easySFS.py
path_vcf_include_YRI=example/merge.genotypes.corrected.delCHR.svtk.chr1.vcf
path_samples_cluster=example/samples.YRIandTIBandHAN.clust
root_script=scripts

# 1. construct the SFS and project it down to a smaller sample size
"$path_software_easySFS" \
    -p "$path_samples_cluster" \
    -i "$path_vcf_include_YRI" \
    -a -f \
    -o YRI_TIB_HAN/sfs.pruned \
    --proj 50,40,40 \
    --unfolded

# 2. generate bootstrap dataset and calculate GIM
arry_trio=('YRI_TIB_HAN')
arry_model=("sfs.pruned")

for trio in "${arry_trio[@]}"; do
    echo "$trio"
    fname_fs=$(echo "$trio" | sed 's/_/-/g')   # e.g. YRI-TIB-HAN
    poplist=$(echo "$trio" | sed 's/_/,/g')    # e.g. YRI,TIB,HAN
    proj='50,40,40'
    echo "$poplist"
    echo "$proj"
    for model in "${arry_model[@]}"; do
        echo "$model"
        python "$root_script/run_generate_sfs_segments.py" \
            -d "$trio/$model/datadict.txt" \
            -p "$trio/$model/bootstrap" \
            -l "$poplist" -u --random \
            --projections "$proj"
    done
done

# 3. demographic inference
model_list=('split_symmig_all')
for model in "${model_list[@]}"; do
    python "$root_script/run_inference.py" \
        -s YRI_TIB_HAN/YRI-TIB-HAN.sfs \
        -p YRI_TIB_HAN/YRI-TIB-HAN -m "$model" \
        --unfolded
done

# 4. simulation: 1000 replicates for each chromosome, chr22 down to chr1
model='sfs.pruned'
path_hapmap=example/hapmap
for label_simulate in $(seq 1 1000); do
    for ((i=22; i>=1; i--)); do
        chrom="chr$i"
        echo "$chrom"
        python "$root_script/run_simulate_pipeline.py" "$chrom" "$label_simulate" \
            YRI_TIB_HAN/YRI-TIB-HAN.sfs "YRI_TIB_HAN/$model/segments" \
            'split_symmig_all' "YRI_TIB_HAN/$model/" normal "$path_hapmap"
    done
done
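
Step 4 of the script calls the simulation script once per replicate and chromosome, which amounts to 1000 × 22 = 22,000 sequential runs. The sketch below only enumerates those command lines as a dry run, for example to hand them to a job scheduler instead of running them in a single loop. The argument order is copied from the script; the function name `simulation_commands` and its keyword parameters are hypothetical.

```python
from itertools import product


def simulation_commands(n_replicates=1000, root_script="scripts",
                        model="sfs.pruned", demo_model="split_symmig_all",
                        path_hapmap="example/hapmap"):
    """Yield one argument list per (replicate, chromosome) pair,
    in the same order as the nested loops of the bash script."""
    prefix = "YRI_TIB_HAN"
    replicates = range(1, n_replicates + 1)
    chromosomes = range(22, 0, -1)  # chr22 down to chr1
    for label, i in product(replicates, chromosomes):
        yield [
            "python", f"{root_script}/run_simulate_pipeline.py",
            f"chr{i}", str(label),
            f"{prefix}/YRI-TIB-HAN.sfs",
            f"{prefix}/{model}/segments",
            demo_model,
            f"{prefix}/{model}/",
            "normal",
            path_hapmap,
        ]


if __name__ == "__main__":
    commands = list(simulation_commands())
    print(len(commands))          # number of planned runs
    print(" ".join(commands[0]))  # first command, for inspection
```

Because the runs are listed rather than executed, the output can be inspected or split into batches before any compute time is spent.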