DATED provides an efficient single-step solution for estimating the level of synonymous substitution (Ks) between paralogous and orthologous sequence pairs. The software uses the multiprocessing library to speed up Ks calculation for the input sequence pairs. The output of the pipeline can be passed to the divergence.R script to perform mixture model analysis of the Ks distribution.
Create and activate a Conda environment named ks:
conda create --name ks python=3.5
conda activate ks
Downloading and installing software:
- DATED
git clone https://github.com/ChuShin/dated
- ClustalW
conda install -c bioconda clustalw
- PAL2NAL
conda install -c bioconda pal2nal
- PAML
conda install -c bioconda paml
An all-against-all protein sequence similarity search (BLASTP with E-value, high-scoring segment pair (HSP) length, and sequence identity cut-offs) can be used to identify paralogous genes within a plant species for which a completely annotated genome sequence is available. In the absence of a completely annotated genome sequence, transcript sequences assembled from RNA-Seq data can be used to identify homologs. In that case, the open reading frame of each transcript has to be predicted and the corresponding amino acid sequence deduced by translation.
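For the transcript-based route, ORF prediction and translation can be illustrated with a minimal sketch. It assumes the standard genetic code, complete ORFs only (ATG through an in-frame stop codon), and a search of all six reading frames. The function names `longest_orf` and `translate` are hypothetical and are not part of DATED; dedicated ORF predictors are the usual choice for real transcriptome data.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Return the longest complete ORF (ATG up to, but excluding, the stop
    codon) found in any of the six reading frames; '' if there is none."""
    best = ""
    seq = seq.upper()
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and CODON_TABLE.get(codon) == "*":
                    if i - start > len(best):
                        best = strand[start:i]
                    start = None  # look for the next ORF in this frame
    return best


def translate(cds):
    """Translate a coding sequence; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


transcript = "CCATGGCTAAATGACC"  # toy transcript, not real data
cds = longest_orf(transcript)
print(cds, translate(cds))
```

The predicted coding sequences and their translations would then play the role of the CDS and peptide FASTA inputs.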
The reciprocal best BLAST hit method can be used to detect orthologous genes between two related species.
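The reciprocal best hit idea can be sketched in a few lines of Python. This is an illustration only, not code from DATED: it assumes two BLAST tabular result sets (the default 12-column `-outfmt 6` layout, one search in each direction) and picks the best hit by bitscore. The function names and the toy records are hypothetical.

```python
def best_hits(lines):
    """Map each query ID to its highest-bitscore subject ID, given lines in
    BLAST tabular format (-outfmt 6, bitscore in column 12)."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: hit[0] for query, hit in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where b is a's best hit and a is b's."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy records: species A searched against species B, and the reverse.
a_vs_b = [
    "A1\tB1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-100\t350",
    "A1\tB2\t70.0\t180\t54\t0\t1\t180\t1\t180\t1e-50\t200",
    "A2\tB2\t85.0\t190\t28\t0\t1\t190\t1\t190\t1e-90\t300",
]
b_vs_a = [
    "B1\tA1\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-100\t350",
    "B2\tA1\t70.0\t180\t54\t0\t1\t180\t1\t180\t1e-50\t200",
]
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only A1 and B1 name each other as best hit; A2 points to B2, but B2's best hit is A1, so that pair is not reported.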
dated.py pep.fa cds.fa blast_pairlist > blast_pairlist.ks
Identification of paralogs in soybean:
- Create a custom BLAST database:
makeblastdb -in Glycine_max.Glycine_max_v2.1.pep.all.fa -parse_seqids -dbtype prot
- Perform an all-against-all blastp search:
blastp -query Glycine_max.Glycine_max_v2.1.pep.all.fa -out Gm_Gm_paralogs.out -db Glycine_max.Glycine_max_v2.1.pep.all.fa -outfmt 6
- Identify paralogs (sequences aligned over more than 150 aa and showing at least 60% identity are defined as paralogs):
bp_parse_blastp.pl Gm_Gm_paralogs.out > Gm_Gm_paralogs.parsed
- Ks estimation of paralogs:
dated.py Glycine_max.Glycine_max_v2.1.pep.all.fa Glycine_max.Glycine_max_v2.1.cds.all.fa Gm_Gm_paralogs.parsed > Gm_Gm_paralogs.parsed.ks
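The paralog definition used in the identification step (alignment longer than 150 aa, at least 60% identity) can be expressed as a short filter over BLAST tabular output. The sketch below is an assumption-laden illustration, not the logic of bp_parse_blastp.pl: it assumes the default `-outfmt 6` columns (percent identity in column 3, alignment length in column 4), drops self-hits, and reports each unordered gene pair once. The function name `filter_paralogs` and the toy records are hypothetical, and the pair-list format expected by dated.py is not specified here.

```python
def filter_paralogs(lines, min_length=150, min_identity=60.0):
    """Return sorted unique (gene_a, gene_b) pairs whose alignment is longer
    than min_length residues with at least min_identity percent identity."""
    pairs = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, length = float(fields[2]), int(fields[3])
        if query == subject:
            continue  # a gene always matches itself; not a paralog pair
        if length > min_length and identity >= min_identity:
            # Sort the two IDs so A/B and B/A collapse into one pair.
            pairs.add(tuple(sorted((query, subject))))
    return sorted(pairs)


# Toy all-against-all records (not real soybean data).
hits = [
    "G1\tG1\t100.0\t500\t0\t0\t1\t500\t1\t500\t0.0\t900",
    "G1\tG2\t75.5\t300\t73\t0\t1\t300\t1\t300\t1e-120\t400",
    "G2\tG1\t75.5\t300\t73\t0\t1\t300\t1\t300\t1e-120\t400",
    "G1\tG3\t55.0\t400\t180\t0\t1\t400\t1\t400\t1e-80\t310",
    "G2\tG4\t90.0\t120\t12\t0\t1\t120\t1\t120\t1e-60\t220",
]
print(filter_paralogs(hits))
```

Here only the G1/G2 pair passes: the G1/G3 alignment falls below 60% identity and the G2/G4 alignment is shorter than the length cut-off.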