Jaltomata Phylogenomics

Table of Contents

Overview

Contributors

Raw Data Processing

Trim low-quality reads and clip the first 15 bp of each read, which are biased by non-random hexamer priming
qsub trim.sh
qsub clip5end.sh
qsub FastaQC.sh
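As a rough illustration of the clipping step, the sketch below removes the first 15 bases (and the matching quality scores) from each FASTQ record. It is not the content of `clip5end.sh`, which is not shown here; the helper name `clip_5prime`, the in-memory record format, and the choice to drop reads that would become empty are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def clip_5prime(records: Iterable[FastqRecord], n: int = 15) -> Iterator[FastqRecord]:
    """Drop the first n bases and quality scores of each read.

    Hypothetical helper: reads of length <= n are skipped entirely.
    """
    for header, seq, qual in records:
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        if len(seq) > n:
            yield header, seq[n:], qual[n:]


# Made-up reads for illustration only.
reads = [
    ("@read1", "ACGTACGTACGTACGTACGT", "IIIIIIIIIIIIIIIIIIII"),
    ("@read2", "ACGTACGT", "IIIIIIII"),  # shorter than 15 bp, dropped
]
clipped = list(clip_5prime(reads))
```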
Build the transcript assembly and predict CDS using TransDecoder
qsub trinity.sh
qsub transdecoder.sh
Rename CDS files and reduce redundancy
for file in *_dir; do cp "$file"/longest_orfs.cds outDIR/"$file".cds; done
python fix_names_from_transdecoder.py <DIR> <DIR>
cat <taxonID>_NR.fa <taxonID>_RP.fa > <taxonID>_cds.fa
qsub cd-hit-est.sh

Homolog Inference

Run an all-by-all BLAST and infer putative homolog groups by sequence similarity
qsub blastn.sh
cat *blastn > all.rawblast
python blast_to_mcl.py all.rawblast <hit_fraction_cutoff>
mcl all.rawblast.hit-frac0.4.minusLogEvalue --abc -te 5 -tf 'gq(10)' -I 2.5 -o hit-frac0.4_I2.5_e10
python write_fasta_files_from_mcl.py <fasta files> <mcl_outfile> <minimal_ingroup_taxa> <outDIR>
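The conversion from BLAST hits to MCL input can be sketched as follows: hits are filtered by hit fraction and written as `--abc` edges weighted by the negative log10 of the E-value. This is not the code of `blast_to_mcl.py`; the `Hit` fields, the definition of hit fraction as the smaller of the query and subject coverage, the cap of 180 for E-values of zero, and the sequence names are all assumptions made for the example.

```python
import math
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    query_len: int
    subject_len: int
    query_aligned: int    # query bases covered by the alignment
    subject_aligned: int  # subject bases covered by the alignment
    evalue: float


def hits_to_abc(hits: Iterable[Hit], hit_fraction_cutoff: float = 0.4,
                max_score: float = 180.0) -> List[str]:
    """Return tab-separated 'query subject weight' lines for mcl --abc."""
    lines = []
    for h in hits:
        if h.query == h.subject:
            continue  # self hits carry no clustering information
        # Assumed definition: coverage must pass the cutoff on both sequences.
        frac = min(h.query_aligned / h.query_len, h.subject_aligned / h.subject_len)
        if frac < hit_fraction_cutoff:
            continue
        # Assumed cap so that an E-value of 0 still yields a finite weight.
        score = max_score if h.evalue <= 0 else min(max_score, -math.log10(h.evalue))
        lines.append(f"{h.query}\t{h.subject}\t{score:.1f}")
    return lines


# Made-up hits for illustration only.
hits = [
    Hit("JA0702@c1", "JA0711@c9", 1000, 900, 800, 780, 1e-50),
    Hit("JA0702@c1", "JA0694@c3", 1000, 1200, 200, 210, 1e-20),  # low coverage, dropped
    Hit("JA0702@c1", "JA0702@c1", 1000, 1000, 1000, 1000, 0.0),  # self hit, dropped
]
abc_lines = hits_to_abc(hits)
```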
Generate initial alignments and trees, then cut long internal branches
qsub mafft.sh
qsub phyutility.sh
qsub fasttree.sh
python cut_long_branches_iter.py <inDIR> <outDIR>
Refine the final clusters
qsub mafft.sh
qsub phyutility.sh
qsub raxml.sh
Cut long internal branches, trim spurious tips and mask monophyletic/paraphyletic tips of the same taxon
python cut_long_internal_branches.py <inDIR> <internal_branch_length_cutoff> <minimal_taxa> <outDIR>
python trim_tips.py <treDIR> <outDIR> <relative_cutoff> <absolute_cutoff1> <absolute_cutoff2>
python mask_tips_by_taxonID_transcripts.py <treDIR> <aln-clnDIR> <outDIR>
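The idea behind cutting long internal branches can be sketched on a toy tree: any subtree attached by an internal branch longer than the cutoff is detached, and only the resulting pieces with enough tips are kept. This is a simplified illustration, not the logic of `cut_long_internal_branches.py`; the tuple-based tree representation, the tip names, and the use of a plain tip count (rather than a count of distinct taxa) are assumptions.

```python
from typing import List, Optional, Tuple

# (label, branch_length, children); tips have an empty child list.
Node = Tuple[Optional[str], float, list]


def cut_long_internal_branches(root: Node, cutoff: float, min_tips: int) -> List[List[str]]:
    """Split a tree at long internal branches and return the kept tip sets."""
    kept: List[List[str]] = []

    def visit(node: Node) -> List[str]:
        label, _length, children = node
        if not children:
            return [label]
        attached: List[str] = []
        for child in children:
            child_tips = visit(child)
            is_internal = bool(child[2])
            if is_internal and child[1] > cutoff:
                # Long internal branch: detach the subtree as its own cluster.
                if len(child_tips) >= min_tips:
                    kept.append(child_tips)
            else:
                attached.extend(child_tips)
        return attached

    remaining = visit(root)
    if len(remaining) >= min_tips:
        kept.append(remaining)
    return kept


# Toy tree with made-up tip names and branch lengths.
tree: Node = (None, 0.0, [
    (None, 0.02, [("A@1", 0.01, []), ("B@1", 0.01, []), ("C@1", 0.02, [])]),
    (None, 0.90, [("A@2", 0.01, []), ("B@2", 0.02, []),
                  ("C@2", 0.01, []), ("D@2", 0.01, [])]),
    ("D@1", 0.03, []),
])
clusters = cut_long_internal_branches(tree, cutoff=0.3, min_tips=4)
```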

Ortholog Inference

Paralogy pruning to infer orthologs
python prune_paralogs_MI.py <homologDIR> <tree_ending> <relative_tip_cutoff> <absolute_tip_cutoff> <minimal_taxa> <outDIR>
python write_ortholog_fasta_files.py <fasta file with all seqs> <ortholog tree DIR> <outDIR> <MIN_TAXA>
Rename the sequence files based on the tomato gene models and add Capsicum orthologous sequences
python cluster_gene_ID.py <inDIR> <treDIR> <outDIR>
python CapsellaOrtholog.py <inDIR> Tomato_Capsella.txt Capsicum.annuum.L_Zunla-1_v2.0_CDS.fa <outDIR>

Alignment Construction

Run Guidance to make sequence alignments
python directory_subpackage.py <inDIR> <num_subdir> .fa
qsub guidance.sh
Re-run Guidance on unprocessed sequences
for file in Solyc*; do cp "$file"/MSA.PRANK.Without_low_SP_Col.With_Names outDIR/"$file"; done
python find_unprocessed_files.py <processedDIR> <originalDIR> <unprocessedDIR>
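Finding the inputs that still need a Guidance run amounts to comparing two directories. The sketch below is one way to do that and is not the code of `find_unprocessed_files.py`; the function name, the `.fa` suffix default, the assumption that each output is named after its input without the suffix, and the file names in the demo are all hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List


def copy_unprocessed(processed_dir: Path, original_dir: Path, unprocessed_dir: Path,
                     suffix: str = ".fa") -> List[str]:
    """Copy inputs that have no output yet; return their file names."""
    # Assumption: an output is named after its input, with or without the suffix.
    done = {p.name for p in Path(processed_dir).iterdir()}
    target = Path(unprocessed_dir)
    target.mkdir(parents=True, exist_ok=True)
    missing = []
    for fasta in sorted(Path(original_dir).glob(f"*{suffix}")):
        if fasta.stem not in done and fasta.name not in done:
            shutil.copy(fasta, target / fasta.name)
            missing.append(fasta.name)
    return missing


# Demo with made-up file names in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "original").mkdir()
    (base / "processed").mkdir()
    for name in ("Solyc_geneA.fa", "Solyc_geneB.fa"):
        (base / "original" / name).write_text(">seq\nATG\n")
    (base / "processed" / "Solyc_geneA").write_text(">seq\nATG\n")
    missing = copy_unprocessed(base / "processed", base / "original", base / "unprocessed")
    copied = sorted(p.name for p in (base / "unprocessed").iterdir())
```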
Post-alignment treatment_1: remove Capsicum sequences and delete gaps or missing bases
qsub mask_bySW.sh
python orf_aln_process.py <inDIR> <outDIR> -s Capana -d 15
Calculate pair-wise genetic distance
python3.3 fasta2mvf.py --fasta alignments_Dir/* --out genes_mvf --contigbyfile --overwrite
python3.3 mvf_analyze_dna.py --mvf genes_mvf --out genetic_dist PairwiseDistanceWindow
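For intuition about what a pairwise genetic distance measures, the sketch below computes an uncorrected p-distance (the proportion of differing sites) between every pair of aligned sequences, skipping sites with gaps or ambiguous bases. It is an illustration only; the distance actually reported by `PairwiseDistanceWindow` may be defined differently, and the sequences here are made up.

```python
from itertools import combinations
from typing import Dict, Tuple

VALID = set("ACGT")


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites among sites where both bases are A/C/G/T."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID and b in VALID:
            compared += 1
            differing += a != b
    return differing / compared if compared else float("nan")


def pairwise_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Distances for all sample pairs, keyed by the sorted pair of names."""
    return {(x, y): p_distance(alignment[x], alignment[y])
            for x, y in combinations(sorted(alignment), 2)}


# Made-up alignment for illustration only.
alignment = {
    "JA0702": "ATGGCCAAGT",
    "JA0711": "ATGGCTAAGT",
    "Solyc":  "ATGACTAA-T",
}
dist = pairwise_distances(alignment)
```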

Phylogeny Construction

Concatenated tree and Consensus tree using RAxML

qsub raxml_concatenate.sh
module load phylip; consense
raxmlHPC -L MRE -z genetrees.tre -m GTRCAT -n T1

Coalescent tree using ASTRAL

qsub astral.sh

Gene tree analysis with BUCKy

python seqformat_converter.py <inDIR> <outDIR> .phy .nex
qsub bucky.sh

Visualize phylogenetic tree

Rscript phylo_construct.R

Introgression Analysis

Run ABBA-BABA tests using MVF
python3.3 fasta2mvf.py --fasta <concatenated_fasta> --out transcriptome --overwrite
python ABBA_trio.py
qsub introgression_trios.sh
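The statistic behind an ABBA-BABA test is D = (ABBA - BABA) / (ABBA + BABA), computed over biallelic sites for a trio (P1, P2, P3) plus an outgroup. The sketch below counts the two site patterns from four aligned sequences; it is a minimal illustration and not the code of `ABBA_trio.py`, and the function names and sequences are made up.

```python
from typing import Tuple

BASES = set("ACGT")


def abba_baba_counts(p1: str, p2: str, p3: str, outgroup: str) -> Tuple[int, int]:
    """Count ABBA and BABA sites, taking the outgroup base as ancestral (A)."""
    abba = baba = 0
    for a, b, c, o in zip(p1.upper(), p2.upper(), p3.upper(), outgroup.upper()):
        site = {a, b, c, o}
        if not site <= BASES:
            continue  # skip gaps, missing data and ambiguity codes
        if len(site) != 2:
            continue  # only biallelic sites are informative
        if a == o and b == c:      # P1 ancestral, P2 and P3 derived
            abba += 1
        elif b == o and a == c:    # P2 ancestral, P1 and P3 derived
            baba += 1
    return abba, baba


def d_statistic(abba: int, baba: int) -> float:
    total = abba + baba
    return (abba - baba) / total if total else float("nan")


# Made-up sequences for illustration only.
p1 = "ACTAAA"
p2 = "GTCAGG"
p3 = "GTTG-G"
out = "ACCAAA"
counts = abba_baba_counts(p1, p2, p3, out)
d = d_statistic(*counts)
```

A positive D indicates an excess of sites shared between P2 and P3, a negative D an excess shared between P1 and P3.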
Infer pairwise species-specific/common ABBA-BABA sites
python ABBA_parse.py -mvf MVF_FILE -test pairwise
sh speciesID.sh
Infer the direction of introgression using the D-foil test (example)
python3.3 mvf_analyze_dna.py --mvf transcriptome --out SIN_CAL_DAR_PRO --samples JA0702 JA0711 JA0694 JA0456 Solyc --windowsize 6201996 PatternCount
python dfoil.py --out myfile --infile SIN_CAL_DAR_PRO --pvalue 0.00001

Ancestral Segregating Allele Analysis

Map reads to the tomato reference genome and call SNPs
qsub mapping.sh
qsub snp_call.sh
python mvf_join.py --mvf SL2.50ch00.mvf SL2.50ch01.mvf SL2.50ch02.mvf SL2.50ch03.mvf SL2.50ch04.mvf SL2.50ch05.mvf SL2.50ch06.mvf SL2.50ch07.mvf SL2.50ch08.mvf SL2.50ch09.mvf SL2.50ch10.mvf SL2.50ch11.mvf SL2.50ch12.mvf --out combined.mvf
Count ancestral segregating alleles
python ancestral_variation.py -i combined.mvf -t species_hetero
python ancestral_variation.py -i combined.mvf -t shared_hetero
python ancestral_variation.py -i combined.mvf -t shared_snp
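One way to picture the heterozygosity counts is shown below. The sketch assumes that heterozygous genotypes are encoded as two-base IUPAC ambiguity codes (R, Y, S, W, K, M) and that a "shared" heterozygous site is one where two samples carry the same code. Both are assumptions for illustration; the exact definitions used by `ancestral_variation.py` are not given here, and the genotype strings are made up.

```python
from typing import Dict

HET_CODES = set("RYSWKM")  # IUPAC codes for two-base heterozygous calls


def count_hetero(calls: Dict[str, str]) -> Dict[str, int]:
    """Per-sample count of heterozygous sites."""
    return {name: sum(base in HET_CODES for base in seq.upper())
            for name, seq in calls.items()}


def count_shared_hetero(calls: Dict[str, str], sample_a: str, sample_b: str) -> int:
    """Sites where both samples carry the same heterozygous call."""
    return sum(a == b and a in HET_CODES
               for a, b in zip(calls[sample_a].upper(), calls[sample_b].upper()))


# Made-up genotype calls, one character per site.
calls = {
    "JA0702": "ARGYTCS",
    "JA0711": "ARGCTCS",
    "JA0694": "AAGYTCC",
}
per_sample = count_hetero(calls)
shared = count_shared_hetero(calls, "JA0702", "JA0711")
```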
Count how many sites are discordant with the representative topology using a BBAA-ABBA-BABA test
qsub ILS_trios.sh

Adaptive Evolution Analysis

Separate alignments with/without Capana and remove JA0010 from alignments
python orf_aln_process.py -i <inDIR> -o <outDIR> -s JA0010
grep -lir 'Capana' ./ | xargs mv -t <outDIR>
python seqformat_converter.py <inDIR> <outDIR> .fa .phy
sh edit_phy2.sh
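The FASTA to PHYLIP conversion can be sketched in a few lines. This is not the code of `seqformat_converter.py` or `edit_phy2.sh`; the function names are hypothetical, and the output uses a sequential layout with two spaces between name and sequence, which PAML accepts for names of varying length.

```python
from typing import Dict


def parse_fasta(text: str) -> Dict[str, str]:
    """Read FASTA text into {name: sequence}, joining wrapped lines."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif line and name is not None:
            seqs[name] += line
    return seqs


def to_phylip(seqs: Dict[str, str]) -> str:
    """Write a sequential PHYLIP alignment with a 'ntaxa nsites' header."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to the same length")
    rows = [f" {len(seqs)} {lengths.pop()}"]
    # Two spaces separate the name from the sequence.
    rows += [f"{name}  {seq}" for name, seq in seqs.items()]
    return "\n".join(rows) + "\n"


# Made-up alignment for illustration only.
fasta = ">JA0702\nATGGCC\nAAG\n>Solyc\nATGGCTAAG\n"
phylip = to_phylip(parse_fasta(fasta))
```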
Post-alignment treatment_2
python codemlScript.py <outDIR> <codeml_build> <treeFile>
qsub paml.sh
find */rub -empty -type f
python SWAMP.py -i <inDIR> -b <branchcodes.txt> -t 5 -w 15 -m 50
Remove all gaps and missing bases before PAML
for dir in inDIR/Solyc*; do cp "$dir"/*masked.phy outDIR; done
python orf_aln_process.py -i <inDIR> -o <outDIR> -s seqname -d 14
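Removing gaps and missing bases from a coding alignment is usually done codon by codon so that the reading frame is preserved for PAML. The sketch below drops every codon column in which any sequence has a character other than A, C, G or T. It is an illustration under that assumption, not the code of `orf_aln_process.py`, and it does not model the `-s` or `-d` options; the sequences are made up.

```python
from typing import Dict

BASES = set("ACGT")


def drop_incomplete_codons(alignment: Dict[str, str]) -> Dict[str, str]:
    """Keep only codon columns that are fully resolved in every sequence."""
    if not alignment:
        return {}
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("sequences must be aligned to the same length")
    if length % 3:
        raise ValueError("alignment length must be a multiple of three")
    keep = [i for i in range(0, length, 3)
            if all(set(alignment[n][i:i + 3].upper()) <= BASES for n in names)]
    return {n: "".join(alignment[n][i:i + 3] for i in keep) for n in names}


# Made-up codon alignment for illustration only.
codon_aln = {
    "JA0702": "ATGGCCAAGTTT",
    "JA0711": "ATG---AAGTTT",  # gap codon removes column 2
    "Solyc":  "ATGGCTAANTTT",  # missing base removes column 3
}
cleaned = drop_incomplete_codons(codon_aln)
```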
Run PAML using MVF
python3.3 fasta2mvf.py --fasta inDIR/* --out outDIR/Jalt_ortho_dna --contigbyfile --overwrite
python3.3 mvf_translate.py --mvf Jalt_ortho_dna --out Jalt_ortho_codon
qsub mvf_paml.sh
python CombinedPAML.py <NS_out> <Geneoutput> GeneFunction.txt > PAML_final.txt
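PAML model comparisons are typically evaluated with a likelihood ratio test on the log-likelihoods of the null and alternative models. The sketch below computes the test statistic 2(lnL1 - lnL0) and its p-value from a chi-square distribution with one degree of freedom, using only the standard library. The single degree of freedom is an assumption; the appropriate value depends on the pair of models compared, and this is not the code of `CombinedPAML.py`. The log-likelihood values are made up.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test with one degree of freedom."""
    stat = 2.0 * (lnl_alt - lnl_null)
    if stat <= 0:
        return 1.0  # the alternative model fits no better than the null
    # For chi-square with df = 1, the survival function is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Made-up log-likelihoods for illustration only.
p_value = lrt_pvalue(lnl_null=-1234.56, lnl_alt=-1230.72)
significant = p_value < 0.05
```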
Perform PhyloGWAS analysis on the derived floral traits in Jaltomata (nectar)
python3.3 mvf_analyze_codon.py GroupUniqueAlleleWindow --mvf Jalt_noSolyc_codon --out Jalt_nectar --allelegroups RED:JA0432,JA0608,JA0719,JA0726,JA0816,JA0711,JA0798 OTHER:JA0456,JA0701,JA0694,JA0450,JA0723,JA0702 --windowsize -1 --uselabels --speciesgroups PRO:JA0456 REP:JA0701 DAR:JA0694 AUR:JA0450 UMB:JA0432 BIF:JA0608 SIN:JA0702 DEN:JA0719 YUN:JA0723 AIJ:JA0726 INC:JA0816 CAL:JA0711 QUI:JA0798 --branchlrt Geneoutput_nectar --pamltmp PAMLtemp_nectar --startcontig 0 --endcontig 0 --target JA0432 JA0608 JA0719 JA0726 JA0816 JA0711 JA0798 --targetspec 8 --raxmlpath raxmlHPC --allsampletree
qsub ms_sim.sh