
output files

Sina Majidian edited this page Nov 12, 2024 · 8 revisions

 

In the folder 'tests/output' you should find the following folders and files:

folder/file                 description
01_ref_ogs_aa               contains the selected OGs with amino acid (aa) data
01_ref_ogs_dna              contains the selected OGs with DNA data
02_ref_dna                  contains the OGs reshuffled by available species
03_align_aa                 contains the MAFFT alignments of the aa data
03_align_dna                contains the codon replacement of the aa alignments
04_mapping_sample_1         contains the consensus sequences from the mapping
05_ogs_map_sample_1_aa      contains the OGs (aa) with the additional sequence sample_1
05_ogs_map_sample_1_dna     contains the OGs (DNA) with the additional sequence sample_1
06_align_sample_1_aa        contains the aa alignments with the additional sequence sample_1
06_align_sample_1_dna       contains the DNA alignments with the additional sequence sample_1
concat_sample_1_aa.phy      concatenated alignments from the 06 amino acid folder
concat_sample_1_dna.phy     concatenated alignments from the 06 DNA folder
sample_1_all_cov.txt        summary of the average number of reads used for the selected sequences
sample_1_all_sc.txt         summary of the average consensus length of the reconstructed sequences
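
To confirm that a run produced everything listed above, a small check along the following lines may help. This is a minimal sketch and not part of read2tree: the helper name `missing_outputs` is hypothetical, and the expected names are simply those from the table, with `sample_1` replaced by the sample name.

```python
from pathlib import Path

# Entry names copied from the table above; "{s}" stands for the sample name.
EXPECTED = [
    "01_ref_ogs_aa", "01_ref_ogs_dna", "02_ref_dna",
    "03_align_aa", "03_align_dna",
    "04_mapping_{s}",
    "05_ogs_map_{s}_aa", "05_ogs_map_{s}_dna",
    "06_align_{s}_aa", "06_align_{s}_dna",
    "concat_{s}_aa.phy", "concat_{s}_dna.phy",
    "{s}_all_cov.txt", "{s}_all_sc.txt",
]


def missing_outputs(output_dir, sample):
    """Return the expected entries that are absent from output_dir (hypothetical helper)."""
    root = Path(output_dir)
    names = [name.format(s=sample) for name in EXPECTED]
    return [name for name in names if not (root / name).exists()]


if __name__ == "__main__":
    # Prints one line per missing folder/file; prints nothing if all are present.
    for name in missing_outputs("tests/output", "sample_1"):
        print("missing:", name)
```

An empty result means every folder and file from the table exists; it does not check the contents of those folders.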

 

You can check the inferred species tree for the sample and five reference species in Newick format:

$ cat output/tree_sample_1.nwk
(sample_1:0.0106979811,((HUMAN:0.0041202790,GORGO:0.0272785216):0.0433094119,(XENLA:0.1715052824,MNELE:0.9177670816):0.1141311779):0.0613339433,RATNO:0.0123413734);
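
If you want to inspect the tree programmatically, the leaf names and their terminal branch lengths can be pulled out of the Newick string with the standard library alone. The snippet below is only a sketch: it assumes simple, unquoted labels as in the example above, and the function name `leaf_branch_lengths` is made up for illustration. For anything beyond a quick look, a dedicated tree library is the safer choice.

```python
import re

# Newick string copied from the example output above.
NEWICK = (
    "(sample_1:0.0106979811,((HUMAN:0.0041202790,GORGO:0.0272785216):0.0433094119,"
    "(XENLA:0.1715052824,MNELE:0.9177670816):0.1141311779):0.0613339433,"
    "RATNO:0.0123413734);"
)

# A leaf is a label that directly follows "(" or "," and is followed by ":length".
# Internal nodes follow ")" and carry no label here, so they are not matched.
LEAF = re.compile(r"(?<=[(,])([A-Za-z0-9_]+):([0-9.eE+-]+)")


def leaf_branch_lengths(newick):
    """Map each leaf label to its terminal branch length."""
    return {name: float(length) for name, length in LEAF.findall(newick)}


leaves = leaf_branch_lengths(NEWICK)
# List the leaves from shortest to longest terminal branch.
for name, length in sorted(leaves.items(), key=lambda item: item[1]):
    print(f"{name}\t{length}")
```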

Note that species names are given as 5-letter codes, e.g. XENLA = Xenopus laevis. If you want to rerun your analysis, first move or delete the files from the previous run. Otherwise, read2tree resumes from the progress of the previous analysis.

To run on a cluster, you can first run only the initial step of read2tree, which computes folders 01, 02 and 03 and thereby prepares the references for mapping. This is done with the '--reference' option. Since read2tree regroups the OGs by the included species, the mapping step can be split per species, with multiple threads used for the mapper. The '--single_mapping' option is available for this.

Hint: Because read2tree uses the progress package, unfinished runs can be continued. However, if you want to conduct a new analysis with different inputs, you need to remove the output of previous runs or change the output_path.
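
One way to avoid accidentally continuing an old run is to move the previous output folder aside before starting a new analysis. The following is a sketch under that assumption. The helper `set_aside_previous_output` is hypothetical and not part of read2tree; it only renames the folder and does not touch any read2tree state stored elsewhere.

```python
from datetime import datetime
from pathlib import Path


def set_aside_previous_output(output_dir):
    """Rename an existing, non-empty output folder so that a new run starts clean.

    Returns the new path, or None if there was nothing to move.
    """
    root = Path(output_dir)
    if not root.is_dir() or not any(root.iterdir()):
        return None
    # Append a timestamp so that earlier results are kept, not overwritten.
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = root.with_name(f"{root.name}_{stamp}")
    root.rename(target)
    return target
```

Calling `set_aside_previous_output("tests/output")` before a new run would leave the old results in a folder such as `tests/output_<timestamp>`.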

The following is the folder structure for the test example:

Before the run:

$ tree tests/
tests/
├── marker_genes
│   ├── OMAGroup_1032177.fa
│   ├── OMAGroup_1059464.fa
│   ├── OMAGroup_1064207.fa
│   ├── OMAGroup_1080404.fa
│   ├── OMAGroup_1103036.fa
│   ├── OMAGroup_1103803.fa
│   ├── OMAGroup_1107105.fa
│   ├── OMAGroup_638532.fa
│   ├── OMAGroup_648288.fa
│   ├── OMAGroup_741671.fa
│   ├── OMAGroup_742036.fa
│   ├── OMAGroup_778504.fa
│   ├── OMAGroup_783172.fa
│   ├── OMAGroup_799356.fa
│   ├── OMAGroup_852256.fa
│   ├── OMAGroup_852375.fa
│   ├── OMAGroup_852570.fa
│   ├── OMAGroup_853308.fa
│   ├── OMAGroup_853454.fa
│   └── OMAGroup_853960.fa
├── sample_1.fastq
├── sample_2.fastq
├── test_aligner.py
├── test_og.py
├── test_ogset.py
├── test_reads.py
├── test_seqCompleteness.py
└── test_use.py
1 directory, 28 files

After running Read2Tree:

$ tree tests/
tests/
├── marker_genes
│   ├── OMAGroup_1032177.fa
│   ├── OMAGroup_1059464.fa
│   ├── OMAGroup_1064207.fa
│   ├── OMAGroup_1080404.fa
│   ├── OMAGroup_1103036.fa
│   ├── OMAGroup_1103803.fa
│   ├── OMAGroup_1107105.fa
│   ├── OMAGroup_638532.fa
│   ├── OMAGroup_648288.fa
│   ├── OMAGroup_741671.fa
│   ├── OMAGroup_742036.fa
│   ├── OMAGroup_778504.fa
│   ├── OMAGroup_783172.fa
│   ├── OMAGroup_799356.fa
│   ├── OMAGroup_852256.fa
│   ├── OMAGroup_852375.fa
│   ├── OMAGroup_852570.fa
│   ├── OMAGroup_853308.fa
│   ├── OMAGroup_853454.fa
│   └── OMAGroup_853960.fa
├── mplog.log
├── output
│   ├── 01_ref_ogs_aa
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── 01_ref_ogs_dna
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── 02_ref_dna
│   │   ├── GORGO_OGs.fa
│   │   ├── HUMAN_OGs.fa
│   │   ├── MNELE_OGs.fa
│   │   ├── RATNO_OGs.fa
│   │   └── XENLA_OGs.fa
│   ├── 03_align_aa
│   │   ├── OG1032177.phy
│   │   ├── OG1059464.phy
│   │   ├── OG1064207.phy
│   │   ├── OG1080404.phy
│   │   ├── OG1103036.phy
│   │   ├── OG1103803.phy
│   │   ├── OG1107105.phy
│   │   ├── OG638532.phy
│   │   ├── OG648288.phy
│   │   ├── OG741671.phy
│   │   ├── OG742036.phy
│   │   ├── OG778504.phy
│   │   ├── OG783172.phy
│   │   ├── OG799356.phy
│   │   ├── OG852256.phy
│   │   ├── OG852375.phy
│   │   ├── OG852570.phy
│   │   ├── OG853308.phy
│   │   ├── OG853454.phy
│   │   └── OG853960.phy
│   ├── 03_align_dna
│   │   ├── OG1032177.phy
│   │   ├── OG1059464.phy
│   │   ├── OG1064207.phy
│   │   ├── OG1080404.phy
│   │   ├── OG1103036.phy
│   │   ├── OG1103803.phy
│   │   ├── OG1107105.phy
│   │   ├── OG638532.phy
│   │   ├── OG648288.phy
│   │   ├── OG741671.phy
│   │   ├── OG742036.phy
│   │   ├── OG778504.phy
│   │   ├── OG783172.phy
│   │   ├── OG799356.phy
│   │   ├── OG852256.phy
│   │   ├── OG852375.phy
│   │   ├── OG852570.phy
│   │   ├── OG853308.phy
│   │   ├── OG853454.phy
│   │   └── OG853960.phy
│   ├── 04_mapping_sample_1
│   │   ├── GORGO_OGs_consensus.fa
│   │   ├── GORGO_OGs_cov.txt
│   │   ├── GORGO_OGs.fa.bam
│   │   ├── GORGO_OGs_sc.txt
│   │   ├── HUMAN_OGs_consensus.fa
│   │   ├── HUMAN_OGs_cov.txt
│   │   ├── HUMAN_OGs.fa.bam
│   │   ├── HUMAN_OGs_sc.txt
│   │   ├── MNELE_OGs_cov.txt
│   │   ├── MNELE_OGs.fa.bam
│   │   ├── RATNO_OGs_consensus.fa
│   │   ├── RATNO_OGs_cov.txt
│   │   ├── RATNO_OGs.fa.bam
│   │   ├── RATNO_OGs_sc.txt
│   │   ├── XENLA_OGs_consensus.fa
│   │   ├── XENLA_OGs_cov.txt
│   │   ├── XENLA_OGs.fa.bam
│   │   └── XENLA_OGs_sc.txt
│   ├── 05_ogs_map_sample_1_aa
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── 05_ogs_map_sample_1_dna
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── 06_align_sample_1_aa
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── 06_align_sample_1_dna
│   │   ├── OG1032177.fa
│   │   ├── OG1059464.fa
│   │   ├── OG1064207.fa
│   │   ├── OG1080404.fa
│   │   ├── OG1103036.fa
│   │   ├── OG1103803.fa
│   │   ├── OG1107105.fa
│   │   ├── OG638532.fa
│   │   ├── OG648288.fa
│   │   ├── OG741671.fa
│   │   ├── OG742036.fa
│   │   ├── OG778504.fa
│   │   ├── OG783172.fa
│   │   ├── OG799356.fa
│   │   ├── OG852256.fa
│   │   ├── OG852375.fa
│   │   ├── OG852570.fa
│   │   ├── OG853308.fa
│   │   ├── OG853454.fa
│   │   └── OG853960.fa
│   ├── concat_sample_1_aa.phy
│   ├── concat_sample_1_dna.phy
│   ├── sample_1_all_cov.txt
│   ├── sample_1_all_sc.txt
│   └── tree_sample_1.nwk
├── sample_1.fastq
├── sample_2.fastq
├── test_aligner.py
├── test_og.py
├── test_ogset.py
├── test_reads.py
├── test_seqCompleteness.py
└── test_use.py
12 directories, 217 files

Contents of some output files from another run/dataset:

$ cat 04_mapping_ERR7323296__1/X0030_OGs_cov.txt
#species,og,gene_id,coverage,std
X0030,OG25,X003000004,nan,nan
X0030,OG14,X003000003,5259.72,3795.315
X0030,OG5,X003000001,812.38,2150.586
X0030,OG1,X003000002,2738.45,3677.116
X0030,OG84,X003000005,5287.19,490.546
X0030,OG28,X003000006,11.95,6.466
X0030,OG20,X003000007,4440.42,3225.788
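
The coverage file is plain comma-separated text, so it can be summarised with the standard library. The sketch below uses the excerpt above as inline data. The function name `read_coverage` is hypothetical, and treating `nan` rows as OGs without a usable coverage value is an assumption based only on this excerpt.

```python
import csv
import io
import math

# Excerpt copied from the listing above.
COV_TEXT = """#species,og,gene_id,coverage,std
X0030,OG25,X003000004,nan,nan
X0030,OG14,X003000003,5259.72,3795.315
X0030,OG5,X003000001,812.38,2150.586
X0030,OG1,X003000002,2738.45,3677.116
X0030,OG84,X003000005,5287.19,490.546
X0030,OG28,X003000006,11.95,6.466
X0030,OG20,X003000007,4440.42,3225.788
"""


def read_coverage(handle):
    """Return {og: coverage} from a *_cov.txt file; 'nan' becomes float('nan')."""
    reader = csv.reader(handle)
    # The header line starts with '#', which is stripped from the first column name.
    header = [column.lstrip("#") for column in next(reader)]
    og_col, cov_col = header.index("og"), header.index("coverage")
    return {row[og_col]: float(row[cov_col]) for row in reader if row}


coverage = read_coverage(io.StringIO(COV_TEXT))
nan_ogs = [og for og, value in coverage.items() if math.isnan(value)]
covered = [value for value in coverage.values() if not math.isnan(value)]
mean_cov = sum(covered) / len(covered)

print("OGs with nan coverage:", nan_ogs)
print("mean coverage over the remaining OGs:", round(mean_cov, 2))
```

To read a real file instead of the inline excerpt, pass an open file handle, e.g. `with open(path, newline="") as fh: read_coverage(fh)`.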


$ less 05_ogs_map_ERR7323296__1_aa/OG25.fa
>CVHSA00003_OG25 CVHSA00003 | OMA25 | P59632 | [Human SARS coronavirus]
MDLFMRFFTLRSITAQPVKIDNASPASTVHATATIPLQASLPFGWLVIGVAFLAVFQSATKIIALNKRWQLALYKGFQFICNLLLLFVTIYSHLLLVAAGMEAQFLYLYALIYFLQCINACRIIMRCWLCWKCKSKNPLLYDANYFVCWHTHNYDYCIPYNSVTDTIVVTEGDGISTPKLKEDYQIGGYSEDRHSGVKDYVVVHGYFTEVYYQLESTQITTDTGIENATFFIFNKLVKDPPNVQIHTIDGSSGVANPAMDPIYDEPTTTTSVPL
>ERR7323296__1 SARS200003_OG25 [SARS2]
MDLFMRIFTIGTVTLKQGEIKDATPLDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL

$ less 06_align_merge_aa/OG25.fa
 18 275
X0012          MDLFMSIFTL GSITRQPSKI ENAFLASTVH ATATIPLQAS FSFRWLVVGV
X0030          MDLFMSIFTL GAITRQPAKI ENASPASTVH ATATIPLQAS LPFGWLVVGV
X0019          MDLFMSIFTL GSITRQPSKI ENAFLASTVH ATATIPLQAS LSFRWLVAGV
X0041          MDLFMSIFTL GSITRQPSKI ENAFLASTVH ATATIPLQAS FSFRWLVIGV
X0018          MDLFMRFFTL RSITAQPVKI DNASPASTVH ATATIPLQAS LPFGWLVIGV
X0069          MDLFMRFFTL GSITAQPVKI DNASPASTVH ATATIPLQAS LPFGWLVIGV
SARS2          MDLFMRIFTI GTVTLKQGEI KDATPSDFVR ATATIPIQAS LPFGWLIVGV
CVHSA          MDLFMRFFTL RSITAQPVKI DNASPASTVH ATATIPLQAS LPFGWLVIGV
ERR7361530__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7359642__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7350657__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7332798__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7323296__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7373660__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7324472__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7361581__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7347075__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
ERR7330527__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
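
The alignment above is in PHYLIP format: a header with the number of sequences and sites, followed by one row per sequence. As a rough illustration, the sketch below reads the header and the first block of rows and reports the columns in which a sample differs from a reference. It assumes names without whitespace, uses only three rows copied from the listing, and the function names are hypothetical. Only the first 50 columns appear in the listing, so the comparison covers just those.

```python
# Header and three rows copied from the listing above (first 50 columns only).
PHYLIP_HEAD = """ 18 275
X0012          MDLFMSIFTL GSITRQPSKI ENAFLASTVH ATATIPLQAS FSFRWLVVGV
SARS2          MDLFMRIFTI GTVTLKQGEI KDATPSDFVR ATATIPIQAS LPFGWLIVGV
ERR7323296__1  MDLFMRIFTI GTVTLKQGEI KDATPLDFVR ATATIPIQAS LPFGWLIVGV
"""


def read_first_block(text):
    """Parse the header and the first block of rows of a PHYLIP alignment."""
    lines = [line for line in text.splitlines() if line.strip()]
    n_taxa, n_sites = (int(field) for field in lines[0].split())
    block = {}
    # Read at most n_taxa rows; the excerpt here contains fewer than that.
    for line in lines[1:1 + n_taxa]:
        name, *chunks = line.split()
        block[name] = "".join(chunks)
    return n_taxa, n_sites, block


def differences(a, b):
    """Return the 1-based columns in which two aligned strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


n_taxa, n_sites, block = read_first_block(PHYLIP_HEAD)
diff_cols = differences(block["SARS2"], block["ERR7323296__1"])
print(f"{n_taxa} sequences, {n_sites} sites")
print("columns differing from SARS2:", diff_cols)
```

In the columns shown, the sample ERR7323296__1 differs from SARS2 only at column 26 (S in SARS2, L in the sample).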