Output Files Description

jng2 edited this page Sep 17, 2021 · 5 revisions

Whether the workflow is run in the cloud or locally, its output is written to a separate folder for each task:

  • call-grabinput
  • call-BLAST
  • call-findThresh
  • call-generateReport
  • call-MSA

For the call-BLAST and call-grabinput steps, each output is placed in its own folder named shard-<id_number>. Each shard is one instance of the task, because these jobs are run in parallel.

The output directory in the cloud will look like this:

Locally, your output directory will look similar to this:

(base) jeff@jeff-OptiPlex-7060:~/Desktop/run_aces/ACES/ACES_wdl_workflow/cromwell-executions/aces/b1a94f47-37ac-490a-bfc0-fa52ba49c22c$ ls 
call-BLAST  call-findThresh  call-generateReport  call-grabinput  call-MSA 
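To check which tasks ran in parallel, a short Python sketch can walk a run directory and list the shard folders under each task. `summarize_run` is a hypothetical helper, not part of ACES, and it assumes the call-<task>/shard-<id_number> layout described above.

```python
from pathlib import Path


def summarize_run(run_dir):
    """Map each call-* task folder in a run directory to its shard folders.

    Hypothetical helper. Assumes one call-<task> folder per task, with
    shard-<id_number> subfolders for tasks that were run in parallel.
    """
    summary = {}
    for task_dir in sorted(Path(run_dir).glob("call-*")):
        if not task_dir.is_dir():
            continue  # skip stray files that happen to match call-*
        shards = [
            p.name
            for p in task_dir.glob("shard-*")
            if p.is_dir() and p.name[len("shard-"):].isdigit()
        ]
        # Sort numerically so shard-10 comes after shard-2
        shards.sort(key=lambda name: int(name[len("shard-"):]))
        summary[task_dir.name] = shards  # empty list: task was not sharded
    return summary
```

For example, calling `summarize_run` on the workflow-ID folder under cromwell-executions/aces/ would return one entry per call-* folder, with an empty list for tasks such as call-MSA that are not sharded.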

Output description

This section describes each output file created by the workflow.

call-grabinput

  • <samples>.files.txt

This is used as input for the BLAST call. It provides a list of files for Cromwell to localize when running BLAST.

call-BLAST

  • <sample>_blast_results.txt

The unmodified result of the BLAST run, produced with all of the workflow's BLAST options applied.

  • <sample>_parsed.fa

A parsed version of the BLAST output, used as input for the subsequent steps.

call-generateReport

  • Files_Generated_Report.txt

A report of all files created from BLAST and whether their hits were significant. It lists the parsed files that did and did not meet the user's e-value threshold, and whether each sequence met the minimum query length.

This file contains a key to the symbols used in the document, and lists each sequence ID together with the file in which it was found.

Key:

  • N/L : Sequence Length Did Not Meet % Of Query Length Requirement

  • N/H : E-Value Did Not Meet Threshold Requirements

  • N/A : No Hits Were Found In This Genome

  • @- : Sequence Is From Ensembl Database

Below the key is a summary of the results measured against the applied requirements.

Example:

['#']: Threshold Value Used
#: Total # of Files
#: Total # Of Sequences Found
#: Total # Of Rejected Sequences Found
#: Total # Of Accepted Sequences Found

Finally, sequences are sorted into 'Rejected' and 'Accepted' categories, so users can see which sequences, and from which genomic file, met both the e-value threshold and the minimum query length. Only the sequences in the 'Accepted' category are used to generate results.
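The summary totals can be reproduced from per-file results with a few lines of Python. This is a hypothetical sketch: `summarize_report` and its input shape (a dict from genome file to one status per sequence found, with an empty list meaning no hits) are assumptions, and ACES may count some cases differently.

```python
from collections import Counter


def summarize_report(results):
    """Tally report totals from {genome_file: [status, ...]}.

    Hypothetical helper. Each status is "accepted" or a rejection code such
    as "N/H" or "N/L"; an empty list stands for a file with no hits (N/A).
    """
    statuses = Counter(s for seqs in results.values() for s in seqs)
    total = sum(statuses.values())
    accepted = statuses["accepted"]  # Counter returns 0 for missing keys
    return {
        "files": len(results),
        "sequences": total,
        "rejected": total - accepted,
        "accepted": accepted,
    }
```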

call-findThresh

  • Keydoc.txt

This file shows which full sample name each shortened name corresponds to.

  • Parsed_Final.fa

A collection of all the parsed BLAST output that passed the e-value threshold and the query length minimum. The FASTA header of each sequence uses the full name.

  • Parsed_Final.fa_smallname.fa

A collection of all the parsed BLAST output that passed the e-value threshold and the query length minimum, where the FASTA header of each sequence uses a shortened unique name. This file is used in the MUSCLE step, which only accepts the first 10 characters of each header, so those characters must be unique.
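For illustration, one way to build headers that stay unique within 10 characters is to keep a readable prefix of the full name and end it with a zero-padded index; since every short name ends in a different fixed-width index, no two can collide. This is a hypothetical scheme (`make_short_names` is not ACES code), and the names ACES actually generates may look different. The returned dict plays the role of Keydoc.txt, mapping each short name back to its full name.

```python
import re


def make_short_names(full_names, max_len=10):
    """Assign each full name a unique short name of at most max_len characters.

    Hypothetical scheme: alphanumeric/underscore prefix of the original name
    plus a zero-padded index. The fixed-width index suffix guarantees that
    every short name is distinct.
    """
    width = len(str(len(full_names)))
    if width >= max_len:
        raise ValueError("too many sequences for the name length limit")
    prefix_len = max_len - width
    key = {}
    for i, name in enumerate(full_names):
        # Drop spaces and punctuation that alignment/tree formats dislike
        prefix = re.sub(r"[^A-Za-z0-9_]", "", name)[:prefix_len]
        key[f"{prefix}{i:0{width}d}"] = name
    return key
```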

call-MSA

  • Multi_Seq_Align.aln

Multiple sequence alignment of all BLAST output that passed the e-value threshold and the query length minimum.

  • Phy_Align.phy

The Multi_Seq_Align.aln alignment in PHYLIP format.
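For reference, sequential PHYLIP is a plain-text layout: a header line with the number of sequences and the alignment length, then one line per sequence with its name in a 10-character field followed by the aligned sequence. The hypothetical `to_phylip` helper below renders an alignment already loaded into a dict; it does not parse the .aln file itself, whose exact format is not stated here, and ACES may produce its .phy file with a different tool.

```python
def to_phylip(alignment, name_width=10):
    """Render {name: aligned_sequence} as sequential PHYLIP text.

    Hypothetical helper. Names are padded to a 10-character field, as in
    strict PHYLIP, then followed by a space so relaxed PHYLIP readers can
    also split the name from the sequence.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    lines = [f"{len(alignment)} {lengths.pop()}"]
    for name, seq in alignment.items():
        if len(name) > name_width:
            raise ValueError(f"name longer than {name_width} characters: {name}")
        lines.append(f"{name.ljust(name_width)} {seq}")
    return "\n".join(lines) + "\n"
```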

  • MSA2GFA.gfa

Multiple sequence alignment in GFA format.

RAxML output

RAxML produces a best-scoring tree file named RAxML_bestTree.RAXML_output.

Other output files include:

  • RAxML_bipartitions.RAXML_output
  • RAxML_bipartitionsBranchLabels.RAXML_output
  • RAxML_bootstrap.RAXML_output
  • RAxML_info.RAXML_output
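Because the alignment is built from the shortened headers in Parsed_Final.fa_smallname.fa, the tip labels in these Newick tree files are presumably the short names as well. To read a tree with full sample names, the short names can be swapped back using the Keydoc.txt mapping. This sketch assumes that mapping has already been loaded into a Python dict (the Keydoc.txt layout is not described here); `rename_tips` is a hypothetical helper, and it only rewrites plain, unquoted tip labels.

```python
import re


def rename_tips(newick, short_to_full):
    """Replace tip labels in a Newick string using a short->full name mapping.

    Hypothetical helper. Tip labels are the tokens directly after '(' or ',';
    labels missing from the mapping are left unchanged. Internal-node labels
    such as support values (written after ')') and branch lengths (after ':')
    are not touched.
    """
    def swap(match):
        label = match.group(0)
        full = short_to_full.get(label, label)
        # Quote names that contain whitespace or Newick-special characters
        if re.search(r"[\s(),:;\[\]']", full):
            full = "'" + full.replace("'", "''") + "'"
        return full

    return re.sub(r"(?<=[(,])[^(),:;\s\[\]']+", swap, newick)
```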