Output Files Description

Output files, whether the workflow is run in the cloud or locally, are written to a separate folder for each task:

call-grabinput
call-BLAST
call-findThresh
call-generateReport
call-MSA

For the call-BLAST and call-grabinput steps, the outputs are placed in individual folders named shard-<id_number>. Each shard corresponds to one of the jobs that were run in parallel.
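To collect results across shards, the shard folders of a call can be enumerated in numeric order. The following is a minimal sketch, not part of the workflow itself: the function name `list_shard_dirs` is hypothetical, and it assumes only the shard-<id_number> naming described above.

```python
from pathlib import Path


def list_shard_dirs(call_dir):
    """Return the shard-<id_number> folders inside a call folder, ordered by id.

    Hypothetical helper: it relies only on the folder naming convention,
    not on anything Cromwell-specific.
    """
    shards = []
    for entry in Path(call_dir).iterdir():
        prefix, _, shard_id = entry.name.partition("-")
        # Keep directories only, and only names of the form shard-<digits>.
        if entry.is_dir() and prefix == "shard" and shard_id.isdecimal():
            shards.append((int(shard_id), entry))
    # Sort numerically so that shard-10 comes after shard-2.
    return [path for _, path in sorted(shards, key=lambda item: item[0])]
```

Calling `list_shard_dirs` on the call-BLAST folder of a run would return one path per parallel BLAST job.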

The output directory in the cloud will look like this:

Locally, your output directory will look similar to this:

(base) jeff@jeff-OptiPlex-7060:~/Desktop/run_aces/ACES/ACES_wdl_workflow/cromwell-executions/aces/b1a94f47-37ac-490a-bfc0-fa52ba49c22c$ ls 
call-BLAST  call-findThresh  call-generateReport  call-grabinput  call-MSA

Output description

Here we will detail all of the output created by the workflow.

call-grabinput

<samples>.files.txt

This is used as input for the BLAST call. It provides a list of files for Cromwell to localize when running BLAST.

call-BLAST

<sample>_blast_results.txt

This is the raw BLAST result, exactly as produced by the BLAST run with all options set.

<sample>_parsed.fa

This is a parsed version of the BLAST output, used as input for the following steps.

call-generateReport

Files_Generated_Report.txt

A report of all files created from BLAST and whether their hits were significant. It lists the parsed files that have and have not met the user's e-value threshold requirement, and whether each sequence has met the query length minimum.

This file contains a key for the symbols used inside the document, followed by the sequence IDs and the file in which each was found.

Key:

N/L : Sequence Length Did Not Meet % Of Query Length Requirement
N/H : E-Value Did Not Meet Threshold Requirements
N/A : No Hits Were Found In This Genome
@- : Sequence Is From Ensembl Database

Below the key will be a summary of the results against applied requirements.

Example:

['#']: Threshold Value Used
#: Total # of Files
#: Total # Of Sequences Found
#: Total # Of Rejected Sequences Found
#: Total # Of Accepted Sequences Found

Finally, sequences are organized into 'Rejected' and 'Accepted' categories. Users can see which sequences, from which genomic file, have met both the e-value threshold requirement and the query length minimum requirement. The sequences in the 'Accepted' category are those used to generate results.
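The accept/reject logic behind the report can be illustrated with a short sketch. This is not the workflow's actual code: the function name, its parameters, and the order of the checks are hypothetical, and it assumes that a hit passes when its e-value is at or below the threshold and its length is at least a given fraction of the query length. The returned symbols are the ones from the key.

```python
def classify_hit(evalue, hit_length, query_length,
                 evalue_threshold, min_length_fraction):
    """Return "Accepted" or the report key symbol explaining a rejection.

    Assumptions (not confirmed by the workflow source): lower e-values are
    better, the e-value is checked before the length, and evalue=None means
    that BLAST returned no hit for this genome.
    """
    if evalue is None:
        return "N/A"  # No hits were found in this genome
    if evalue > evalue_threshold:
        return "N/H"  # E-value did not meet threshold requirements
    if hit_length < min_length_fraction * query_length:
        return "N/L"  # Length did not meet % of query length requirement
    return "Accepted"
```

For example, with a threshold of `1e-10` and a minimum of 90% of the query length, a 950-base hit against a 1000-base query with an e-value of `1e-30` would fall in the 'Accepted' category.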

call-findThresh

Keydoc.txt

This file shows which full sample name each short-form name corresponds to.

Parsed_Final.fa

This is a collection of all the parsed BLAST output that passed the e-value threshold and the query length minimum. In the FASTA headers for each sequence, the full name is used.

Parsed_Final.fa_smallname.fa

This is a collection of all the parsed BLAST output that passed the e-value threshold and the query length minimum. In the FASTA headers for each sequence, a shortened unique name is used. This file is used in the MUSCLE step, which only accepts the first 10 characters of each header and requires them to be unique.
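One way to satisfy the 10-character uniqueness constraint is to replace each full name with a fixed-width counter and keep the mapping, which plays the same role as Keydoc.txt. The sketch below is only an illustration: the naming scheme (`S` followed by a zero-padded index) and the function name `shorten_headers` are assumptions, and the short names the workflow actually generates may look different.

```python
def shorten_headers(full_names, width=10):
    """Map each full name to a unique short name of exactly `width` characters.

    Hypothetical scheme: "S" plus a zero-padded running index. Uniqueness
    comes from the index, so it does not depend on the full names differing
    within their first `width` characters.
    """
    short_names = {}
    for index, full_name in enumerate(full_names):
        short_names[full_name] = f"S{index:0{width - 1}d}"
    return short_names
```

The returned dictionary can be written out as a lookup table and used afterwards to translate the short names in the alignment back to the full sample names.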

call-MSA

Multi_Seq_Align.aln

Multiple sequence alignment of all BLAST output that passed the e-value threshold and the query length minimum.

Phy_Align.phy

This is the Multi_Seq_Align.aln file in PHYLIP format.

MSA2GFA.gfa

Multiple sequence alignment in GFA format.

RAxML output

RAxML produces a best tree file, called RAxML_bestTree.RAXML_output.

Other outputs include:

RAxML_bipartitions.RAXML_output
RAxML_bipartitionsBranchLabels.RAXML_output
RAxML_bootstrap.RAXML_output
RAxML_info.RAXML_output
