Output Files Description
Whether the workflow is run in the cloud or locally, output files are written to a separate folder for each task:
- call-grabinput
- call-BLAST
- call-findThresh
- call-generateReport
- call-MSA
For the call-BLAST and call-grabinput steps, each output is placed in its own folder named shard-<id_number>. Each shard corresponds to one instance of the job, because these steps are run in parallel.
The output directory in the cloud will look like this:
[Figure: example cloud output directory]
Locally, your output directory will look similar to this:
(base) jeff@jeff-OptiPlex-7060:~/Desktop/run_aces/ACES/ACES_wdl_workflow/cromwell-executions/aces/b1a94f47-37ac-490a-bfc0-fa52ba49c22c$ ls
call-BLAST call-findThresh call-generateReport call-grabinput call-MSA
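To check which tasks were sharded, the paths under a run directory can be grouped by task folder and shard-<id_number> subfolder. The Python sketch below is illustrative only: the function name `shards_by_task` is hypothetical, and it assumes nothing beyond the call-<task>/shard-<id_number>/... layout described above.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def shards_by_task(relative_paths):
    """Group output paths (relative to the run directory) by task and shard id.

    Paths are expected to look like 'call-BLAST/shard-0/...'. Tasks that were
    not run in parallel (no shard-<id_number> folder) get an empty shard list.
    """
    tasks = defaultdict(set)
    for rel in relative_paths:
        parts = PurePosixPath(rel).parts
        if not parts or not parts[0].startswith("call-"):
            continue  # not inside a task folder
        ids = tasks[parts[0]]  # registers the task even if it has no shards
        if len(parts) > 1 and parts[1].startswith("shard-"):
            ids.add(int(parts[1][len("shard-"):]))
    return {task: sorted(ids) for task, ids in tasks.items()}
```

For a local run, something like `shards_by_task(p.relative_to(run_dir).as_posix() for p in Path(run_dir).rglob("*"))` would list the shard ids under call-BLAST and call-grabinput, with an empty list for the unsharded tasks.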
The files created by the workflow are described below.
<samples>.files.txt
This is used as input for the BLAST call. It provides a list of files for Cromwell to localize when running BLAST.
<sample>_blast_results.txt
The raw output of the BLAST run, produced with all configured options applied.
<sample>_parsed.fa
A parsed version of the BLAST output, used as input for the subsequent steps.
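The parsed files use the .fa extension, so they can be inspected with any FASTA reader. As a minimal sketch (the helper `read_fasta` is hypothetical and assumes standard FASTA with `>` header lines), headers and sequences can be iterated like this:

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Sequence lines following a header are concatenated; blank lines are skipped.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage: `with open(path) as fh: records = list(read_fasta(fh))`, where `path` points at one of the <sample>_parsed.fa files.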
Files_Generated_Report.txt
A report of all files created from BLAST and whether their hits were significant. Files_Generated_Report.txt lists the parsed files that did and did not meet the user's e-value threshold, and whether each sequence met the query length minimum.
The file contains a key to the symbols used in the document, and lists each sequence ID together with the file in which it was found.
Key:
- N/L : Sequence Length Did Not Meet % Of Query Length Requirement
- N/H : E-Value Did Not Meet Threshold Requirements
- N/A : No Hits Were Found In This Genome
- @- : Sequence Is From Ensembl Database
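The acceptance rules behind the N/H and N/L codes can be sketched as a small function. This is an illustration under assumptions, not the workflow's code: the function name, the parameter names, the inclusive comparisons, and checking the e-value before the length are all hypothetical. It assumes that a lower e-value is better, as in BLAST.

```python
def classify_hit(evalue, hit_length, query_length, evalue_threshold, min_percent):
    """Return 'N/H' or 'N/L' for a rejected hit, or None for an accepted one.

    Hypothetical rules: the e-value must not exceed the threshold, and the hit
    must be at least min_percent % of the query length. The e-value is checked
    first, so a hit failing both checks is reported as 'N/H' here.
    """
    if evalue > evalue_threshold:
        return "N/H"  # E-Value Did Not Meet Threshold Requirements
    if hit_length * 100 < min_percent * query_length:
        return "N/L"  # Sequence Length Did Not Meet % Of Query Length Requirement
    return None  # accepted
```

N/A and @- describe a genome with no hits and a sequence's source database, respectively, rather than a threshold check, so this sketch does not produce them.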
Below the key is a summary of the results against the applied requirements.
Example:
['#']: Threshold Value Used
#: Total # of Files
#: Total # Of Sequences Found
#: Total # Of Rejected Sequences Found
#: Total # Of Accepted Sequences Found
Finally, sequences are organized into 'Rejected' and 'Accepted' categories. Users can see which sequences, and from which genomic file, met both the e-value threshold and the query length minimum. The sequences in the 'Accepted' category are the ones used to generate results.
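The summary counts are related simply: every sequence found is either rejected or accepted. A hypothetical sketch that builds the summary lines from per-sequence codes (None meaning accepted) is shown below; the function name and rendering the threshold as a one-item list are assumptions drawn only from the example layout above.

```python
def summary_lines(threshold, n_files, codes):
    """Build the report summary from one code per sequence found.

    `codes` holds a rejection code such as 'N/H' or 'N/L', or None for an
    accepted sequence; rejected + accepted always equals the total found.
    """
    rejected = sum(1 for code in codes if code is not None)
    accepted = len(codes) - rejected
    return [
        f"{[threshold]}: Threshold Value Used",
        f"{n_files}: Total # of Files",
        f"{len(codes)}: Total # Of Sequences Found",
        f"{rejected}: Total # Of Rejected Sequences Found",
        f"{accepted}: Total # Of Accepted Sequences Found",
    ]
```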
Keydoc.txt
This file maps each short-form name to the full sample name it corresponds to.
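If the short-to-full name mapping is needed programmatically, Keydoc.txt can be read into a dictionary. The exact layout of the file is not specified here, so this sketch assumes, hypothetically, one pair per line with the short name first and a tab separator; `read_keydoc` is an illustrative name, and `sep` should be adjusted to match the real file.

```python
def read_keydoc(lines, sep="\t"):
    """Parse {short name: full name} pairs from Keydoc-style lines.

    Assumed (hypothetical) layout: one pair per line, short name first,
    separated by `sep`. Blank lines are ignored.
    """
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        short, full = line.split(sep, 1)
        mapping[short.strip()] = full.strip()
    return mapping
```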
Parsed_Final.fa
This is a collection of all the parsed BLAST output that passed the e-value and query length thresholds. The FASTA header of each sequence uses the full name.
Parsed_Final.fa_smallname.fa
This is a collection of all the parsed BLAST output that passed the e-value and query length thresholds, except that the FASTA header of each sequence uses a shortened, unique name. This file is used in the MUSCLE step, which only uses the first 10 characters of each header; those characters must therefore be unique.
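Simply truncating the full names to 10 characters could produce duplicates, for example when several hits come from the same genome file. The sketch below shows one hypothetical way to generate unique short names (a fixed prefix plus a zero-padded counter) together with a key that maps them back; the workflow's own naming scheme may differ, and `make_short_names` is an illustrative name.

```python
def make_short_names(headers, width=10, prefix="S"):
    """Give each header a unique name of exactly `width` characters.

    Hypothetical scheme: prefix + zero-padded counter, e.g. 'S000000001'.
    Returns (short_names, key), where key maps short name -> original header.
    """
    digits = width - len(prefix)
    if digits < 1 or len(headers) >= 10 ** digits:
        raise ValueError("width too small for this many sequences")
    short_names = [f"{prefix}{i:0{digits}d}" for i in range(1, len(headers) + 1)]
    return short_names, dict(zip(short_names, headers))
```

Because the counter never repeats, the names stay unique no matter how similar the original headers are, and the returned key plays the same role as Keydoc.txt.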
Multi_Seq_Align.aln
Multiple sequence alignment of all BLAST output that passed the e-value threshold and the query length minimum.
Phy_Align.phy
The Multi_Seq_Align.aln alignment in PHYLIP format.
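A sequential PHYLIP file starts with the number of sequences and the alignment length, followed by one name and sequence per row. A minimal converter from aligned (name, sequence) pairs is sketched below. It assumes the alignment has already been read (for example from aligned FASTA); the function name `to_phylip` is hypothetical. Names are padded to 10 characters and then followed by a space, so that both strict and relaxed PHYLIP readers can separate the name from the sequence.

```python
def to_phylip(records, name_width=10):
    """Render aligned (name, sequence) pairs as sequential PHYLIP text."""
    records = list(records)
    if not records:
        raise ValueError("alignment is empty")
    length = len(records[0][1])
    if any(len(seq) != length for _, seq in records):
        raise ValueError("sequences differ in length; is the input aligned?")
    names = [name for name, _ in records]
    # Refuse rather than truncate: truncated names could collide.
    if any(len(name) > name_width for name in names) or len(set(names)) != len(names):
        raise ValueError(f"names must be unique and at most {name_width} characters")
    lines = [f"{len(records)} {length}"]
    for name, seq in records:
        # Pad to name_width, then one space as an explicit separator.
        lines.append(f"{name:<{name_width}} {seq}")
    return "\n".join(lines) + "\n"
```

Rejecting long names instead of cutting them is where the short names from Parsed_Final.fa_smallname.fa matter.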
MSA2GFA.gfa
Multiple sequence alignment in GFA format.
RAxML produces a best tree file called RAxML_bestTree.RAXML_output.
Other outputs include:
- RAxML_bipartitions.RAXML_output
- RAxML_bipartitionsBranchLabels.RAXML_output
- RAxML_bootstrap.RAXML_output
- RAxML_info.RAXML_output
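RAxML tree files are in Newick format, and their leaf labels are the sequence names from the alignment, which here are presumably the short names. Assuming that, the labels in RAxML_bestTree.RAXML_output could be mapped back to full names using a dictionary like the one in Keydoc.txt. This regex-based sketch is hypothetical and only handles simple, unquoted leaf labels in the input; internal labels such as support values, which follow a closing parenthesis, are left untouched.

```python
import re

# A leaf label directly follows '(' or ',' and runs until Newick punctuation.
_LEAF = re.compile(r"(?<=[(,])([^(),:;\[\]'\s]+)")


def rename_leaves(newick, key):
    """Replace leaf labels in a Newick string using a {short: full} key.

    Labels missing from the key are kept. Full names containing spaces or
    Newick punctuation are single-quoted, with embedded quotes doubled.
    """
    def replace(match):
        label = match.group(1)
        full = key.get(label)
        if full is None:
            return label
        if re.search(r"[\s(),:;\[\]']", full):
            return "'" + full.replace("'", "''") + "'"
        return full

    return _LEAF.sub(replace, newick)
```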