phac-nml/gasnomenclature: Output

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

append: Contains reference MLST profile and cluster address files if additional databases were provided by the user.
call: The cluster addresses from the genomic_address_service.
cluster: The cluster file required by GAS_call.
distances: Distances between genomes from profile_dists.
filter: The cluster addresses from only the query samples.
input: An error report that is only generated when sample IDs and MLST JSON files do not match.
locidex: The merged MLST JSON files for reference and query samples.
pipeline_info: Information about the pipeline's execution.

The IRIDA Next-compliant JSON output file will be named iridanext.output.json.gz and will be written to the top-level of the results directory. This file is compressed using GZIP and conforms to the IRIDA Next JSON output specifications.
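Because the file is GZIP-compressed, it has to be decompressed before it can be inspected. The following is a minimal sketch of reading it with the Python standard library. The function name `load_iridanext_output` is hypothetical, and the sketch makes no assumption about the keys inside the JSON document beyond it being valid JSON.

```python
import gzip
import json
from pathlib import Path


def load_iridanext_output(results_dir):
    """Read and decode the gzip-compressed IRIDA Next JSON output.

    `results_dir` is the top-level results directory of a pipeline run.
    """
    path = Path(results_dir) / "iridanext.output.json.gz"
    # "rt" opens the gzip stream in text mode so json.load can read it directly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `load_iridanext_output("results")` would return the parsed JSON content as ordinary Python objects, assuming `results` is the directory passed to the pipeline as its output location.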

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

Input assure - Performs a validation check on the samplesheet inputs to ensure that the sampleID precisely matches the MLST JSON key and enforces necessary changes where discrepancies are found.
Locidex merge - Merges MLST profile JSON files into a single profiles file for reference and query samples.
Append profiles - Appends additional MLST profile information to reference samples if provided by user.
Profile dists - Computes pairwise distances between genomes using MLST allele differences.
Cluster file - Generates the expected_clusters.txt file from reference sample addresses for use in GAS_call.
Append clusters - Appends additional cluster information to reference samples if provided by user.
GAS call - Generates hierarchical cluster addresses.
Filter query - Filters and generates a csv file containing only the cluster addresses for query samples.
IRIDA Next Output - Generates a JSON output file that is compliant with IRIDA Next.
Pipeline information - Reports metrics generated during the workflow execution.

Input Assure

Output files

input/
- sampleID_error_report.csv
- sampleID.mlst.json.gz
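The check performed by this step can be illustrated with a short sketch. It assumes that an MLST JSON file is a JSON object whose top-level key is the sample ID, which is how the overview describes the comparison; the function name `mlst_key_matches` is hypothetical and this is not the pipeline's own implementation.

```python
import gzip
import json


def mlst_key_matches(sample_id, mlst_json_path):
    """Return (matches, keys) for one sample.

    `matches` is True only when the file's single top-level key equals
    `sample_id` exactly. Assumes the file holds a JSON object keyed by
    sample ID (an assumption made for illustration).
    """
    # Support both plain and gzip-compressed MLST JSON files.
    opener = gzip.open if str(mlst_json_path).endswith(".gz") else open
    with opener(mlst_json_path, "rt", encoding="utf-8") as handle:
        profile = json.load(handle)
    keys = list(profile.keys())
    return keys == [sample_id], keys
```

A mismatch reported by a check like this corresponds to the situation in which the error report listed above is generated.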

Locidex merge

Output files

locidex/merge/
- reference samples: reference/merged_ref/merged_profiles_ref.tsv
- query samples: query/merged_value/merged_profile_value.tsv

Append Profiles

Output files

append/
- profiles: profiles_ref.tsv

Profile Dists

Output files

distances/
- Mapping allele identifiers to integers: allele_map.json
- The query MLST profiles: query_profile.text
- The reference MLST profiles: ref_profile.text
- The computed distances based on MLST allele differences: results.text
- Information on the profile_dists run: run.json
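To make "MLST allele differences" concrete, the sketch below counts the loci at which two profiles carry different alleles. It is a simplified illustration, not the profile_dists implementation: the function name, the dictionary representation of a profile, and the use of `"-"` as a missing-allele marker are all assumptions, and profile_dists may treat missing data and scaling differently.

```python
def allele_differences(profile_a, profile_b, missing="-"):
    """Count loci where two MLST profiles have different alleles.

    Each profile is a dict mapping locus name to allele identifier.
    Loci absent from either profile, or marked with `missing` in
    either profile, are skipped rather than counted as differences.
    """
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] != missing
        and profile_b[locus] != missing
        and profile_a[locus] != profile_b[locus]
    )
```

For example, a query profile `{"l1": "1", "l2": "2", "l3": "-"}` and a reference profile `{"l1": "1", "l2": "3", "l3": "4"}` differ only at `l2` under these assumptions, because `l3` is missing in the query.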

Cluster File

Output files

cluster/
- expected_clusters.txt

Append Clusters

Output files

append/
- clusters: reference_clusters.tsv

GAS call

Output files

call/
- The computed cluster addresses: clusters.text
- Information on the GAS call run: run.json
- Thresholds used to compute cluster addresses: thresholds.json

Filter Query

Output files

filter/
- new_addresses.tsv
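The filtering idea can be sketched as follows: keep only the rows of a tab-separated cluster table whose sample ID belongs to the query set. The column name `id` and the function name `filter_query_rows` are hypothetical; the actual column headers in the GAS call output may differ.

```python
import csv
import io


def filter_query_rows(clusters_tsv_text, query_ids, id_column="id"):
    """Return the rows of a TSV cluster table that belong to query samples.

    `clusters_tsv_text` is the table content as a string with a header row.
    `id_column` names the sample ID column (assumed here to be "id").
    """
    reader = csv.DictReader(io.StringIO(clusters_tsv_text), delimiter="\t")
    wanted = set(query_ids)
    return [row for row in reader if row[id_column] in wanted]
```

Rows for reference samples are dropped, so the result contains cluster addresses for the query samples only.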

Pipeline information

Output files

pipeline_info/
- Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
- Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files will only be present if the --email / --email_on_fail parameters are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
- Parameters used by the pipeline run: params.json.

Nextflow generates several reports on the execution of the pipeline. These reports help you troubleshoot errors and provide other information about the run, such as launch commands, run times and resource usage.
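As a starting point for troubleshooting, the trace file can be summarised with a few lines of Python. This sketch assumes that execution_trace.txt is tab-separated with a header row that includes a `status` column, as in Nextflow's default trace output; the function name `count_task_statuses` is hypothetical.

```python
import csv
from collections import Counter


def count_task_statuses(trace_path):
    """Count tasks per status value in a Nextflow trace file.

    Assumes a tab-separated file with a header row containing a
    "status" column.
    """
    with open(trace_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row["status"] for row in reader)
```

Any status other than a successful completion in the resulting counts points to tasks worth inspecting in the HTML execution report.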
