phac-nml/fastmatchirida: Output

Introduction

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.

  • append: Metadata passed to the pipeline, appended to the sample-sample distance pairings.
  • distances: Distances between genomes from profile_dists.
  • input: MLST JSON files processed to ensure that the sample ID provided in the sample sheet matches the IDs provided in the MLST JSON file.
  • merged: Query and reference MLST JSON files, each merged into a single MLST profiles file.
  • pipeline_info: Information about the pipeline's execution.
  • process: Processed sample-sample distance pairings.

The IRIDA Next-compliant JSON output file is named iridanext.output.json.gz and is written to the top level of the results directory. This file is GZIP-compressed and conforms to the IRIDA Next JSON output specifications.

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

  • Input Assure - Assures that the sample IDs provided in the sample sheet match the IDs provided in the MLST JSON files associated with each sample.
  • Locidex Merge Query - Merges query MLST profile JSON files into a single profiles file.
  • Locidex Merge References - Merges reference MLST profile JSON files into a single profiles file.
  • Profile Dists - Computes pairwise distances between genomes using MLST allele differences.
  • Append Metadata - Appends the input metadata to the pairwise distances.
  • Process Output - Processes sample-sample distance pairings by distance threshold.
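To illustrate what computing distances "using MLST allele differences" means, the sketch below counts the loci at which two profiles carry different allele calls, skipping loci that are missing in either profile. This is a hypothetical Hamming-style illustration, not the profile_dists implementation: the function name allele_difference and the set of values treated as missing are assumptions, and the real tool may handle missing data and distance scaling differently.

```python
def allele_difference(profile_a, profile_b, missing=("", "0", None)):
    """Count loci where both profiles have an allele call and the calls differ.

    Hypothetical illustration: loci absent from either profile, or whose value
    is listed in `missing`, are skipped rather than counted as differences.
    """
    diffs = 0
    for locus in set(profile_a) & set(profile_b):
        a, b = profile_a[locus], profile_b[locus]
        if a in missing or b in missing:
            continue  # no usable call at this locus in one of the profiles
        if a != b:
            diffs += 1
    return diffs


# Hypothetical truncated allele hashes for two samples.
query = {"l1": "60b725f1", "l2": "60b725f1", "l3": "3b5d5c37"}
reference = {"l1": "60b725f1", "l2": "60b725f1", "l3": "60b725f1"}
print(allele_difference(query, reference))  # 1
```

Only locus l3 differs here, so the distance is 1.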

Input Assure

Output files
  • input/
    • ID-corrected MLST JSON files: sample1.mlst.json.gz
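The ID check can be pictured with the following sketch. It assumes, hypothetically, that each gzipped MLST JSON file holds a single top-level key mapping a sample ID to that sample's locus-to-allele profile, and it rewrites that key to the ID from the sample sheet. The function name assure_sample_id is illustrative only.

```python
import gzip
import json


def assure_sample_id(mlst_path, sample_id, out_path):
    """Rewrite a gzipped MLST JSON file so its top-level key equals `sample_id`.

    Assumes (hypothetically) the file holds exactly one top-level key that maps
    a sample ID to its {locus: allele} profile. Returns the ID found in the file.
    """
    with gzip.open(mlst_path, "rt", encoding="utf-8") as handle:
        data = json.load(handle)
    if len(data) != 1:
        raise ValueError(f"expected one sample in {mlst_path}, found {len(data)}")
    found_id, profile = next(iter(data.items()))
    # Re-key the profile under the sample sheet ID, whether or not it already matched.
    with gzip.open(out_path, "wt", encoding="utf-8") as handle:
        json.dump({sample_id: profile}, handle)
    return found_id
```

Comparing the returned ID with the sample sheet ID tells you whether a correction was needed.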

Locidex Merge

Output files
  • merged/
    • Merged MLST query profiles: locidex.merge.profile_query.tsv
    • Merged MLST query and reference profiles: locidex.merge.profile_reference.tsv
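A minimal sketch of merging per-sample profiles into one tab-separated profiles table follows. The layout (a sample_id column followed by one sorted column per locus, with empty cells for missing loci) is an assumption for illustration and may not match the exact columns locidex writes.

```python
import csv
import io


def merge_profiles(profiles):
    """Merge a list of {sample_id: {locus: allele}} dicts into one TSV string.

    Hypothetical layout: first column `sample_id`, then one column per locus
    (sorted). Loci missing from a sample are written as empty cells; a later
    profile with the same sample ID overwrites an earlier one.
    """
    merged = {}
    for profile in profiles:
        merged.update(profile)
    loci = sorted({locus for alleles in merged.values() for locus in alleles})
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["sample_id", *loci])
    for sample_id, alleles in merged.items():
        writer.writerow([sample_id, *(alleles.get(locus, "") for locus in loci)])
    return out.getvalue()


print(merge_profiles([{"sample1": {"l1": "1", "l2": "2"}}, {"sample2": {"l1": "1"}}]), end="")
```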

Profile Dists

Output files
  • distances/
    • Mapping of allele identifiers to integers, per locus: allele_map.json. For example:
      {
        "l1": {
          "60b725f10c9c85c70d97880dfe8191b3": 1
        },
        "l2": {
          "60b725f10c9c85c70d97880dfe8191b3": 1
        },
        "l3": {
          "3b5d5c3712955042212316173ccf37be": 1,
          "60b725f10c9c85c70d97880dfe8191b3": 2
        }
      }
    • The query MLST profiles: query_profile.text
    • The reference MLST profiles: ref_profile.text
    • The computed distances based on MLST allele differences: results.text
    • Information on the profile_dists run: run.json
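The allele map above can be reproduced with the sketch below, which gives each distinct allele identifier within a locus an integer code starting at 1. Assigning codes in sorted order of the identifiers is an assumption chosen because it reproduces the example; profile_dists may assign codes differently. The sample names sampleA and sampleB are hypothetical.

```python
def build_allele_map(profiles):
    """Map each locus's distinct allele identifiers to integers starting at 1.

    Assumption: codes follow the sorted order of identifiers within a locus.
    """
    alleles_by_locus = {}
    for alleles in profiles.values():
        for locus, allele in alleles.items():
            alleles_by_locus.setdefault(locus, set()).add(allele)
    return {
        locus: {allele: code for code, allele in enumerate(sorted(seen), start=1)}
        for locus, seen in sorted(alleles_by_locus.items())
    }


def encode_profile(alleles, allele_map):
    """Replace each allele identifier in one profile with its integer code."""
    return {locus: allele_map[locus][allele] for locus, allele in alleles.items()}


profiles = {
    "sampleA": {"l1": "60b725f10c9c85c70d97880dfe8191b3",
                "l2": "60b725f10c9c85c70d97880dfe8191b3",
                "l3": "60b725f10c9c85c70d97880dfe8191b3"},
    "sampleB": {"l1": "60b725f10c9c85c70d97880dfe8191b3",
                "l2": "60b725f10c9c85c70d97880dfe8191b3",
                "l3": "3b5d5c3712955042212316173ccf37be"},
}
allele_map = build_allele_map(profiles)
print(encode_profile(profiles["sampleB"], allele_map))  # {'l1': 1, 'l2': 1, 'l3': 1}
```

Integer codes make allele comparisons cheap and keep the profile tables compact compared with the full hash strings.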

Append Metadata

Output files
  • append/
    • The input metadata columns appended to the pairwise distances: distances_and_metadata.tsv
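Conceptually, appending metadata is a join between the distance rows and a per-sample metadata table. In this sketch the column names query_id, ref_id and dist, the default of joining on the reference sample, and the function name append_metadata are all hypothetical.

```python
import csv
import io


def append_metadata(distances_tsv, metadata, key_column="ref_id"):
    """Append metadata columns to each row of a tab-separated distance table.

    Hypothetical: `metadata` is {sample_id: {column: value}} and rows are joined
    on `key_column`; samples with no metadata get empty cells.
    """
    reader = csv.DictReader(io.StringIO(distances_tsv), delimiter="\t")
    meta_columns = sorted({col for values in metadata.values() for col in values})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*reader.fieldnames, *meta_columns],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        extra = metadata.get(row[key_column], {})
        writer.writerow({**row, **{col: extra.get(col, "") for col in meta_columns}})
    return out.getvalue()


distances = "query_id\tref_id\tdist\nsample1\tsample2\t1\nsample1\tsample3\t4\n"
print(append_metadata(distances, {"sample2": {"country": "CA"}}), end="")
```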

Process Output

Output files
  • process/
    • Pairwise distance results that meet the specified distance threshold, in TSV format: results.tsv
    • Pairwise distance results that meet the specified distance threshold, in XLSX format: results.xlsx
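Filtering by distance threshold can be sketched as below. The dist column name and the inclusive comparison (distance <= threshold) are assumptions, not taken from the pipeline's code. Writing the XLSX copy would additionally need a spreadsheet library and is not shown.

```python
import csv
import io


def filter_by_threshold(distances_tsv, threshold, dist_column="dist"):
    """Keep only the rows whose distance is at or below `threshold`.

    Hypothetical: the column name and the inclusive (<=) comparison are
    illustrative choices.
    """
    reader = csv.DictReader(io.StringIO(distances_tsv), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if float(row[dist_column]) <= threshold:
            writer.writerow(row)
    return out.getvalue()


pairs = "query_id\tref_id\tdist\nsample1\tsample2\t1\nsample1\tsample3\t5\n"
print(filter_by_threshold(pairs, 2), end="")
```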

IRIDA Next Output

Output files
  • /
    • IRIDA Next-compliant JSON output: iridanext.output.json.gz
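Because the file is GZIP-compressed JSON, it can be read with Python's standard library alone. This helper (its name is illustrative) makes no assumption about the file's internal layout beyond it being valid JSON.

```python
import gzip
import json


def load_iridanext_output(path):
    """Decompress and parse the gzip-compressed IRIDA Next JSON output file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```

For example, load_iridanext_output("results/iridanext.output.json.gz") returns the parsed JSON as Python dicts and lists, where results stands for your results directory.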

Pipeline Information

Output files
  • pipeline_info/
    • Reports generated by Nextflow: execution_report.html, execution_timeline.html, execution_trace.txt and pipeline_dag.dot/pipeline_dag.svg.
    • Reports generated by the pipeline: pipeline_report.html, pipeline_report.txt and software_versions.yml. The pipeline_report* files are only present if the --email / --email_on_fail parameters are used when running the pipeline.
    • Reformatted samplesheet files used as input to the pipeline: samplesheet.valid.csv.
    • Parameters used by the pipeline run: params.json.

Nextflow generates several reports about how the pipeline ran. These help you troubleshoot errors and also record information such as launch commands, run times and resource usage.