Adds the Lock Transformation #1

Open
wants to merge 13 commits into
base: dev
93 changes: 52 additions & 41 deletions README.md
@@ -1,31 +1,41 @@
[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A523.04.3-brightgreen.svg)](https://www.nextflow.io/)

# Example Pipeline for IRIDA Next
# Metadata Transformation Pipeline for IRIDA Next

This is an example pipeline to be used for integration with IRIDA Next.
This pipeline transforms metadata from IRIDA Next.

# Input

The input to the pipeline is a standard sample sheet (passed as `--input samplesheet.csv`) that looks like:
The input to the pipeline is a sample sheet (passed as `--input samplesheet.csv`) that looks like:

| sample | fastq_1 | fastq_2 |
| ------- | --------------- | --------------- |
| SampleA | file_1.fastq.gz | file_2.fastq.gz |
| sample | sample_name | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
| ------- | ----------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
| Sample1 | SampleA | meta_1 | meta_2 | meta_3 | meta_4 | meta_5 | meta_6 | meta_7 | meta_8 |

The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Validation of the sample sheet is performed by [nf-validation](https://nextflow-io.github.io/nf-validation/).

# Parameters

The main parameters are `--input` as defined above and `--output` for specifying the output results directory. You may wish to provide `-profile singularity` to specify the use of singularity containers and `-r [branch]` to specify which GitHub branch you would like to run.

## Transformation

You may specify the metadata transformation with the `--transformation` parameter. For example, `--transformation lock` will perform the lock transformation. The available transformations are as follows:

| Transformation | Explanation |
| -------------- | --------------------------------- |
| lock | Locks the metadata in IRIDA Next. |

## Other Parameters

Other parameters (defaults from nf-core) are defined in [nextflow_schema.json](nextflow_schema.json).

# Running

To run the pipeline, please do:

```bash
nextflow run phac-nml/metadatatransformation -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results
nextflow run phac-nml/metadatatransformation -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results --transformation lock
```

Where the `samplesheet.csv` is structured as specified in the [Input](#input) section.
@@ -41,64 +51,65 @@ An example of the what the contents of the IRIDA Next JSON file looks like for t
{
"files": {
"global": [
{
"path": "summary/summary.txt.gz"
}

],
"samples": {
"SAMPLE1": [
{
"path": "assembly/SAMPLE1.assembly.fa.gz"
}
],
"SAMPLE2": [
{
"path": "assembly/SAMPLE2.assembly.fa.gz"
}
],
"SAMPLE3": [
{
"path": "assembly/SAMPLE3.assembly.fa.gz"
}
]

}
},
"metadata": {
"samples": {
"SAMPLE1": {
"reads.1": "sample1_R1.fastq.gz",
"reads.2": "sample1_R2.fastq.gz"
"ABC": {
"irida_id": "sample1",
"metadata_1": "1.1",
"metadata_2": "1.2",
"metadata_3": "1.3",
"metadata_4": "1.4",
"metadata_5": "1.5",
"metadata_6": "1.6",
"metadata_7": "1.7",
"metadata_8": "1.8"
},
"SAMPLE2": {
"reads.1": "sample2_R1.fastq.gz",
"reads.2": "sample2_R2.fastq.gz"
"DEF": {
"irida_id": "sample2",
"metadata_1": "2.1",
"metadata_2": "2.2",
"metadata_3": "2.3",
"metadata_4": "2.4",
"metadata_5": "2.5",
"metadata_6": "2.6",
"metadata_7": "2.7",
"metadata_8": "2.8"
},
"SAMPLE3": {
"reads.1": "sample1_R1.fastq.gz",
"reads.2": "null"
"GHI": {
"irida_id": "sample3",
"metadata_1": "3.1",
"metadata_2": "3.2",
"metadata_3": "3.3",
"metadata_4": "3.4",
"metadata_5": "3.5",
"metadata_6": "3.6",
"metadata_7": "3.7",
"metadata_8": "3.8"
}
}
}
}
```

Within the `files` section of this JSON file, all of the output paths are relative to the `outdir`. Therefore, `"path": "assembly/SAMPLE1.assembly.fa.gz"` refers to a file located within `outdir/assembly/SAMPLE1.assembly.fa.gz`.

There is also a pipeline execution summary output file provided (specified in the above JSON as `"global": [{"path":"summary/summary.txt.gz"}]`). However, there is no formatting specification for this file.

For more information see [output doc](docs/output.md)
For more information see [output doc](docs/output.md).

## Test profile

To run with the test profile, please do:

```bash
nextflow run phac-nml/metadatatransformation -profile docker,test -r main -latest --outdir results
nextflow run phac-nml/metadatatransformation -profile docker,test -r main -latest --outdir results --transformation lock
```

# Legal

Copyright 2023 Government of Canada
Copyright 2025 Government of Canada

Licensed under the MIT License (the "License"); you may not use
this work except in compliance with the License. You may obtain a copy of the
8 changes: 4 additions & 4 deletions assets/samplesheet.csv
@@ -1,4 +1,4 @@
sample,fastq_1,fastq_2
SAMPLE1,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz
SAMPLE2,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R2.fastq.gz
SAMPLE3,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,
sample,sample_name,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
sample1,"ABC",1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
sample2,"DEF",2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
sample3,"GHI",3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
87 changes: 65 additions & 22 deletions assets/schema_input.json
@@ -10,29 +10,72 @@
"sample": {
"type": "string",
"pattern": "^\\S+$",
"meta": ["id"],
"meta": ["irida_id"],
"unique": true,
"errorMessage": "Sample name must be provided and cannot contain spaces"
},
"fastq_1": {
"type": "string",
"pattern": "^\\S+\\.f(ast)?q(\\.gz)?$",
"errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have the extension: '.fq', '.fastq', '.fq.gz' or '.fastq.gz'"
},
"fastq_2": {
"errorMessage": "FastQ file for reads 2 cannot contain spaces and must have the extension: '.fq', '.fastq', '.fq.gz' or '.fastq.gz'",
"anyOf": [
{
"type": "string",
"pattern": "^\\S+\\.f(ast)?q(\\.gz)?$"
},
{
"type": "string",
"maxLength": 0
}
]
"errorMessage": "Sample name must be provided and cannot contain spaces."
},
"sample_name": {
"type": "string",
"meta": ["id"],
"errorMessage": "Sample name is optional, if provided will replace sample for filenames and outputs"
},
"metadata_1": {
"type": "string",
"meta": ["metadata_1"],
"errorMessage": "Metadata associated with the sample (metadata_1).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_2": {
"type": "string",
"meta": ["metadata_2"],
"errorMessage": "Metadata associated with the sample (metadata_2).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_3": {
"type": "string",
"meta": ["metadata_3"],
"errorMessage": "Metadata associated with the sample (metadata_3).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_4": {
"type": "string",
"meta": ["metadata_4"],
"errorMessage": "Metadata associated with the sample (metadata_4).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_5": {
"type": "string",
"meta": ["metadata_5"],
"errorMessage": "Metadata associated with the sample (metadata_5).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_6": {
"type": "string",
"meta": ["metadata_6"],
"errorMessage": "Metadata associated with the sample (metadata_6).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_7": {
"type": "string",
"meta": ["metadata_7"],
"errorMessage": "Metadata associated with the sample (metadata_7).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
},
"metadata_8": {
"type": "string",
"meta": ["metadata_8"],
"errorMessage": "Metadata associated with the sample (metadata_8).",
"default": "",
"pattern": "^[^\\n\\t\"]+$"
}
},
"required": ["sample", "fastq_1"]
"required": ["sample"]
}
}
}
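
The constraints in the updated schema can be spot-checked outside the pipeline with a short Python sketch. This is not how the pipeline validates input (that is done by nf-validation); it only mirrors the `sample` pattern, the uniqueness rule, and the shared `metadata_*` pattern shown above. The function name `check_samplesheet` is hypothetical, and treating a blank metadata cell as acceptable is an assumption based on the empty-string default.

```python
import csv
import io
import re

# Patterns copied from assets/schema_input.json in this diff.
SAMPLE_PATTERN = re.compile(r"^\S+$")
METADATA_PATTERN = re.compile(r'^[^\n\t"]+$')
METADATA_COLUMNS = [f"metadata_{i}" for i in range(1, 9)]


def check_samplesheet(text):
    """Return a list of problems found in sample sheet CSV text (hypothetical helper)."""
    problems = []
    seen = set()
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        line_no = reader.line_num
        sample = row.get("sample") or ""
        if not SAMPLE_PATTERN.fullmatch(sample):
            problems.append(f"line {line_no}: sample must be provided and cannot contain spaces")
        elif sample in seen:
            problems.append(f"line {line_no}: sample '{sample}' is not unique")
        else:
            seen.add(sample)
        for column in METADATA_COLUMNS:
            value = row.get(column) or ""
            # Assumption: a blank cell falls back to the schema default ("").
            if value and not METADATA_PATTERN.fullmatch(value):
                problems.append(f"line {line_no}: {column} contains a tab, newline or double quote")
    return problems
```

Calling `check_samplesheet(open("assets/samplesheet.csv").read())` returns an empty list when every row passes these checks.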
74 changes: 6 additions & 68 deletions docs/output.md
@@ -4,87 +4,25 @@

This document describes the output produced by the pipeline.

The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
The directories listed below may be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. The exact directories created may depend on which metadata transformation is performed.

- assembly: very small mock assembly files for each sample
- generate: intermediate files used in generating the IRIDA Next JSON output
- pipeline_info: information about the pipeline's execution
- simplify: simplified intermediate files used in generating the IRIDA Next JSON output
- summary: summary report about the pipeline's execution and results
- lock: the outputs of the metadata lock operation

The IRIDA Next-compliant JSON output file will be named `iridanext.output.json.gz` and will be written to the top-level of the results directory. This file is compressed using GZIP and conforms to the [IRIDA Next JSON output specifications](https://github.com/phac-nml/pipeline-standards#42-irida-next-json).

## Pipeline overview

The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:

- [Assembly stub](#assembly-stub) - Performs a stub assembly by generating a mock assembly
- [Generate sample JSON](#generate-sample-json) - Generates a JSON file for each sample
- [Generate summary](#generate-summary) - Generates a summary text file describing the samples and assemblies
- [Simplify IRIDA JSON](#simplify-irida-json) - Simplifies the sample JSONs by limiting nesting depth
- [IRIDA Next Output](#irida-next-output) - Generates a JSON output file that is compliant with IRIDA Next
- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
- [Lock](#lock) - Locks the metadata for IRIDA Next.

### Assembly stub
### Lock

<details markdown="1">
<summary>Output files</summary>

- `assembly/`
- Mock assembly files: `ID.assembly.fa.gz`

</details>

### Generate sample JSON

<details markdown="1">
<summary>Output files</summary>

- `generate/`
- JSON files: `ID.json.gz`

</details>

### Generate summary

<details markdown="1">
<summary>Output files</summary>

- `summary/`
- Text summary describing samples and assemblies: `summary.txt.gz`

</details>

### Simplify IRIDA JSON

<details markdown="1">
<summary>Output files</summary>

- `simplify/`
- Simplified JSON files: `ID.simple.json.gz`

</details>

### IRIDA Next Output

<details markdown="1">
<summary>Output files</summary>

- `/`
- IRIDA Next-compliant JSON output: `iridanext.output.json.gz`

</details>

### Pipeline information

<details markdown="1">
<summary>Output files</summary>

- `pipeline_info/`
- Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
- Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
- Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
- Parameters used by the pipeline run: `params.json`.
- `lock/`
- A CSV-format file reporting locked files: `locked.csv`

</details>
