phac-nml · emarinier · Jan 23, 2025 · Jan 28, 2025
diff --git a/README.md b/README.md
@@ -1,31 +1,41 @@
 [![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A523.04.3-brightgreen.svg)](https://www.nextflow.io/)
 
-# Example Pipeline for IRIDA Next
+# Metadata Transformation Pipeline for IRIDA Next
 
-This is an example pipeline to be used for integration with IRIDA Next.
+This pipeline transforms metadata from IRIDA Next.
 
 # Input
 
-The input to the pipeline is a standard sample sheet (passed as `--input samplesheet.csv`) that looks like:
+The input to the pipeline is a sample sheet (passed as `--input samplesheet.csv`) that looks like:
 
-| sample  | fastq_1         | fastq_2         |
-| ------- | --------------- | --------------- |
-| SampleA | file_1.fastq.gz | file_2.fastq.gz |
+| sample  | sample_name | metadata_1 | metadata_2 | metadata_3 | metadata_4 | metadata_5 | metadata_6 | metadata_7 | metadata_8 |
+| ------- | ----------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
+| Sample1 | SampleA     | meta_1     | meta_2     | meta_3     | meta_4     | meta_5     | meta_6     | meta_7     | meta_8     |
 
 The structure of this file is defined in [assets/schema_input.json](assets/schema_input.json). Validation of the sample sheet is performed by [nf-validation](https://nextflow-io.github.io/nf-validation/).
 
 # Parameters
 
 The main parameters are `--input` as defined above and `--outdir` for specifying the output results directory. You may wish to provide `-profile singularity` to use Singularity containers and `-r [branch]` to specify which GitHub branch you would like to run.
 
+## Transformation
+
+You may specify the metadata transformation with the `--transformation` parameter. For example, `--transformation lock` will perform the lock transformation. The available transformations are as follows:
+
+| Transformation | Explanation                       |
+| -------------- | --------------------------------- |
+| lock           | Locks the metadata in IRIDA Next. |
+
+## Other Parameters
+
 Other parameters (defaults from nf-core) are defined in [nextflow_schema.json](nextflow_schema.json).
 
 # Running
 
 To run the pipeline, please do:
 
 ```bash
-nextflow run phac-nml/metadatatransformation -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results
+nextflow run phac-nml/metadatatransformation -profile singularity -r main -latest --input assets/samplesheet.csv --outdir results --transformation lock
 ```
 
 Where the `samplesheet.csv` is structured as specified in the [Input](#input) section.
@@ -41,64 +51,65 @@ An example of the what the contents of the IRIDA Next JSON file looks like for t
 {
     "files": {
         "global": [
-            {
-                "path": "summary/summary.txt.gz"
-            }
+
         ],
         "samples": {
-            "SAMPLE1": [
-                {
-                    "path": "assembly/SAMPLE1.assembly.fa.gz"
-                }
-            ],
-            "SAMPLE2": [
-                {
-                    "path": "assembly/SAMPLE2.assembly.fa.gz"
-                }
-            ],
-            "SAMPLE3": [
-                {
-                    "path": "assembly/SAMPLE3.assembly.fa.gz"
-                }
-            ]
+
         }
     },
     "metadata": {
         "samples": {
-            "SAMPLE1": {
-                "reads.1": "sample1_R1.fastq.gz",
-                "reads.2": "sample1_R2.fastq.gz"
+            "ABC": {
+                "irida_id": "sample1",
+                "metadata_1": "1.1",
+                "metadata_2": "1.2",
+                "metadata_3": "1.3",
+                "metadata_4": "1.4",
+                "metadata_5": "1.5",
+                "metadata_6": "1.6",
+                "metadata_7": "1.7",
+                "metadata_8": "1.8"
             },
-            "SAMPLE2": {
-                "reads.1": "sample2_R1.fastq.gz",
-                "reads.2": "sample2_R2.fastq.gz"
+            "DEF": {
+                "irida_id": "sample2",
+                "metadata_1": "2.1",
+                "metadata_2": "2.2",
+                "metadata_3": "2.3",
+                "metadata_4": "2.4",
+                "metadata_5": "2.5",
+                "metadata_6": "2.6",
+                "metadata_7": "2.7",
+                "metadata_8": "2.8"
             },
-            "SAMPLE3": {
-                "reads.1": "sample1_R1.fastq.gz",
-                "reads.2": "null"
+            "GHI": {
+                "irida_id": "sample3",
+                "metadata_1": "3.1",
+                "metadata_2": "3.2",
+                "metadata_3": "3.3",
+                "metadata_4": "3.4",
+                "metadata_5": "3.5",
+                "metadata_6": "3.6",
+                "metadata_7": "3.7",
+                "metadata_8": "3.8"
             }
         }
     }
 }
 ```
 
-Within the `files` section of this JSON file, all of the output paths are relative to the `outdir`. Therefore, `"path": "assembly/SAMPLE1.assembly.fa.gz"` refers to a file located within `outdir/assembly/SAMPLE1.assembly.fa.gz`.
-
-There is also a pipeline execution summary output file provided (specified in the above JSON as `"global": [{"path":"summary/summary.txt.gz"}]`). However, there is no formatting specification for this file.
-
-For more information see [output doc](docs/output.md)
+For more information see [output doc](docs/output.md).
 
 ## Test profile
 
 To run with the test profile, please do:
 
 ```bash
-nextflow run phac-nml/metadatatransformation -profile docker,test -r main -latest --outdir results
+nextflow run phac-nml/metadatatransformation -profile docker,test -r main -latest --outdir results --transformation lock
 ```
 
 # Legal
 
-Copyright 2023 Government of Canada
+Copyright 2025 Government of Canada
 
 Licensed under the MIT License (the "License"); you may not use
 this work except in compliance with the License. You may obtain a copy of the

diff --git a/assets/samplesheet.csv b/assets/samplesheet.csv
@@ -1,4 +1,4 @@
-sample,fastq_1,fastq_2
-SAMPLE1,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz
-SAMPLE2,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample2_R2.fastq.gz
-SAMPLE3,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,
+sample,sample_name,metadata_1,metadata_2,metadata_3,metadata_4,metadata_5,metadata_6,metadata_7,metadata_8
+sample1,"ABC",1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8
+sample2,"DEF",2.1,2.2,2.3,2.4,2.5,2.6,2.7,2.8
+sample3,"GHI",3.1,3.2,3.3,3.4,3.5,3.6,3.7,3.8
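
The relationship between this sample sheet and the `metadata` section of the IRIDA Next JSON shown in the README can be sketched in Python. This is only an illustration, not the pipeline's implementation: the function name `samplesheet_to_metadata` is hypothetical, and the fallback to `sample` when `sample_name` is empty is an assumption based on the schema describing `sample_name` as optional.

```python
import csv
import io


def samplesheet_to_metadata(csv_text):
    """Build a dict shaped like the README's "metadata" -> "samples" example."""
    samples = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        irida_id = row.pop("sample")
        # Assumption: an empty or missing sample_name falls back to the sample ID.
        name = row.pop("sample_name", "") or irida_id
        # Remaining columns (metadata_1, metadata_2, ...) are copied as-is.
        samples[name] = {"irida_id": irida_id, **row}
    return {"metadata": {"samples": samples}}


# A shortened version of assets/samplesheet.csv (two metadata columns only).
SHEET = '''sample,sample_name,metadata_1,metadata_2
sample1,"ABC",1.1,1.2
sample2,"DEF",2.1,2.2
'''

result = samplesheet_to_metadata(SHEET)
print(result["metadata"]["samples"]["ABC"])
# {'irida_id': 'sample1', 'metadata_1': '1.1', 'metadata_2': '1.2'}
```

Keying the output by `sample_name` while keeping the original `sample` value as `irida_id` mirrors the `meta` mappings in `assets/schema_input.json`.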
diff --git a/assets/schema_input.json b/assets/schema_input.json
@@ -10,29 +10,72 @@
             "sample": {
                 "type": "string",
                 "pattern": "^\\S+$",
-                "meta": ["id"],
+                "meta": ["irida_id"],
                 "unique": true,
-                "errorMessage": "Sample name must be provided and cannot contain spaces"
-            },
-            "fastq_1": {
-                "type": "string",
-                "pattern": "^\\S+\\.f(ast)?q(\\.gz)?$",
-                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have the extension: '.fq', '.fastq', '.fq.gz' or '.fastq.gz'"
-            },
-            "fastq_2": {
-                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have the extension: '.fq', '.fastq', '.fq.gz' or '.fastq.gz'",
-                "anyOf": [
-                    {
-                        "type": "string",
-                        "pattern": "^\\S+\\.f(ast)?q(\\.gz)?$"
-                    },
-                    {
-                        "type": "string",
-                        "maxLength": 0
-                    }
-                ]
+                "errorMessage": "Sample name must be provided and cannot contain spaces."
+            },
+            "sample_name": {
+                "type": "string",
+                "meta": ["id"],
+                "errorMessage": "Sample name is optional; if provided, it replaces the sample ID in filenames and outputs."
+            },
+            "metadata_1": {
+                "type": "string",
+                "meta": ["metadata_1"],
+                "errorMessage": "Metadata associated with the sample (metadata_1).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_2": {
+                "type": "string",
+                "meta": ["metadata_2"],
+                "errorMessage": "Metadata associated with the sample (metadata_2).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_3": {
+                "type": "string",
+                "meta": ["metadata_3"],
+                "errorMessage": "Metadata associated with the sample (metadata_3).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_4": {
+                "type": "string",
+                "meta": ["metadata_4"],
+                "errorMessage": "Metadata associated with the sample (metadata_4).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_5": {
+                "type": "string",
+                "meta": ["metadata_5"],
+                "errorMessage": "Metadata associated with the sample (metadata_5).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_6": {
+                "type": "string",
+                "meta": ["metadata_6"],
+                "errorMessage": "Metadata associated with the sample (metadata_6).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_7": {
+                "type": "string",
+                "meta": ["metadata_7"],
+                "errorMessage": "Metadata associated with the sample (metadata_7).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
+            },
+            "metadata_8": {
+                "type": "string",
+                "meta": ["metadata_8"],
+                "errorMessage": "Metadata associated with the sample (metadata_8).",
+                "default": "",
+                "pattern": "^[^\\n\\t\"]+$"
             }
         },
-        "required": ["sample", "fastq_1"]
+        "required": ["sample"]
     }
-}
+}
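
The effect of the patterns in this schema can be approximated with a short Python check. This is a rough sketch, not how nf-validation works internally: `check_row` is a hypothetical helper, the patterns are copied from the schema with JSON escaping removed, and skipping empty cells is an assumption.

```python
import re

# Patterns from assets/schema_input.json; the ^...$ anchors are replaced by
# fullmatch(), which also rejects a trailing newline.
SAMPLE_PATTERN = re.compile(r"\S+")
METADATA_PATTERN = re.compile(r'[^\n\t"]+')


def check_row(row):
    """Return a list of problems found in one sample sheet row (a dict)."""
    problems = []
    # "sample" is the only required column and must not contain whitespace.
    if not SAMPLE_PATTERN.fullmatch(row.get("sample", "")):
        problems.append("sample: missing or contains whitespace")
    for i in range(1, 9):
        key = f"metadata_{i}"
        value = row.get(key, "")
        # Assumption: empty cells are treated as absent and skip the pattern check.
        if value and not METADATA_PATTERN.fullmatch(value):
            problems.append(f"{key}: contains a newline, tab, or double quote")
    return problems


print(check_row({"sample": "sample1", "metadata_1": "1.1"}))
# []
print(check_row({"sample": "sample 1", "metadata_2": "a\tb"}))
# ['sample: missing or contains whitespace', 'metadata_2: contains a newline, tab, or double quote']
```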
diff --git a/docs/output.md b/docs/output.md
@@ -4,87 +4,25 @@
 
 This document describes the output produced by the pipeline.
 
-The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
+The directories listed below may be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. The exact directories created may depend on which metadata transformation is performed.
 
-- assembly: very small mock assembly files for each sample
-- generate: intermediate files used in generating the IRIDA Next JSON output
-- pipeline_info: information about the pipeline's execution
-- simplify: simplified intermediate files used in generating the IRIDA Next JSON output
-- summary: summary report about the pipeline's execution and results
+- lock: the outputs of the metadata lock operation
 
 The IRIDA Next-compliant JSON output file will be named `iridanext.output.json.gz` and will be written to the top-level of the results directory. This file is compressed using GZIP and conforms to the [IRIDA Next JSON output specifications](https://github.com/phac-nml/pipeline-standards#42-irida-next-json).
 
 ## Pipeline overview
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
-- [Assembly stub](#assembly-stub) - Performs a stub assembly by generating a mock assembly
-- [Generate sample JSON](#generate-sample-json) - Generates a JSON file for each sample
-- [Generate summary](#generate-summary) - Generates a summary text file describing the samples and assemblies
-- [Simplify IRIDA JSON](#simplify-irida-json) - Simplifies the sample JSONs by limiting nesting depth
-- [IRIDA Next Output](#irida-next-output) - Generates a JSON output file that is compliant with IRIDA Next
-- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
+- [Lock](#lock) - Locks the metadata for IRIDA Next.
 
-### Assembly stub
+### Lock
 
 <details markdown="1">
 <summary>Output files</summary>
 
-- `assembly/`
-  - Mock assembly files: `ID.assembly.fa.gz`
-
-</details>
-
-### Generate sample JSON
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `generate/`
-  - JSON files: `ID.json.gz`
-
-</details>
-
-### Generate summary
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `summary/`
-  - Text summary describing samples and assemblies: `summary.txt.gz`
-
-</details>
-
-### Simplify IRIDA JSON
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `simplify/`
-  - Simplified JSON files: `ID.simple.json.gz`
-
-</details>
-
-### IRIDA Next Output
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `/`
-  - IRIDA Next-compliant JSON output: `iridanext.output.json.gz`
-
-</details>
-
-### Pipeline information
-
-<details markdown="1">
-<summary>Output files</summary>
-
-- `pipeline_info/`
-  - Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, `execution_trace.txt` and `pipeline_dag.dot`/`pipeline_dag.svg`.
-  - Reports generated by the pipeline: `pipeline_report.html`, `pipeline_report.txt` and `software_versions.yml`. The `pipeline_report*` files will only be present if the `--email` / `--email_on_fail` parameter's are used when running the pipeline.
-  - Reformatted samplesheet files used as input to the pipeline: `samplesheet.valid.csv`.
-  - Parameters used by the pipeline run: `params.json`.
+- `lock/`
+  - A CSV-format file reporting locked files: `locked.csv`
 
 </details>
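
Because `iridanext.output.json.gz` is GZIP-compressed JSON, it can be inspected with the Python standard library. The following is a minimal sketch; the helper names `load_irida_next_output` and `metadata_by_sample` are hypothetical, and the key layout is assumed to match the example JSON in the README.

```python
import gzip
import json


def load_irida_next_output(path):
    """Load the gzip-compressed IRIDA Next JSON output into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def metadata_by_sample(output):
    """Return the per-sample metadata mapping, or an empty dict if absent."""
    return output.get("metadata", {}).get("samples", {})
```

For a run started with `--outdir results`, calling `metadata_by_sample(load_irida_next_output("results/iridanext.output.json.gz"))` would return the per-sample metadata as a dictionary.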