Skip to content

Commit

Permalink
Updating documentation.
Browse files Browse the repository at this point in the history
  • Loading branch information
emarinier committed Feb 4, 2025
1 parent 1180a06 commit 307ca6d
Showing 1 changed file with 67 additions and 0 deletions.
67 changes: 67 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -454,6 +454,73 @@ Then the value for `b` for **SAMPLE1** will be written as an empty string in the
}
```

### Manually loading metadata

The default behaviour of the plugin is to automatically observe completed tasks and identify the sample ID of the `meta` object associated with it. This sample ID is recorded and used when producing the metadata section in the `iridanext.output.json.gz` output.

This process will fail to load any sample IDs that were not submitted to a task individually. For example, if a sample is submitted to an assembly task in the following way, then it will be included as a valid sample ID for the whole workflow:

Workflow:
```
// input: tuple(meta, [ file(fastq_1), file(fastq_2) ])
ASSEMBLY (input)
```

Process:
```
process ASSEMBLY {
input:
tuple val(meta), path(reads)
(...)
}
```

However, if samples are only ever submitted to a process in a way similar to as follows, then these sample IDs will not be automatically detected:

Workflow:
```
// input: tuple(meta, [ file(fastq_1), file(fastq_2) ])
metadata = input.map {
meta, reads ->
tuple(meta.id, meta.metadata_1, meta.metadata_2)}
REPORT(metadata.toList())
```

Process:
```
process REPORT {
input:
val metadata
(...)
}
```

In this situation, the solution is to use the provided `loadIridaSampleIds()` method in the workflow as follows:

```
// input: tuple(meta, [ file(fastq_1), file(fastq_2) ])
loaded_input = input.loadIridaSampleIds()
```

Or when using the `nf-validation` plugin, in a manner similar to the following:

```
input = Channel.fromSamplesheet("input")
.loadIridaSampleIds()
```

However, be careful not to use the function in a way that would create a race condition, as the sample IDs may not be loaded correctly:

```
input = Channel.fromSamplesheet("input")
input = input.map { ... } // some input transformation
input.loadIridaSampleIds()
```

Since Nextflow tries to parallelize as much as possible, this will cause `input = input.map { ... }` and `input.loadIridaSampleIds()` to be run simultaneously, which will create a race condition and possibly undesirable behaviour.

# Development

In order to build this plugin you will need a Java Development Kit (such as [OpenJDK](https://openjdk.org/)) and [Groovy](https://groovy-lang.org/index.html). For Ubuntu, this can be installed with:
Expand Down

0 comments on commit 307ca6d

Please sign in to comment.