Merge pull request #18 from phac-nml/test/encode-missing
Test/encode missing
apetkau authored Apr 9, 2024
2 parents 7f4d7fc + bf5acfd commit d4b27a8
Showing 4 changed files with 144 additions and 0 deletions.
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
# Unreleased

- Added documentation for running test cases.
- Added test cases to verify that missing values in a CSV file are encoded as empty strings in the sample metadata section of the IRIDA Next JSON file.
- Added test cases for passing missing values in a JSON file.

# 0.2.0 - 2024/01/22

- Added support for writing JSON output file when using `-resume` in a pipeline.
87 changes: 87 additions & 0 deletions README.md
@@ -387,8 +387,81 @@ iridanext {
}
```

### Missing values in metadata

There are two different scenarios where metadata key/value pairs could be missing for a sample, which result in different behaviours in IRIDA Next.

1. **Ignore key**: If the `key` is left out of the sample's metadata in the IRIDA Next JSON, then nothing is written for that `key` for the sample. Any existing metadata under that `key` will remain in IRIDA Next.

2. **Delete key**: If a metadata value is an empty string (`"key": ""`) or null (`"key": null`), then IRIDA Next will remove that metadata key/value pair from the sample metadata, if it exists. This is the expected scenario when pipeline results contain missing (or N/A) values: deleting older metadata keys prevents old and new pipeline analysis results from being mixed in the metadata table.
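
The two scenarios can be modelled with a short Python sketch. This is only an illustration of the behaviour described above, not IRIDA Next's implementation; the function name `apply_metadata_update` and the sample values are hypothetical.

```python
def apply_metadata_update(existing, incoming):
    """Return the sample metadata after applying one pipeline result."""
    updated = dict(existing)
    for key, value in incoming.items():
        if value is None or value == "":
            # "Delete key": an empty string or null removes the stored pair.
            updated.pop(key, None)
        else:
            updated[key] = value
    # "Ignore key": keys absent from `incoming` are never visited, so they stay.
    return updated


# Hypothetical stored metadata and a new result that omits "c" and blanks "b".
stored = {"a": "old_a", "b": "old_b", "c": "old_c"}
result = apply_metadata_update(stored, {"a": "new_a", "b": ""})
print(result)  # {'a': 'new_a', 'c': 'old_c'}
```

Here `c` is untouched because it was left out of the new result, while `b` is removed because it was sent as an empty string.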

The following sections describe how missing values are expected to be written to the final IRIDA Next JSON file so that the corresponding key/value pairs are deleted in IRIDA Next.

#### Encoding missing metadata values using JSON

If the metadata key `b` for **SAMPLE1** is encoded as an empty string `""` or `null` in the JSON file, as in the example below:

**output.json**
```json
{
    "SAMPLE1": {
        "a": "value1",
        "b": ""
    }
}
```

Then the final IRIDA Next JSON file will preserve the empty string/null value in the samples metadata section:

**iridanext.output.json.gz**
```json
"metadata": {
    "samples": {
        "SAMPLE1": { "a": "value1", "b": "" }
    }
}
```
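
If the pipeline step that produces **output.json** happens to be written in Python (an assumption, any language that writes JSON would do), a missing value can be represented as `None`, which the standard `json` module serializes as `null`. The sample name and values below are hypothetical.

```python
import json

# Hypothetical per-sample results; None stands for a value the analysis
# could not produce.
results = {"SAMPLE1": {"a": "value1", "b": None}}

# json.dumps writes Python's None as JSON null.
text = json.dumps(results, indent=2)
print(text)

# Reading the text back gives the same structure, so the null survives
# the round trip.
assert json.loads(text) == results
```

Using `""` instead of `None` for `b` would produce the empty-string form shown above; both are described as triggering deletion of the key in IRIDA Next.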

#### Encoding missing metadata values using CSV

If the metadata key `b` for **SAMPLE1** is left empty in the CSV file, as in the two examples below:

**output.csv** as table
| column1 | b | c |
|--|--|--|
| SAMPLE1 | | 3 |
| SAMPLE2 | 4 | 5 |
| SAMPLE3 | 6 | 7 |

**output.csv** as CSV
```
column1,b,c
SAMPLE1,,3
SAMPLE2,4,5
SAMPLE3,6,7
```

Then the value for `b` for **SAMPLE1** will be written as an empty string in the IRIDA Next JSON file:

**iridanext.output.json.gz**
```json
"metadata": {
    "samples": {
        "SAMPLE1": { "b": "", "c": "3" },
        "SAMPLE2": { "b": "4", "c": "5" },
        "SAMPLE3": { "b": "6", "c": "7" }
    }
}
```
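
The CSV-to-metadata mapping above can be reproduced with a minimal Python sketch. It is not the plugin's `MetadataParserCSV` (which is written in Groovy); the function `parse_metadata_csv` is hypothetical and only mirrors the documented result. The plugin's own test expects an exception when the identifier column contains a blank, and the sketch raises a `ValueError` as a stand-in for that case.

```python
import csv
import io


def parse_metadata_csv(text, id_column, sep=","):
    """Map each identifier to its remaining columns, keeping blanks as ""."""
    samples = {}
    for row in csv.DictReader(io.StringIO(text), delimiter=sep):
        sample_id = row.pop(id_column)
        if not sample_id:
            # Stand-in for the plugin's behaviour on a blank identifier.
            raise ValueError(f"missing identifier in column {id_column!r}")
        # Empty cells are read as "", so they are kept rather than dropped.
        samples[sample_id] = row
    return samples


csv_text = "column1,b,c\nSAMPLE1,,3\nSAMPLE2,4,5\nSAMPLE3,6,7\n"
parsed = parse_metadata_csv(csv_text, "column1")
print(parsed["SAMPLE1"])  # {'b': '', 'c': '3'}
```

All values stay strings, matching the quoted values in the JSON above.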

# Development

In order to build this plugin you will need a Java Development Kit (such as [OpenJDK](https://openjdk.org/)) and [Groovy](https://groovy-lang.org/index.html). On Ubuntu, these can be installed with:

```bash
sudo apt install default-jdk groovy
```

## Build and install from source

In order to build and install the plugin from source, please do the following:
@@ -421,6 +494,20 @@ plugins {
}
```

## Run unit/integration tests

In order to run the test cases, please clone this repository and run the following command:

```bash
./gradlew check
```

To get more information for any failed tests, please run:

```bash
./gradlew check --info
```

# Example: nf-core/fetchngs

One use case of this plugin is to structure reads and metadata downloaded from NCBI/ENA for storage in IRIDA Next by making use of the [nf-core/fetchngs][nf-core/fetchngs] pipeline. The example configuration [fetchngs.conf][] can be used for this purpose. To test, please run the following (using [ids.csv][fetchngs-ids.csv] as example data accessions):
@@ -5,6 +5,7 @@ import java.nio.file.FileSystems
import nextflow.iridanext.MetadataParser
import nextflow.iridanext.MetadataParserCSV
import spock.lang.Specification
import groovy.lang.MissingPropertyException

import nextflow.iridanext.TestHelper

@@ -70,4 +71,36 @@ class MetadataParserCSVTest extends Specification {
        ]
        csvMapUnmatch == [:]
    }

    def 'Test parse CSV file with missing values' () {
        when:
        def csvContent = """a,b,c
                           |1,2,
                           |4,,""".stripMargin()
        def csvFile = TestHelper.createInMemTempFile("temp.csv", csvContent)
        def parser = new MetadataParserCSV("a", ",")
        def csvMapColA = parser.parseMetadata(csvFile)

        then:
        csvMapColA == [
            "1": ["b": "2", "c": ""],
            "4": ["b": "", "c": ""]
        ]
    }

    def 'Test parse CSV file with missing ids' () {
        when:
        def csvContent = """a,b,c
                           |1,2,3
                           |4,,6""".stripMargin()
        def csvFile = TestHelper.createInMemTempFile("temp.csv", csvContent)

        // declare the parser locally so the exception can only come from parsing
        def parser = new MetadataParserCSV("b", ",")
        def csvMapColB = parser.parseMetadata(csvFile)

        then:
        // the column of identifiers is column "b", which has a missing value
        // and so should trigger an exception
        thrown(MissingPropertyException)
    }
}
@@ -46,4 +46,22 @@ class MetadataParserJSONTest extends Specification {
            "2": ["coords": ["x": 0, "y": 1], "coords.x": 4]
        ]
    }

    def 'Test parse JSON file missing values' () {
        when:
        def jsonContent = '''{
            "1": {"b": "", "c": "3"},
            "2": {"b": "3", "c": null}
        }'''.stripMargin()

        def jsonFile = TestHelper.createInMemTempFile("temp.json", jsonContent)
        def parser = new MetadataParserJSON()
        def outputData = parser.parseMetadata(jsonFile)

        then:
        outputData == [
            "1": ["b": "", "c": "3"],
            "2": ["b": "3", "c": null]
        ]
    }
}
