Merge pull request #178 from naobservatory/will-merge-master
Merging changes from master into dev and fixing tests
willbradshaw authored Feb 3, 2025
2 parents a322930 + db459fe commit a97711a
Showing 20 changed files with 23 additions and 18 deletions.
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
-# v2.8.0.0
+# v2.8.0.0 (in development)
- Major changes to many parts of the pipeline as part of a general performance overhaul
- Modified most processes in the RUN and RUN_VALIDATION workflows to stream data in and out rather than reading whole files
- As part of the previous change, modified most processes in the RUN and RUN_VALIDATION workflows to work with interleaved rather than paired sequence data
@@ -17,6 +17,15 @@
- Viral hits TSV moved from `virus_hits_db.tsv.gz` to `virus_hits_filtered.tsv.gz`
- Numerous changes to column names in viral hits TSV, mainly to improve clarity

+# v2.7.0.2
+- Updated `pipeline-version.txt`
+
+# v2.7.0.1
+- Fixed index-related issues from v2.7.0.0:
+    - Updated `EXTRACT_VIRAL_READS` to expect updated path to viral genome DB
+    - Added `adapters` param to the index config file used to run our tests
+    - Updated `RUN` and `RUN_VALIDATION` tests to use up-to-date test index (location: `s3://nao-testing/index/20250130`)
+
# v2.7.0.0
- Implemented masking of viral genome reference in index workflow with MASK_GENOME_FASTA to remove adapter, low-entropy and repeat sequences.
- Removed TRIMMOMATIC and BBMAP from EXTRACT_VIRAL_READS.
8 changes: 2 additions & 6 deletions configs/index-for-run-test.config
@@ -27,14 +27,10 @@ params {
// Other reference files
host_taxon_db = "${projectDir}/ref/host-taxa.tsv"
contaminants = "${projectDir}/ref/contaminants.fasta.gz"
adapters = "${projectDir}/ref/adapters.fasta"
genome_patterns_exclude = "${projectDir}/ref/hv_patterns_exclude.txt"

// Kraken viral DB
kraken_db = "https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20240904.tar.gz"
// Smallest possible BLAST DB
blast_db_name = "nt_others"

// Pull information from GenBank or Ref Seq
ncbi_viral_params = "--section refseq --assembly-level complete"

// Other input values
@@ -52,4 +48,4 @@ includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
includeConfig "${projectDir}/configs/output.config"
-process.queue = "harmon-queue" // AWS Batch job queue
+process.queue = "will-batch-queue" // AWS Batch job queue
2 changes: 1 addition & 1 deletion pipeline-version.txt
@@ -1 +1 @@
-2.5.2
+2.7.0.2
4 changes: 2 additions & 2 deletions subworkflows/local/extractViralReads/main.nf
@@ -42,8 +42,8 @@ workflow EXTRACT_VIRAL_READS {
bbduk_suffix
bracken_threshold
main:
-// 0. Get reference paths
-viral_genome_path = "${ref_dir}/results/virus-genomes-filtered.fasta.gz"
+// Get reference paths
+viral_genome_path = "${ref_dir}/results/virus-genomes-masked.fasta.gz"
genome_meta_path = "${ref_dir}/results/virus-genome-metadata-gid.tsv.gz"
bt2_virus_index_path = "${ref_dir}/results/bt2-virus-index"
bt2_human_index_path = "${ref_dir}/results/bt2-human-index"
Binary file modified test-data/gold-standard-results/bracken_reports_merged.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/kraken_reports_merged.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/merged_blast_filtered.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/read_counts.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/subset_qc_adapter_stats.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/subset_qc_basic_stats.tsv.gz
Binary file not shown.
Binary file modified test-data/gold-standard-results/subset_qc_length_stats.tsv.gz
Binary file not shown.
Two further binary files not shown (file names not captured).
Binary file modified test-data/gold-standard-results/virus_hits_filtered.tsv.gz
Binary file not shown.
4 changes: 2 additions & 2 deletions tests/modules/local/bbduk/bbduk.nf.test
@@ -35,7 +35,7 @@ nextflow_process {
process {
'''
input[0] = INTERLEAVE_FASTQ.out.output
-input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
+input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "0.4"
input[3] = "27"
input[4] = "ribo"
@@ -89,7 +89,7 @@ nextflow_process {
process {
'''
input[0] = LOAD_SAMPLESHEET.out.samplesheet
-input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
+input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "0.4"
input[3] = "27"
input[4] = "ribo"
2 changes: 1 addition & 1 deletion tests/modules/local/bbduk/bbduk_hits.nf.test
@@ -28,7 +28,7 @@ nextflow_process {
process {
'''
input[0] = LOAD_SAMPLESHEET.out.samplesheet
-input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
+input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "1"
input[3] = "24"
input[4] = "viral"
2 changes: 1 addition & 1 deletion tests/run.config
@@ -12,7 +12,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
-ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
+ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
sample_sheet = "${projectDir}/test-data/samplesheet.csv" // Path to library TSV
2 changes: 1 addition & 1 deletion tests/run_dev_se.config
@@ -10,7 +10,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
-ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
+ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
sample_sheet = "${projectDir}/test-data/single-end-samplesheet.csv" // Path to library TSV
2 changes: 1 addition & 1 deletion tests/run_validation.config
@@ -7,7 +7,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
-ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
+ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
viral_tsv = "${projectDir}/test-data/gold-standard-results/virus_hits_filtered.tsv.gz"
4 changes: 2 additions & 2 deletions tests/workflows/run.nf.test.snap
@@ -3,7 +3,7 @@
"content": [
"bracken_reports_merged.tsv.gz:md5,6c504fa837ef97ef2096f2569d8c6902",
"kraken_reports_merged.tsv.gz:md5,84f070b42b948d36ae38eaee4a61982e",
-"merged_blast_filtered.tsv.gz:md5,be7002de8c1878da615ba4379b84feab",
+"merged_blast_filtered.tsv.gz:md5,b26a764f7b7271256c0d58a89b5517eb",
"read_counts.tsv.gz:md5,8dc2e3ad82f42202262a5e67a9d91e1b",
"subset_qc_adapter_stats.tsv.gz:md5,43a90fc81f11a57e191f10176d3b7caf",
"subset_qc_basic_stats.tsv.gz:md5,98699e1e92085c89771f0a46fa54df0d",
@@ -16,6 +16,6 @@
"nf-test": "0.9.2",
"nextflow": "24.10.4"
},
-"timestamp": "2025-01-30T14:46:04.796716034"
+"timestamp": "2025-01-31T16:27:43.310277911"
}
}
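
The snapshot entries in `tests/workflows/run.nf.test.snap` pair each output file name with an MD5 digest in the form `name:md5,<hex>`. As a minimal sketch (not part of this commit, and not nf-test's own implementation), the Python below checks a directory of outputs against such entries. The function names are hypothetical, and it assumes the digest is taken over the raw bytes of each file as stored on disk.

```python
import hashlib
from pathlib import Path


def parse_snapshot_entry(entry: str) -> tuple[str, str]:
    """Split a 'name:md5,<hex>' snapshot entry into (name, hex digest)."""
    name, _, digest = entry.partition(":md5,")
    if not digest:
        raise ValueError(f"not an md5 snapshot entry: {entry!r}")
    return name, digest


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 of a file in chunks, without loading it whole."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(entries, results_dir):
    """Return (name, expected, actual) for each entry that does not match.

    `actual` is None when the file is missing from results_dir.
    """
    mismatches = []
    for entry in entries:
        name, expected = parse_snapshot_entry(entry)
        path = Path(results_dir) / name
        actual = md5_of_file(path) if path.is_file() else None
        if actual != expected:
            mismatches.append((name, expected, actual))
    return mismatches
```

Calling `find_mismatches(entries, "test-data/gold-standard-results")` with the `content` list from the snapshot would list any file whose digest differs, which is one way to see at a glance which outputs changed, as `merged_blast_filtered.tsv.gz` did here.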
