Merging changes from master into dev and fixing tests #178

Merged: 14 commits, Feb 3, 2025
11 changes: 10 additions & 1 deletion CHANGELOG.md
@@ -1,4 +1,4 @@
# v2.8.0.0
# v2.8.0.0 (in development)
- Major changes to many parts of the pipeline as part of a general performance overhaul
- Modified most processes in the RUN and RUN_VALIDATION workflows to stream data in and out rather than reading whole files
- As part of the previous change, modified most processes in the RUN and RUN_VALIDATION workflows to work with interleaved rather than paired sequence data
@@ -17,6 +17,15 @@
- Viral hits TSV moved from `virus_hits_db.tsv.gz` to `virus_hits_filtered.tsv.gz`
- Numerous changes to column names in viral hits TSV, mainly to improve clarity

# v2.7.0.2
- Updated `pipeline-version.txt`

# v2.7.0.1
- Fixed index-related issues from v2.7.0.0:
- Updated `EXTRACT_VIRAL_READS` to expect updated path to viral genome DB
- Added `adapters` param to the index config file used to run our tests
- Updated `RUN` and `RUN_VALIDATION` tests to use up-to-date test index (location: `s3://nao-testing/index/20250130`)

# v2.7.0.0
- Implemented masking of viral genome reference in index workflow with MASK_GENOME_FASTA to remove adapter, low-entropy and repeat sequences.
- Removed TRIMMOMATIC and BBMAP from EXTRACT_VIRAL_READS.
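The changelog entries above describe switching most RUN and RUN_VALIDATION processes from reading paired files whole to streaming interleaved data. As a rough illustration only (the pipeline does this inside Nextflow processes such as `INTERLEAVE_FASTQ`, not with this code), here is a minimal Python sketch of streaming interleaving. It assumes plain, unwrapped four-line FASTQ records and that mates appear in the same order in both files; `read_record` and `interleave_fastq` are hypothetical names.

```python
import io
from itertools import islice
from typing import Iterator, List, TextIO


def read_record(handle: TextIO) -> List[str]:
    """Read one four-line FASTQ record; return [] at end of input.

    Assumes simple, unwrapped FASTQ (exactly four lines per record).
    """
    record = list(islice(handle, 4))
    if record and len(record) != 4:
        raise ValueError("truncated FASTQ record")
    return record


def interleave_fastq(fwd: TextIO, rev: TextIO) -> Iterator[str]:
    """Yield an interleaved stream (R1, R2, R1, R2, ...) one record at a time.

    Only one record per mate is held in memory, so neither input is read whole.
    """
    while True:
        r1 = read_record(fwd)
        r2 = read_record(rev)
        if not r1 and not r2:
            return  # both inputs exhausted together
        if not r1 or not r2:
            raise ValueError("paired inputs have different numbers of records")
        yield from r1
        yield from r2


if __name__ == "__main__":
    # Tiny in-memory demo: one read pair.
    fwd = io.StringIO("@read1/1\nACGT\n+\nIIII\n")
    rev = io.StringIO("@read1/2\nTTGA\n+\nIIII\n")
    print("".join(interleave_fastq(fwd, rev)), end="")
```

Because the generator yields lines as it goes, its output can be written straight to a pipe or a compressed stream rather than being built up in memory, which is the property the streaming overhaul is after.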
8 changes: 2 additions & 6 deletions configs/index-for-run-test.config
@@ -27,14 +27,10 @@ params {
// Other reference files
host_taxon_db = "${projectDir}/ref/host-taxa.tsv"
contaminants = "${projectDir}/ref/contaminants.fasta.gz"
adapters = "${projectDir}/ref/adapters.fasta"
genome_patterns_exclude = "${projectDir}/ref/hv_patterns_exclude.txt"

// Kraken viral DB
kraken_db = "https://genome-idx.s3.amazonaws.com/kraken/k2_viral_20240904.tar.gz"
// Smallest possible BLAST DB
blast_db_name = "nt_others"

// Pull information from GenBank or Ref Seq
ncbi_viral_params = "--section refseq --assembly-level complete"

// Other input values
@@ -52,4 +48,4 @@ includeConfig "${projectDir}/configs/containers.config"
includeConfig "${projectDir}/configs/resources.config"
includeConfig "${projectDir}/configs/profiles.config"
includeConfig "${projectDir}/configs/output.config"
process.queue = "harmon-queue" // AWS Batch job queue
process.queue = "will-batch-queue" // AWS Batch job queue
2 changes: 1 addition & 1 deletion pipeline-version.txt
@@ -1 +1 @@
2.5.2
2.7.0.2
4 changes: 2 additions & 2 deletions subworkflows/local/extractViralReads/main.nf
@@ -42,8 +42,8 @@ workflow EXTRACT_VIRAL_READS {
bbduk_suffix
bracken_threshold
main:
// 0. Get reference paths
viral_genome_path = "${ref_dir}/results/virus-genomes-filtered.fasta.gz"
// Get reference paths
viral_genome_path = "${ref_dir}/results/virus-genomes-masked.fasta.gz"
genome_meta_path = "${ref_dir}/results/virus-genome-metadata-gid.tsv.gz"
bt2_virus_index_path = "${ref_dir}/results/bt2-virus-index"
bt2_human_index_path = "${ref_dir}/results/bt2-human-index"
Binary file modified test-data/gold-standard-results/bracken_reports_merged.tsv.gz
Binary file modified test-data/gold-standard-results/kraken_reports_merged.tsv.gz
Binary file modified test-data/gold-standard-results/merged_blast_filtered.tsv.gz
Binary file modified test-data/gold-standard-results/read_counts.tsv.gz
Binary file modified test-data/gold-standard-results/subset_qc_adapter_stats.tsv.gz
Binary file modified test-data/gold-standard-results/subset_qc_basic_stats.tsv.gz
Binary file modified test-data/gold-standard-results/subset_qc_length_stats.tsv.gz
Binary file modified test-data/gold-standard-results/virus_hits_filtered.tsv.gz
4 changes: 2 additions & 2 deletions tests/modules/local/bbduk/bbduk.nf.test
@@ -35,7 +35,7 @@ nextflow_process {
process {
'''
input[0] = INTERLEAVE_FASTQ.out.output
input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "0.4"
input[3] = "27"
input[4] = "ribo"
@@ -89,7 +89,7 @@ nextflow_process {
process {
'''
input[0] = LOAD_SAMPLESHEET.out.samplesheet
input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "0.4"
input[3] = "27"
input[4] = "ribo"
2 changes: 1 addition & 1 deletion tests/modules/local/bbduk/bbduk_hits.nf.test
@@ -28,7 +28,7 @@ nextflow_process {
process {
'''
input[0] = LOAD_SAMPLESHEET.out.samplesheet
input[1] = "${params.ref_dir}/results/virus-genomes-filtered.fasta.gz"
input[1] = "${params.ref_dir}/results/virus-genomes-masked.fasta.gz"
input[2] = "1"
input[3] = "24"
input[4] = "viral"
2 changes: 1 addition & 1 deletion tests/run.config
@@ -12,7 +12,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
sample_sheet = "${projectDir}/test-data/samplesheet.csv" // Path to library TSV
2 changes: 1 addition & 1 deletion tests/run_dev_se.config
@@ -10,7 +10,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
sample_sheet = "${projectDir}/test-data/single-end-samplesheet.csv" // Path to library TSV
2 changes: 1 addition & 1 deletion tests/run_validation.config
@@ -7,7 +7,7 @@ params {

// Directories
base_dir = "./" // Parent for working and output directories (can be S3)
ref_dir = "s3://nao-testing/index-test/output" // Reference/index directory (generated by index workflow)
ref_dir = "s3://nao-testing/index/20250130/output/" // Reference/index directory (generated by index workflow)

// Files
viral_tsv = "${projectDir}/test-data/gold-standard-results/virus_hits_filtered.tsv.gz"
4 changes: 2 additions & 2 deletions tests/workflows/run.nf.test.snap
@@ -3,7 +3,7 @@
"content": [
"bracken_reports_merged.tsv.gz:md5,6c504fa837ef97ef2096f2569d8c6902",
"kraken_reports_merged.tsv.gz:md5,84f070b42b948d36ae38eaee4a61982e",
"merged_blast_filtered.tsv.gz:md5,be7002de8c1878da615ba4379b84feab",
"merged_blast_filtered.tsv.gz:md5,b26a764f7b7271256c0d58a89b5517eb",
"read_counts.tsv.gz:md5,8dc2e3ad82f42202262a5e67a9d91e1b",
"subset_qc_adapter_stats.tsv.gz:md5,43a90fc81f11a57e191f10176d3b7caf",
"subset_qc_basic_stats.tsv.gz:md5,98699e1e92085c89771f0a46fa54df0d",
@@ -16,6 +16,6 @@
"nf-test": "0.9.2",
"nextflow": "24.10.4"
},
"timestamp": "2025-01-30T14:46:04.796716034"
"timestamp": "2025-01-31T16:27:43.310277911"
}
}
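The snapshot stores each output as `name:md5,<digest>`, and this PR updates one digest (`merged_blast_filtered.tsv.gz`) alongside the regenerated gold-standard files. As a hypothetical local sanity check (not part of nf-test or this repository), the Python sketch below parses such entries and recomputes digests for files in a results directory. Whether nf-test hashes the compressed bytes or the decompressed content of `.gz` files is an assumption left open here, so the `decompress_gz` flag lets you try both; the function names and directory layout are illustrative.

```python
import gzip
import hashlib
from pathlib import Path
from typing import Dict, Iterable, Tuple


def parse_snapshot_entry(entry: str) -> Tuple[str, str, str]:
    """Split 'name:algo,digest' (e.g. 'x.tsv.gz:md5,abc...') into its parts."""
    name, _, checksum = entry.rpartition(":")
    algo, _, digest = checksum.partition(",")
    if not name or not digest:
        raise ValueError(f"unexpected snapshot entry: {entry!r}")
    return name, algo, digest


def file_md5(path: Path, decompress_gz: bool = False) -> str:
    """MD5 of a file's raw bytes, or of its decompressed content if requested.

    Which mode matches nf-test for .gz outputs is an assumption to verify.
    """
    opener = gzip.open if decompress_gz and path.suffix == ".gz" else open
    digest = hashlib.md5()
    with opener(path, "rb") as handle:
        # Hash in 1 MiB chunks so large outputs are never read whole.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_entries(
    entries: Iterable[str], results_dir: Path, decompress_gz: bool = False
) -> Dict[str, bool]:
    """Map each file name to whether its recomputed MD5 matches the snapshot."""
    results: Dict[str, bool] = {}
    for entry in entries:
        name, algo, expected = parse_snapshot_entry(entry)
        if algo != "md5":
            raise ValueError(f"unsupported checksum type: {algo}")
        results[name] = file_md5(results_dir / name, decompress_gz) == expected
    return results
```

A mismatch from this check is a prompt to look more closely (for example, at which hashing mode applies), not proof that a gold-standard file is wrong; `nf-test` itself remains the authority on whether the snapshot passes.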