From 480b16a245279b17556042a34b946881228147e4 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 17 Jan 2025 19:23:45 +0000 Subject: [PATCH 01/27] initialize README.md update with detailed workflow inputs and outputs for MycoSNP-WDL; Terra task delineation needs to be updated --- README.md | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 72 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index bd2c0d4..fd47142 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,73 @@ # mycosnp-wdl -A WDL wrapper of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) Terra.bio + +## Quick Facts + +| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | +|---|---|---|---|---| +| MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | + +## MycoSNP-WDL +WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) for Terra.bio for *Candiozyma auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. + +### wf_mycosnp_tree.wdl + +#### Inputs + +
+ +| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | +|---|---|---|---|---|---| +| mycosnptree_nf | **vcf** | Array[File] | VCF files for analysis | | Required | +| mycosnptree_nf | **vcf_index** | Array[File] | Index files for the VCF files | | Required | + +
+ +#### Outputs + +
+ +| **Variable** | **Type** | **Description** | +|---|---|---| +| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | +| mycosnp_tree_analysis_date | String | Date of the analysis | +| mycosnp_version | String | Version of MycoSNP used | +| mycosnp_docker | String | Docker image used for MycoSNP | +| analysis_date | String | Date of the analysis | +| reference_strain | String | Reference strain used | +| reference_accession | String | Accession number of the reference strain | +| mycosnp_rapidnj_tree | File | RapidNJ tree file | +| mycosnp_fastree_tree | File | FastTree tree file | +| mycosnp_iqtree_tree | File | IQ-TREE tree file | +| mycosnp_alignment | File | Alignment file | +| mycosnptree_snpdists | File | SNP distances file | +| mycosnp_tree_full_results | File | Full results file | +| mycosnp_tree_vcf_csv | File | VCF to CSV file | + +
+ +### wf_mycosnp_variants.wdl + +#### Inputs + +
+ +| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | +|---|---|---|---|---|---| +| mycosnpvariants_nf | **reads** | Array[File] | Input reads for variant calling | | Required | +| mycosnpvariants_nf | **reference** | File | Reference genome file | | Required | + +
+ +#### Outputs + +
+ +| **Variable** | **Type** | **Description** | +|---|---|---| +| mycosnp_variants_vcf | File | VCF file with called variants | +| mycosnp_variants_vcf_index | File | Index file for the VCF | +| mycosnp_variants_bam | File | BAM file with aligned reads | +| mycosnp_variants_bam_index | File | Index file for the BAM | +| mycosnp_variants_stats | File | Statistics file for the variant calling | + +
\ No newline at end of file From 6a8809c4b4b08aea33ed4a290602952735be24dc Mon Sep 17 00:00:00 2001 From: xonq Date: Thu, 23 Jan 2025 17:48:59 +0000 Subject: [PATCH 02/27] Update README.md to reflect changes in WDL workflows and inputs for MycoSNP --- README.md | 57 ++++++++++++++++++++++++++++++------------------------- 1 file changed, 31 insertions(+), 26 deletions(-) diff --git a/README.md b/README.md index fd47142..ca31d6d 100644 --- a/README.md +++ b/README.md @@ -7,9 +7,9 @@ | MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | ## MycoSNP-WDL -WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) for Terra.bio for *Candiozyma auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. +WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for Terra.bio for *Candiozyma (Candida) auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. -### wf_mycosnp_tree.wdl +### wf_mycosnp_variants.wdl #### Inputs @@ -17,8 +17,11 @@ WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) fo | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| -| mycosnptree_nf | **vcf** | Array[File] | VCF files for analysis | | Required | -| mycosnptree_nf | **vcf_index** | Array[File] | Index files for the VCF files | | Required | +| wf_mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | +| wf_mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | +| wf_mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | +| wf_mycosnp_variants | **strain** | String | Name of reference strain | "B11205" | Optional | +| wf_mycosnp_variants | **accession** | String | Accession number of reference strain 
| "GCA_016772135" | Optional | @@ -28,24 +31,15 @@ WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) fo | **Variable** | **Type** | **Description** | |---|---|---| -| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | -| mycosnp_tree_analysis_date | String | Date of the analysis | -| mycosnp_version | String | Version of MycoSNP used | -| mycosnp_docker | String | Docker image used for MycoSNP | -| analysis_date | String | Date of the analysis | -| reference_strain | String | Reference strain used | -| reference_accession | String | Accession number of the reference strain | -| mycosnp_rapidnj_tree | File | RapidNJ tree file | -| mycosnp_fastree_tree | File | FastTree tree file | -| mycosnp_iqtree_tree | File | IQ-TREE tree file | -| mycosnp_alignment | File | Alignment file | -| mycosnptree_snpdists | File | SNP distances file | -| mycosnp_tree_full_results | File | Full results file | -| mycosnp_tree_vcf_csv | File | VCF to CSV file | +| mycosnp_variants_vcf | File | VCF file with called variants | +| mycosnp_variants_vcf_index | File | Index file for the VCF | +| mycosnp_variants_bam | File | BAM file with aligned reads | +| mycosnp_variants_bam_index | File | Index file for the BAM | +| mycosnp_variants_stats | File | Statistics file for the variant calling | -### wf_mycosnp_variants.wdl +### wf_mycosnp_tree.wdl #### Inputs @@ -53,8 +47,10 @@ WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) fo | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| -| mycosnpvariants_nf | **reads** | Array[File] | Input reads for variant calling | | Required | -| mycosnpvariants_nf | **reference** | File | Reference genome file | | Required | +| wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | +| wf_mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | +| 
wf_mycosnp_tree | **strain** | String | Name of reference strain | "B11205" | Optional | +| wf_mycosnp_tree | **accession** | String | Accession number of reference strain | "GCA_016772135" | Optional | @@ -64,10 +60,19 @@ WDL wrappers of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) fo | **Variable** | **Type** | **Description** | |---|---|---| -| mycosnp_variants_vcf | File | VCF file with called variants | -| mycosnp_variants_vcf_index | File | Index file for the VCF | -| mycosnp_variants_bam | File | BAM file with aligned reads | -| mycosnp_variants_bam_index | File | Index file for the BAM | -| mycosnp_variants_stats | File | Statistics file for the variant calling | +| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | +| mycosnp_tree_analysis_date | String | Date of the analysis | +| mycosnp_version | String | Version of [MycoSNP-nf](https://github.com/CDCgov/mycosnp-nf/tree/master) used | +| mycosnp_docker | String | Docker image used for MycoSNP | +| analysis_date | String | Date of the analysis | +| reference_strain | String | Reference strain used | +| reference_accession | String | Accession number of the reference strain | +| mycosnp_rapidnj_tree | File | RapidNJ tree file | +| mycosnp_fastree_tree | File | FastTree tree file | +| mycosnp_iqtree_tree | File | IQ-TREE tree file | +| mycosnp_alignment | File | Alignment file | +| mycosnptree_snpdists | File | SNP distances file | +| mycosnp_tree_full_results | File | Full results file | +| mycosnp_tree_vcf_csv | File | VCF to CSV file | \ No newline at end of file From a29cea5d054100b8ca09d160b088b932c4e74113 Mon Sep 17 00:00:00 2001 From: xonq Date: Thu, 23 Jan 2025 17:51:55 +0000 Subject: [PATCH 03/27] Update README.md title to MycoSNP-WDL Workflow Series --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ca31d6d..6839d0a 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# mycosnp-wdl +# MycoSNP-WDL 
Workflow Series ## Quick Facts From 5bc57e9894a1619b548e5502ded90cd5be7c2b3d Mon Sep 17 00:00:00 2001 From: xonq Date: Thu, 23 Jan 2025 18:06:34 +0000 Subject: [PATCH 04/27] remove explicit Terra mention --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 6839d0a..9568966 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ | MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | ## MycoSNP-WDL -WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for Terra.bio for *Candiozyma (Candida) auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. +WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *Candiozyma (Candida) auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. ### wf_mycosnp_variants.wdl From c40b02c4f0cc33fb833c3b9f17e9c0a837d65874 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 16:43:09 +0000 Subject: [PATCH 05/27] change out of searchable table --- README.md | 18 +----------------- 1 file changed, 1 insertion(+), 17 deletions(-) diff --git a/README.md b/README.md index 9568966..3adfd0a 100644 --- a/README.md +++ b/README.md @@ -13,8 +13,6 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C #### Inputs -
- | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| | wf_mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | @@ -23,12 +21,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | wf_mycosnp_variants | **strain** | String | Name of reference strain | "B11205" | Optional | | wf_mycosnp_variants | **accession** | String | Accession number of reference strain | "GCA_016772135" | Optional | -
- #### Outputs -
- | **Variable** | **Type** | **Description** | |---|---|---| | mycosnp_variants_vcf | File | VCF file with called variants | @@ -37,14 +31,10 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | mycosnp_variants_bam_index | File | Index file for the BAM | | mycosnp_variants_stats | File | Statistics file for the variant calling | -
- ### wf_mycosnp_tree.wdl #### Inputs -
- | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| | wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | @@ -52,12 +42,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | wf_mycosnp_tree | **strain** | String | Name of reference strain | "B11205" | Optional | | wf_mycosnp_tree | **accession** | String | Accession number of reference strain | "GCA_016772135" | Optional | -
- #### Outputs -
- | **Variable** | **Type** | **Description** | |---|---|---| | mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | @@ -73,6 +59,4 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | mycosnp_alignment | File | Alignment file | | mycosnptree_snpdists | File | SNP distances file | | mycosnp_tree_full_results | File | Full results file | -| mycosnp_tree_vcf_csv | File | VCF to CSV file | - -
\ No newline at end of file +| mycosnp_tree_vcf_csv | File | VCF to CSV file | \ No newline at end of file From 5cf3b46cbf09c20daf25ed26f1f8f792b7cf8343 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 17:28:34 +0000 Subject: [PATCH 06/27] update table I/O to correspond with PR 7 --- README.md | 42 ++++++++++++++++++++++++++++++++++++------ 1 file changed, 36 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 3adfd0a..57342bf 100644 --- a/README.md +++ b/README.md @@ -18,8 +18,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | wf_mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | | wf_mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | | wf_mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | -| wf_mycosnp_variants | **strain** | String | Name of reference strain | "B11205" | Optional | -| wf_mycosnp_variants | **accession** | String | Accession number of reference strain | "GCA_016772135" | Optional | +| wf_mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | +| wf_mycosnp_variants | **fasta** | File | Reference FASTA file | | Optional | #### Outputs @@ -30,6 +30,38 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | mycosnp_variants_bam | File | BAM file with aligned reads | | mycosnp_variants_bam_index | File | Index file for the BAM | | mycosnp_variants_stats | File | Statistics file for the variant calling | +| mycosnp_variants_version | String | Version of the MycoSNP variants | +| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis | +| mycosnp_version | String | Version of MycoSNP | +| mycosnp_docker | String | Docker image used for MycoSNP | +| analysis_date | String | Date of the analysis | +| reference_strain | String | Reference strain 
used | +| reference_name | String | Name of the reference | +| reads_before_trimming | Int | Number of reads before trimming | +| gc_before_trimming | Float | GC content before trimming | +| average_q_score_before_trimming | Float | Average quality score before trimming | +| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming | +| reads_after_trimming | Int | Number of reads after trimming | +| reads_after_trimming_percent | String | Percentage of reads after trimming | +| paired_reads_after_trimming | Int | Number of paired reads after trimming | +| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming | +| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming | +| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming | +| gc_after_trimming | Float | GC content after trimming | +| average_q_score_after_trimming | Float | Average quality score after trimming | +| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming | +| mean_coverage_depth | Float | Mean coverage depth | +| reads_mapped | Int | Number of reads mapped | +| number_n | Int | Number of N bases | +| percent_reference_coverage | Float | Percentage of reference coverage | +| assembly_size | Int | Size of the assembly | +| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant | +| vcf | File | VCF file | +| vcf_index | File | Index file for the VCF | +| multiqc | File | MultiQC report | +| myco_bam | File | BAM file | +| myco_bam_bai | File | BAM index file | +| full_results | File | Full results file | ### wf_mycosnp_tree.wdl @@ -39,8 +71,7 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C |---|---|---|---|---|---| | wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | | wf_mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files 
| | Required | -| wf_mycosnp_tree | **strain** | String | Name of reference strain | "B11205" | Optional | -| wf_mycosnp_tree | **accession** | String | Accession number of reference strain | "GCA_016772135" | Optional | +| wf_mycosnp_tree | **fasta** | File | Reference FASTA input | Optional | #### Outputs @@ -50,9 +81,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | mycosnp_tree_analysis_date | String | Date of the analysis | | mycosnp_version | String | Version of [MycoSNP-nf](https://github.com/CDCgov/mycosnp-nf/tree/master) used | | mycosnp_docker | String | Docker image used for MycoSNP | -| analysis_date | String | Date of the analysis | | reference_strain | String | Reference strain used | -| reference_accession | String | Accession number of the reference strain | +| reference_name | String | Accession number of the reference strain | | mycosnp_rapidnj_tree | File | RapidNJ tree file | | mycosnp_fastree_tree | File | FastTree tree file | | mycosnp_iqtree_tree | File | IQ-TREE tree file | From 035a6fd34b4327320ba49375ad3cd250dab144f6 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 17:29:54 +0000 Subject: [PATCH 07/27] formatting --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 57342bf..ba8ad94 100644 --- a/README.md +++ b/README.md @@ -71,7 +71,7 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C |---|---|---|---|---|---| | wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | | wf_mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | -| wf_mycosnp_tree | **fasta** | File | Reference FASTA input | Optional | +| wf_mycosnp_tree | **fasta** | File | Reference FASTA input | | Optional | #### Outputs From f43ae3fea68a15faae6c9af8f740b7e90e454578 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 17:33:25 +0000 Subject: [PATCH 08/27] add internal links 
--- README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index ba8ad94..307d998 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ | MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | ## MycoSNP-WDL -WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *Candiozyma (Candida) auris* variant calling and single nucleotide polymorphism (SNP) phylogenetic tree reconstruction. +WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *Candiozyma (Candida) auris* [variant calling](#wf_mycosnp_variants.wdl) and single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). ### wf_mycosnp_variants.wdl @@ -63,6 +63,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | myco_bam_bai | File | BAM index file | | full_results | File | Full results file | + + ### wf_mycosnp_tree.wdl #### Inputs From 4b3298a9cae091d06c3337f60a3ce12ba31a681c Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 17:47:47 +0000 Subject: [PATCH 09/27] include blurbs about workflows --- README.md | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 307d998..fb0b87d 100644 --- a/README.md +++ b/README.md @@ -6,10 +6,15 @@ |---|---|---|---|---| | MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | + ## MycoSNP-WDL -WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *Candiozyma (Candida) auris* [variant calling](#wf_mycosnp_variants.wdl) and single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). +WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf). 
These workflows conduct *Candiozyma (Candida) auris* [variant calling](#wf_mycosnp_variants.wdl) and single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). + +
### wf_mycosnp_variants.wdl +This workflow calls variants for an inputted `.tar`/`.fasta` referencing the *C. auris* B11205 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/). + #### Inputs @@ -63,9 +68,10 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) for *C | myco_bam_bai | File | BAM index file | | full_results | File | Full results file | - +
### wf_mycosnp_tree.wdl +This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. #### Inputs From 16ac56096b1ec9ee032ffd818bfbb3f4977f7180 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 17:49:54 +0000 Subject: [PATCH 10/27] expand inputs and explicitly delineate that variant calling is an initial dependency --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index fb0b87d..1e7e1d0 100644 --- a/README.md +++ b/README.md @@ -8,7 +8,7 @@ ## MycoSNP-WDL -WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf). These workflows conduct *Candiozyma (Candida) auris* [variant calling](#wf_mycosnp_variants.wdl) and single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). +WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf). These workflows conduct *Candidozyma (Candida) auris* [variant calling](#wf_mycosnp_variantswdl) and subsequent single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl).
@@ -71,7 +71,7 @@ This workflow calls variants for an inputted `.tar`/`.fasta` referencing the *C.
### wf_mycosnp_tree.wdl -This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. +This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. #### Inputs From 0f146f928a6a0f64811b2860ac7aa1b952022c52 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 20:49:01 +0000 Subject: [PATCH 11/27] include reference clades --- README.md | 26 +++++++++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 1e7e1d0..c87347f 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,31 @@ WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com
### wf_mycosnp_variants.wdl -This workflow calls variants for an inputted `.tar`/`.fasta` referencing the *C. auris* B11205 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/). +This workflow calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade depicted in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference `.fasta` that will be indexed in BWA and generate all necessary files for running the workflow, or provide a `.tar.gz` with the same directory structure as the provided reference clades: + +``` +data/reference +├── B11221 +├── Clade1 +│ ├── bwa/bwa +| | ├── reference.amb +| | ├── reference.ann +| | ├── reference.bwt +| | ├── reference.pac +| | └── reference.sa +│ ├── dict +| | └── reference.dict +│ ├── fai +| | └── reference.fa.fai +│ ├── masked +| | └── reference.fa +│ └── Clade1.fasta +├── Clade2 +├── Clade3 +├── Clade4 +├── Clade5 +└── GCA_016772135 +``` #### Inputs From 2f073a39c546344ea6d7f3bbaa423b620a3dcbc7 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 20:53:07 +0000 Subject: [PATCH 12/27] delineate directory structure appropriately --- README.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index c87347f..ee06fe5 100644 --- a/README.md +++ b/README.md @@ -13,18 +13,19 @@ WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com
### wf_mycosnp_variants.wdl -This workflow calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade depicted in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference `.fasta` that will be indexed in BWA and generate all necessary files for running the workflow, or provide a `.tar.gz` with the same directory structure as the provided reference clades: +This workflow calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade depicted in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FastA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: ``` data/reference ├── B11221 ├── Clade1 -│ ├── bwa/bwa -| | ├── reference.amb -| | ├── reference.ann -| | ├── reference.bwt -| | ├── reference.pac -| | └── reference.sa +│ ├── bwa +| | ├── bwa +| | | ├── reference.amb +| | | ├── reference.ann +| | | ├── reference.bwt +| | | ├── reference.pac +| | | └── reference.sa │ ├── dict | | └── reference.dict │ ├── fai From 827bc809a4561051826d0a81c85c24332cf9d75f Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 20:57:27 +0000 Subject: [PATCH 13/27] add back the searchable table --- README.md | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index ee06fe5..79b6ac3 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com
### wf_mycosnp_variants.wdl -This workflow calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade depicted in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FastA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: +This is a sample-level workflow that calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FASTA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: ``` data/reference @@ -43,6 +43,8 @@ data/reference #### Inputs +
+ | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| | wf_mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | @@ -51,8 +53,12 @@ data/reference | wf_mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | | wf_mycosnp_variants | **fasta** | File | Reference FASTA file | | Optional | +
+ #### Outputs +
+ | **Variable** | **Type** | **Description** | |---|---|---| | mycosnp_variants_vcf | File | VCF file with called variants | @@ -93,6 +99,8 @@ data/reference | myco_bam_bai | File | BAM index file | | full_results | File | Full results file | +
+
### wf_mycosnp_tree.wdl @@ -100,14 +108,20 @@ This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates re #### Inputs +
+ | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| | wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | | wf_mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | | wf_mycosnp_tree | **fasta** | File | Reference FASTA input | | Optional | +
+ #### Outputs +
+ | **Variable** | **Type** | **Description** | |---|---|---| | mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | @@ -122,4 +136,6 @@ This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates re | mycosnp_alignment | File | Alignment file | | mycosnptree_snpdists | File | SNP distances file | | mycosnp_tree_full_results | File | Full results file | -| mycosnp_tree_vcf_csv | File | VCF to CSV file | \ No newline at end of file +| mycosnp_tree_vcf_csv | File | VCF to CSV file | + +
\ No newline at end of file From c2f2a4b1b6427d127afb8742b24f08a33e3e8b11 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 21:32:17 +0000 Subject: [PATCH 14/27] update mycosnp_tree tables to correspond with terra --- README.md | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-) diff --git a/README.md b/README.md index 79b6ac3..e11edc9 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,7 @@ WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com
### wf_mycosnp_variants.wdl -This is a sample-level workflow that calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FASTA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: +`mycosnp_variants` is a Terra sample-level workflow that calls variants for input reads against the *C. auris* B11205 reference assembly [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FASTA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tar archive (`.tar.gz`) with the same directory structure as the provided reference clades: ``` data/reference @@ -104,7 +104,7 @@ data/reference
### wf_mycosnp_tree.wdl -This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. +`mycosnp_tree` is a Terra set-level workflow that reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. #### Inputs @@ -112,9 +112,16 @@ This workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates re | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| -| wf_mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | -| wf_mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | -| wf_mycosnp_tree | **fasta** | File | Reference FASTA input | | Optional | +| mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | +| mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | +| mycosnp_tree | **ref_fasta** | File | Reference FASTA input | | Optional | +| mycosnptree | **cpu** | Int | CPU cores | 4 | Optional | +| mycosnptree | **disk_size** | Int | Disk size (GB) | 50 | Optional | +| mycosnptree | **docker** | String | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | | Optional | +| mycosnptree | **memory** | Int | RAM (GB) | 32 | Optional | +| mycosnptree | **reference** | String | Preexisting [reference directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) | "GCA_016772135" | Optional | +| mycosnptree | **strain** | String | mycosnp-nf reference strain name | "B11205" | Optional | +| version_capture | **docker** | String | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | | Optional | @@ -124,18 +131,18 @@ This workflow
reconstructs an IQ-TREE SNP phylogenetic tree that incorporates re | **Variable** | **Type** | **Description** | |---|---|---| -| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | -| mycosnp_tree_analysis_date | String | Date of the analysis | -| mycosnp_version | String | Version of [MycoSNP-nf](https://github.com/CDCgov/mycosnp-nf/tree/master) used | +| mycosnp_alignment | File | Alignment file | | mycosnp_docker | String | Docker image used for MycoSNP | -| reference_strain | String | Reference strain used | -| reference_name | String | Accession number of the reference strain | -| mycosnp_rapidnj_tree | File | RapidNJ tree file | | mycosnp_fastree_tree | File | FastTree tree file | | mycosnp_iqtree_tree | File | IQ-TREE tree file | -| mycosnp_alignment | File | Alignment file | -| mycosnptree_snpdists | File | SNP distances file | +| mycosnp_rapidnj_tree | File | RapidNJ tree file | +| mycosnp_tree_analysis_date | String | Date of the analysis | | mycosnp_tree_full_results | File | Full results file | | mycosnp_tree_vcf_csv | File | VCF to CSV file | +| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | +| mycosnp_version | String | Version of MycoSNP | +| mycosnptree_snpdists | File | SNP distances file | +| reference_name | String | Name of the reference | +| reference_strain | String | Reference strain used | \ No newline at end of file From 63f88f0516177a95cc49af93ed54ab51a4e7aef0 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 21:43:55 +0000 Subject: [PATCH 15/27] update mycosnp_variants tables to correspond to Terra i/o --- README.md | 78 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 44 insertions(+), 34 deletions(-) diff --git a/README.md b/README.md index e11edc9..87a4904 100644 --- a/README.md +++ b/README.md @@ -47,11 +47,22 @@ data/reference | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | |---|---|---|---|---|---| -| 
wf_mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | -| wf_mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | -| wf_mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | -| wf_mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | -| wf_mycosnp_variants | **fasta** | File | Reference FASTA file | | Optional | +| mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | +| mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | +| mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | +| mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | +| mycosnp_variants | **fasta** | File | Reference FASTA file | | Optional | +| mycosnp | **coverage** | Int | {…} | | Optional | +| mycosnp | **cpu** | Int | {…} | | Optional | +| mycosnp | **debug** | Boolean | {…} | | Optional | +| mycosnp | **disk_size** | Int | {…} | | Optional | +| mycosnp | **docker** | String | {…} | | Optional | +| mycosnp | **memory** | Int | {…} | | Optional | +| mycosnp | **min_depth** | Int | {…} | | Optional | +| mycosnp | **reference** | String | {…} | | Optional | +| mycosnp | **sample_ploidy** | Int | {…} | | Optional | +| mycosnp | **strain** | String | {…} | | Optional | +| version_capture | **timezone** | String | {…} | | Optional | @@ -61,43 +72,38 @@ data/reference | **Variable** | **Type** | **Description** | |---|---|---| -| mycosnp_variants_vcf | File | VCF file with called variants | -| mycosnp_variants_vcf_index | File | Index file for the VCF | -| mycosnp_variants_bam | File | BAM file with aligned reads | -| mycosnp_variants_bam_index | File | Index file for the BAM | -| mycosnp_variants_stats | File | Statistics file 
for the variant calling | -| mycosnp_variants_version | String | Version of the MycoSNP variants | -| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis | -| mycosnp_version | String | Version of MycoSNP | -| mycosnp_docker | String | Docker image used for MycoSNP | | analysis_date | String | Date of the analysis | -| reference_strain | String | Reference strain used | -| reference_name | String | Name of the reference | -| reads_before_trimming | Int | Number of reads before trimming | -| gc_before_trimming | Float | GC content before trimming | +| assembly_size | Int | Size of the assembly | +| average_q_score_after_trimming | Float | Average quality score after trimming | | average_q_score_before_trimming | Float | Average quality score before trimming | -| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming | -| reads_after_trimming | Int | Number of reads after trimming | -| reads_after_trimming_percent | String | Percentage of reads after trimming | -| paired_reads_after_trimming | Int | Number of paired reads after trimming | -| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming | -| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming | -| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming | +| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant | +| full_results | File | Full results file | | gc_after_trimming | Float | GC content after trimming | -| average_q_score_after_trimming | Float | Average quality score after trimming | -| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming | +| gc_before_trimming | Float | GC content before trimming | | mean_coverage_depth | Float | Mean coverage depth | -| reads_mapped | Int | Number of reads mapped | +| multiqc | File | MultiQC report | +| myco_bam | File | BAM file | +| 
myco_bam_bai | File | BAM index file | +| mycosnp_docker | String | Docker image used for MycoSNP | +| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis | +| mycosnp_variants_version | String | Version of the MycoSNP variants | +| mycosnp_version | String | Version of MycoSNP | | number_n | Int | Number of N bases | +| paired_reads_after_trimming | Int | Number of paired reads after trimming | +| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming | | percent_reference_coverage | Float | Percentage of reference coverage | -| assembly_size | Int | Size of the assembly | -| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant | +| reads_after_trimming | Int | Number of reads after trimming | +| reads_after_trimming_percent | String | Percentage of reads after trimming | +| reads_before_trimming | Int | Number of reads before trimming | +| reads_mapped | Int | Number of reads mapped | +| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming | +| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming | +| reference_name | String | Name of the reference | +| reference_strain | String | Reference strain used | +| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming | +| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming | | vcf | File | VCF file | | vcf_index | File | Index file for the VCF | -| multiqc | File | MultiQC report | -| myco_bam | File | BAM file | -| myco_bam_bai | File | BAM index file | -| full_results | File | Full results file | @@ -108,6 +114,10 @@ data/reference #### Inputs +- **ref_fasta** will generate a new reference directory + +- **strain** is passed to output but does not change workflow function +
| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | From 3ea87902fe27a686ac949e4793fca49354625dcb Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 21:45:08 +0000 Subject: [PATCH 16/27] change release to v1.5 --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 87a4904..e057849 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,7 @@ | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | |---|---|---|---|---| -| MycoSNP-WDL | Fungi | mycosnp-wdl v1.5 | Yes | Sample-level, Set-level | +| MycoSNP-WDL | Fungi | v1.5 | Yes | Sample-level, Set-level | ## MycoSNP-WDL From 9daffec383e9c93c51cecdf3a2650a1e1c31b853 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 21:46:10 +0000 Subject: [PATCH 17/27] update function --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index e057849..7d6a1bd 100644 --- a/README.md +++ b/README.md @@ -4,7 +4,8 @@ | **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | |---|---|---|---|---| -| MycoSNP-WDL | Fungi | v1.5 | Yes | Sample-level, Set-level | +| mycosnp_variants | Fungi | v1.5 | Yes | Sample-level | +| mycosnp_tree | Fungi | v1.5 | Yes | Set-level | ## MycoSNP-WDL From 54b17db24f6e0c9fd2a3869cd0da712c912d370d Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:03:34 +0000 Subject: [PATCH 18/27] update input notes --- README.md | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 7d6a1bd..e3ff84c 100644 --- a/README.md +++ b/README.md @@ -9,12 +9,18 @@ ## MycoSNP-WDL -WDL wrappers of and Terra.bio support for [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf). 
These workflows conduct *Candiozyma (Candida) auris* [variant calling](#wf_mycosnp_variants.wdl) and subsequent single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). +WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) designed for [Terra.bio](https://terra.bio) integration. These workflows conduct *Candidozyma (Candida) auris* [variant calling](#wf_mycosnp_variantswdl) and subsequent single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl).
### wf_mycosnp_variants.wdl -`mycosnp_variants` is a Terra sample-level workflow that calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), supply a reference FASTA (must use suffix `.fa`) that will be indexed via BWA, or provide a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: +`mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). + +#### Inputs + +- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) +- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory +- **ref_tar** optionally takes a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: ``` data/reference @@ -41,8 +47,8 @@ data/reference └── GCA_016772135 ``` +- **strain** optionally delineates the strain name for VCF gene name annotation. MycoSNP currently only annotates with respect to the default strain, "B11205", so changing this option will simply bypass VCF annotation. -#### Inputs
@@ -51,8 +57,6 @@ data/reference | mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | | mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | | mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | -| mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | -| mycosnp_variants | **fasta** | File | Reference FASTA file | | Optional | | mycosnp | **coverage** | Int | {…} | | Optional | | mycosnp | **cpu** | Int | {…} | | Optional | | mycosnp | **debug** | Boolean | {…} | | Optional | @@ -63,6 +67,8 @@ data/reference | mycosnp | **reference** | String | {…} | | Optional | | mycosnp | **sample_ploidy** | Int | {…} | | Optional | | mycosnp | **strain** | String | {…} | | Optional | +| mycosnp_variants | **ref_fasta** | File | Reference FASTA file | | Optional | +| mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | | version_capture | **timezone** | String | {…} | | Optional |
@@ -111,12 +117,12 @@ data/reference
### wf_mycosnp_tree.wdl -`mycosnp_tree` is a Terra set-level workflow reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. +`mycosnp_tree` reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. #### Inputs -- **ref_fasta** will generate a new reference directory - +- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) +- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory - **strain** is passed to output but does not change workflow function
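The `vcf`, `vcf_index`, and `ref_fasta` inputs described for `mycosnp_tree` could be assembled into an inputs JSON for a command-line run. The following is a minimal sketch, not part of the repository: the helper name `build_tree_inputs`, the example file names, and the `mycosnp_tree.<input>` key format (the usual Cromwell `<workflow>.<input>` convention) are assumptions. Only the `.fa` suffix requirement comes from the notes above.

```python
import json


def build_tree_inputs(vcfs, vcf_indexes, ref_fasta=None):
    """Assemble a Cromwell-style inputs dict for mycosnp_tree (hypothetical helper)."""
    # Each VCF is expected to travel with its own index file.
    if len(vcfs) != len(vcf_indexes):
        raise ValueError("vcf and vcf_index must have the same number of entries")

    inputs = {
        "mycosnp_tree.vcf": list(vcfs),
        "mycosnp_tree.vcf_index": list(vcf_indexes),
    }

    if ref_fasta is not None:
        # The README notes that a custom reference FASTA requires the `.fa` suffix.
        if not ref_fasta.endswith(".fa"):
            raise ValueError("ref_fasta must use the .fa suffix")
        inputs["mycosnp_tree.ref_fasta"] = ref_fasta

    return inputs


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    example = build_tree_inputs(
        ["sample1.vcf.gz", "sample2.vcf.gz"],
        ["sample1.vcf.gz.tbi", "sample2.vcf.gz.tbi"],
    )
    print(json.dumps(example, indent=2))
```

Validating the pairing and the suffix before submission catches the most likely input mistakes without starting a workflow run.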
From 885c534bbbfca1411653c77cc652b004820a926c Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:23:01 +0000 Subject: [PATCH 19/27] test new table inputs --- README.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index e3ff84c..8d44686 100644 --- a/README.md +++ b/README.md @@ -18,8 +18,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) design #### Inputs -- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) -- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory +- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). Currently, this option will fail the workflow with "GCA_016772135" set as the reference - use "B11205" instead. +- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory. 
- **ref_tar** optionally takes a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: ``` @@ -57,13 +57,13 @@ data/reference | mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | | mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | | mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | -| mycosnp | **coverage** | Int | {…} | | Optional | -| mycosnp | **cpu** | Int | {…} | | Optional | -| mycosnp | **debug** | Boolean | {…} | | Optional | -| mycosnp | **disk_size** | Int | {…} | | Optional | -| mycosnp | **docker** | String | {…} | | Optional | -| mycosnp | **memory** | Int | {…} | | Optional | -| mycosnp | **min_depth** | Int | {…} | | Optional | +| mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional | +| mycosnp | **cpu** | Int | CPU cores | 8 | Optional | +| mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional | +| mycosnp | **disk_size** | Int | Disk size (GB) | 100 | Optional | +| mycosnp | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5 | Optional | +| mycosnp | **memory** | Int | RAM (GB) | 64 | Optional | +| mycosnp | **min_depth** | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional | | mycosnp | **reference** | String | {…} | | Optional | | mycosnp | **sample_ploidy** | Int | {…} | | Optional | | mycosnp | **strain** | String | {…} | | Optional | @@ -121,9 +121,9 @@ data/reference #### Inputs -- **reference** optionally takes a 
presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) -- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory -- **strain** is passed to output but does not change workflow function +- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). +- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory. +- **strain** is passed to output but does not change workflow function.
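The `coverage` description in the table above implies a simple ratio. As a rough sketch, assuming the down-sampling rate is the target coverage divided by the current estimated coverage (total sequenced bases over reference length) and is capped at 1 so reads are never up-sampled: the function name, its arguments, and the treatment of the default `coverage = 0` as "no down-sampling" are illustrative assumptions, not taken from mycosnp-nf.

```python
def downsample_rate(target_coverage: int, reference_length: int, total_bases: int) -> float:
    """Return the fraction of reads to keep to reach roughly `target_coverage`x depth.

    Hypothetical helper: assumes coverage = total_bases / reference_length.
    """
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    # Assumption: a non-positive target (such as the default 0) disables down-sampling.
    if target_coverage <= 0 or total_bases <= 0:
        return 1.0

    current_coverage = total_bases / reference_length
    if current_coverage <= target_coverage:
        # Already at or below the target; keep every read.
        return 1.0
    return target_coverage / current_coverage


if __name__ == "__main__":
    # 1.68 Gbp of reads against a 12 Mbp reference is about 140x, so half the reads are kept.
    print(downsample_rate(70, 12_000_000, 1_680_000_000))
```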
From 70535d7a4ded54a705d48777c01616a93e9fd7e0 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:29:44 +0000 Subject: [PATCH 20/27] update input delineation in tables --- README.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 8d44686..b7f7093 100644 --- a/README.md +++ b/README.md @@ -64,12 +64,12 @@ data/reference | mycosnp | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5 | Optional | | mycosnp | **memory** | Int | RAM (GB) | 64 | Optional | | mycosnp | **min_depth** | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional | -| mycosnp | **reference** | String | {…} | | Optional | -| mycosnp | **sample_ploidy** | Int | {…} | | Optional | -| mycosnp | **strain** | String | {…} | | Optional | +| mycosnp | **reference** | String | Reference clade | "GCA_016772135" | Optional | +| mycosnp | **sample_ploidy** | Int | Ploidy of sample (GATK) | 1 | Optional | +| mycosnp | **strain** | String | Reference strain | "B11205" | Optional | | mycosnp_variants | **ref_fasta** | File | Reference FASTA file | | Optional | -| mycosnp_variants | **ref_tar** | File | Reference tar file | | Optional | -| version_capture | **timezone** | String | {…} | | Optional | +| mycosnp_variants | **ref_tar** | File | Reference gzipped compressed tarchive | | Optional | +| version_capture | **timezone** | String | Alternative timezone | | Optional |
@@ -156,7 +156,7 @@ data/reference | mycosnp_tree_analysis_date | String | Date of the analysis | | mycosnp_tree_full_results | File | Full results file | | mycosnp_tree_vcf_csv | File | VCF to CSV file | -| mycosnp_tree_version | String | Version of the MycoSNP-WDL workflow | +| mycosnp_tree_version | String | Version of the `mycosnp_tree` WDL workflow | | mycosnp_version | String | Version of MycoSNP | | mycosnptree_snpdists | File | SNP distances file | | reference_name | String | Name of the reference | From c760aa08f55d2fad3d74ffe045285f96b9eb0b95 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:34:25 +0000 Subject: [PATCH 21/27] formatting --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index b7f7093..ec47cff 100644 --- a/README.md +++ b/README.md @@ -61,7 +61,7 @@ data/reference | mycosnp | **cpu** | Int | CPU cores | 8 | Optional | | mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional | | mycosnp | **disk_size** | Int | Disk size (GB) | 100 | Optional | -| mycosnp | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5 | Optional | +| mycosnp | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | | mycosnp | **memory** | Int | RAM (GB) | 64 | Optional | | mycosnp | **min_depth** | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional | | mycosnp | **reference** | String | Reference clade | "GCA_016772135" | Optional | @@ -134,11 +134,11 @@ data/reference | mycosnp_tree | **ref_fasta** | File | Reference FASTA input | | Optional | | mycosnptree | **cpu** | Int | CPU cores | 4 | Optional | | mycosnptree | **disk_size** | Int | Disk size (GB) | 50 | Optional | -| mycosnptree | **docker** | String | 
"us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | | Optional | +| mycosnptree | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | | mycosnptree | **memory** | Int | RAM (GB) | 32 | Optional | | mycosnptree | **reference** | String | Preexisting [reference directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) | "GCA_016772135" | Optional | | mycosnptree | **strain** | String | mycosnp-nf reference strain name | "B11205" | Optional | -| version_capture | **docker** | String | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | | Optional | +| version_capture | **timezone** | String | Alternative timezone | | Optional |
From c06947038d98442d74c12c19762a50448d9d70f0 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:36:12 +0000 Subject: [PATCH 22/27] expand on reference info --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index ec47cff..c6c8488 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) design
### wf_mycosnp_variants.wdl -`mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade as labeled in the [reference data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). +`mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade [data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), fasta, or directory as described below. #### Inputs From 67ad89103f14458adc4274bbeec6770b630987ac Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:36:33 +0000 Subject: [PATCH 23/27] capitalize fasta --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c6c8488..d4e2dd2 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) design
### wf_mycosnp_variants.wdl -`mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade [data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), fasta, or directory as described below. +`mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade [data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), FASTA, or directory as described below. #### Inputs From b3f51282e5a7eaf98158df2c292c98963893d314 Mon Sep 17 00:00:00 2001 From: xonq Date: Fri, 31 Jan 2025 22:40:45 +0000 Subject: [PATCH 24/27] conform to PHB formatting --- README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index d4e2dd2..2d31aa5 100644 --- a/README.md +++ b/README.md @@ -58,11 +58,11 @@ data/reference | mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | | mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | | mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. 
For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional | -| mycosnp | **cpu** | Int | CPU cores | 8 | Optional | +| mycosnp | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional | | mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional | -| mycosnp | **disk_size** | Int | Disk size (GB) | 100 | Optional | -| mycosnp | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | -| mycosnp | **memory** | Int | RAM (GB) | 64 | Optional | +| mycosnp | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | +| mycosnp | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | +| mycosnp | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional | | mycosnp | **min_depth** | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional | | mycosnp | **reference** | String | Reference clade | "GCA_016772135" | Optional | | mycosnp | **sample_ploidy** | Int | Ploidy of sample (GATK) | 1 | Optional | @@ -132,10 +132,10 @@ data/reference | mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required | | mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | | mycosnp_tree | **ref_fasta** | File | Reference FASTA input | | Optional | -| mycosnptree | **cpu** | Int | CPU cores | 4 | Optional | -| mycosnptree | **disk_size** | Int | Disk size (GB) | 50 | Optional | -| mycosnptree | **docker** | String | Workflow Docker container | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | -| mycosnptree | **memory** | Int | RAM (GB) | 32 | Optional | +| mycosnptree | **cpu**
| Int | Number of CPUs to allocate to the task | 8 | Optional | +| mycosnptree | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | +| mycosnptree | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | +| mycosnptree | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional | | mycosnptree | **reference** | String | Preexisting [reference directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) | "GCA_016772135" | Optional | | mycosnptree | **strain** | String | mycosnp-nf reference strain name | "B11205" | Optional | | version_capture | **timezone** | String | Alternative timezone | | Optional | From 80b4b12f10130ea45794bcc792e63fd8d69ec52f Mon Sep 17 00:00:00 2001 From: xonq Date: Tue, 4 Feb 2025 15:59:01 +0000 Subject: [PATCH 25/27] add note on genome requirements for mycosnp_tree in README --- README.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/README.md b/README.md index 2d31aa5..0cbcc40 100644 --- a/README.md +++ b/README.md @@ -16,6 +16,8 @@ WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) design ### wf_mycosnp_variants.wdl `mycosnp_variants` calls variants for inputted reads referencing the *C. auris* B11204 assembly accession [GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/) by default. Users can optionally reference a separate *C. auris* clade [data directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), FASTA, or directory as described below. +Note that `mycosnp_tree` requires at least 4 genomes that reference the same reference in `mycosnp_variants`. + #### Inputs - **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). 
Currently, this option will fail the workflow with "GCA_016772135" set as the reference - use "B11205" instead. From ec553ca2edf1e254fcf222b04d068331756dbf78 Mon Sep 17 00:00:00 2001 From: xonq Date: Thu, 6 Feb 2025 01:12:59 +0000 Subject: [PATCH 26/27] incorporate Fraser's proposed changes for higher quality I/O delineation --- README.md | 40 +++++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 19 deletions(-) diff --git a/README.md b/README.md index 0cbcc40..4bca6bb 100644 --- a/README.md +++ b/README.md @@ -26,27 +26,27 @@ Note that `mycosnp_tree` requires at least 4 genomes that reference the same ref ``` data/reference -├── B11221 +├── B11221 # Prebuilt clade directory ├── Clade1 │ ├── bwa -| | ├── bwa +| | ├── bwa # BWA index for alignment | | | ├── reference.am | | | ├── reference.ann | | | ├── reference.bwt | | | ├── reference.pac | | | └── reference.sa -│ ├── dict -| | └── reference.dict -│ ├── fai -| | └── reference.fa.fai -│ ├── masked -| | └── reference.fa +│ ├── dict +| | └── reference.dict # Picard dictionary +│ ├── fai +| | └── reference.fa.fai # FASTA index file +│ ├── masked +| | └── reference.fa # Masked reference sequence │ └── Clade1.fasta ├── Clade2 ├── Clade3 ├── Clade4 ├── Clade5 -└── GCA_016772135 +└── GCA_016772135 # Default reference ``` - **strain** optionally delineates the strain name for VCF gene name annotation. MycoSNP currently only annotates with respect to the default strain, "B11205", so changing this option will simply bypass VCF annotation. @@ -61,7 +61,7 @@ data/reference | mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | | mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. 
For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional | | mycosnp | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional | -| mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional | +| mycosnp | **debug** | Boolean | If true, keeps `.nextflow/` and `work/` directories | false | Optional | | mycosnp | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | | mycosnp | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | | mycosnp | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional | @@ -107,12 +107,12 @@ data/reference | reads_mapped | Int | Number of reads mapped | | reference_length_coverage_after_trimming | Float | Reference length coverage after trimming | | reference_length_coverage_before_trimming | Float | Reference length coverage before trimming | -| reference_name | String | Name of the reference | +| reference_name | String | Name of the reference genome used | | reference_strain | String | Reference strain used | | unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming | | unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming | -| vcf | File | VCF file | -| vcf_index | File | Index file for the VCF | +| vcf | File | Compressed variant call format (VCF) file depicting SNPs | +| vcf_index | File | Compressed index file for the VCF |
@@ -121,6 +121,8 @@ data/reference
 ### wf_mycosnp_tree.wdl
 `mycosnp_tree` reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs.
 
+NOTE: At least four samples, including reference, are required
+
 #### Inputs
 
 - **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference).
@@ -131,7 +133,7 @@ data/reference
 
 | **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
 |---|---|---|---|---|---|
-| mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required |
+| mycosnp_tree | **vcf** | Array[File] | VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files can be generated from `wf_mycosnp_variants.wdl` | | Required |
 | mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required |
 | mycosnp_tree | **ref_fasta** | File | Reference FASTA input | | Optional |
 | mycosnptree | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional |
@@ -150,14 +152,14 @@ data/reference
 
 | **Variable** | **Type** | **Description** |
 |---|---|---|
-| mycosnp_alignment | File | Alignment file |
+| mycosnp_alignment | File | Concatenated SNP alignment file |
 | mycosnp_docker | String | Docker image used for MycoSNP |
-| mycosnp_fastree_tree | File | FastTree tree file |
-| mycosnp_iqtree_tree | File | IQ-TREE tree file |
-| mycosnp_rapidnj_tree | File | RapidNJ tree file |
+| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree (heuristic maximum likelihood) |
+| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (high quality maximum likelihood) |
+| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method) |
 | mycosnp_tree_analysis_date | String | Date of the analysis |
 | mycosnp_tree_full_results | File | Full results file |
-| mycosnp_tree_vcf_csv | File | VCF to CSV file |
+| mycosnp_tree_vcf_csv | File | SNP variants formatted as a CSV table |
 | mycosnp_tree_version | String | Version of the `mycosnp_tree` WDL workflow |
 | mycosnp_version | String | Version of MycoSNP |
 | mycosnptree_snpdists | File | SNP distances file |

From d5f97765c043b7d24979846ab007121793c99b32 Mon Sep 17 00:00:00 2001
From: xonq
Date: Thu, 6 Feb 2025 01:16:47 +0000
Subject: [PATCH 27/27] doesnt fail anymore

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 4bca6bb..a702089 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ Note that `mycosnp_tree` requires at least 4 genomes that reference the same ref
 
 #### Inputs
 
-- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). Currently, this option will fail the workflow with "GCA_016772135" set as the reference - use "B11205" instead.
+- **reference** optionally takes a presupplied reference clade directory depicted [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). The default is `GCA_016772135`.
 - **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed via BWA and generate a reference directory.
 - **ref_tar** optionally takes a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades: