[MycoSNP-WDL] Update README.md to delineate workflow I/O and usage #6

xonq · 2025-01-17T19:27:31Z

This PR updates the main repository README.md to delineate workflow I/O and usage.

🗑️ This dev branch should be deleted after merging to main.

🧠 Summary

Delineate workflow I/O and usage in the README

⚡ Impacted Workflows/Tasks

None

This PR may lead to different results in pre-existing outputs: No

This PR uses an element that could cause duplicate runs to have different results: No

🛠️ Changes

Delineate workflow I/O and usage in README.md

⚙️ Algorithm

n/a

➡️ Inputs

n/a

⬅️ Outputs

n/a

🧪 Testing

n/a

Suggested Scenarios for Reviewer to Test

n/a

🔬 Final Developer Checklist

The workflow/task has been tested and results, including file contents, are as anticipated
The CI/CD has been adjusted and tests are passing (Theiagen developers)
Code changes follow the style guide
Documentation and/or workflow diagrams have been updated if applicable
- You have updated the latest version for any affected workflows in the respective workflow documentation page and for every entry in the three workflows_overview tables.

🎯 Reviewer Checklist

All changed results have been confirmed
You have tested the PR appropriately (see the testing guide for more information)
All code adheres to the style guide
MD5 sums have been updated
The PR author has addressed all comments
The documentation has been updated

fraser-combe

Some ideas on wording that may add more context for the reader. Otherwise great documentation!

fraser-combe · 2025-02-03T18:40:36Z

README.md

+- **ref_tar** optionally takes a gzipped tarchive (`.tar.gz`) with the same directory structure as the provided reference clades:
+
+```
+data/reference


Recommend:

```
data/reference
├── B11221            # Prebuilt clade directory
├── Clade1
│   ├── bwa           # BWA index for alignment
│   ├── dict          # Picard dictionary
│   ├── fai           # FASTA index file
│   ├── masked        # Masked reference sequence
│   └── Clade1.fasta  # Main reference FASTA
├── Clade2
├── Clade3
├── Clade4
├── Clade5
└── GCA_016772135     # Default reference
```
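A user preparing a custom **ref_tar** could sanity-check the archive against this layout before submitting a run. The sketch below is only illustrative: the helper names are hypothetical, and it assumes the archive is rooted at `data/reference` and that each clade directory carries the `bwa`, `dict`, `fai`, and `masked` entries plus a `<clade>.fasta` file, as in the tree above. The workflow itself may accept other layouts.

```python
import tarfile

# Entries the suggested layout lists under each clade directory.
EXPECTED_SUBDIRS = ("bwa", "dict", "fai", "masked")


def tar_member_names(tar_path):
    """Return every member path stored in a gzipped tar archive."""
    with tarfile.open(tar_path, "r:gz") as tar:
        return tar.getnames()


def missing_reference_parts(member_names, clade, root="data/reference"):
    """List expected paths for `clade` that are absent from `member_names`.

    `root` is an assumption about where the archive is rooted.
    """
    present = set()
    for name in member_names:
        if name.startswith("./"):
            name = name[2:]
        parts = name.strip("/").split("/")
        # Record the member and each parent directory, since archives
        # do not always store directories as separate entries.
        for depth in range(1, len(parts) + 1):
            present.add("/".join(parts[:depth]))

    prefix = f"{root}/{clade}"
    expected = {f"{prefix}/{sub}" for sub in EXPECTED_SUBDIRS}
    expected.add(f"{prefix}/{clade}.fasta")
    return sorted(expected - present)
```

Calling `missing_reference_parts(tar_member_names("my_reference.tar.gz"), "Clade1")` returns an empty list when nothing from the assumed layout is missing.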

fraser-combe · 2025-02-03T22:25:58Z

README.md

+
+### wf_mycosnp_tree.wdl
+`mycosnp_tree` reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs.
+


The tree workflow will fail with fewer than 4 samples, so I think we should add this in. IQ-TREE won't run if fewer than 4 samples are in the file, as I saw in the log output.
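A pre-flight check along these lines could catch the problem before the workflow is launched. This is a minimal sketch, not part of the repository: `check_sample_count` is a hypothetical helper, and the threshold of 4 is taken from the log observation above rather than from the IQ-TREE documentation.

```python
MIN_SAMPLES = 4  # threshold reported in the review comment, not verified here


def check_sample_count(vcf_paths, min_samples=MIN_SAMPLES):
    """Return the number of distinct VCF inputs, or raise if there are too few."""
    n_samples = len(set(vcf_paths))
    if n_samples < min_samples:
        raise ValueError(
            f"mycosnp_tree needs at least {min_samples} samples, got {n_samples}"
        )
    return n_samples
```

Duplicate paths are counted once, so passing the same VCF twice does not satisfy the minimum.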

fraser-combe · 2025-02-04T02:34:30Z

README.md

+
+#### Inputs 
+
+- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). Currently, this option will fail the workflow with "GCA_016772135" set as the reference - use "B11205" instead.


We can update this. I have successful runs using the default settings for variants and tree. Suggested wording: **reference** optionally takes a presupplied reference clade directory delineated here. The default reference GCA_016772135 is fully supported, but users may specify an alternative reference, such as B11205 or another clade-specific reference, if desired.

fraser-combe · 2025-02-04T02:37:20Z

README.md

+| mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required |
+| mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional |
+| mycosnp | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional |
+| mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional |


If true, keeps .nextflow/ and work/ directories for debugging purposes.
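On the **coverage** row in the hunk above, the described down-sampling can be pictured with a small calculation. This is a sketch under stated assumptions, not the pipeline's actual code: it assumes the rate is the target coverage divided by the estimated coverage (total sequenced bases over reference length), capped at 1, and that a coverage of 0 means no down-sampling. The function and parameter names are hypothetical.

```python
def downsampling_rate(target_coverage, total_bases, reference_length):
    """Fraction of reads to keep so coverage is roughly `target_coverage`."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    if target_coverage <= 0:
        # Assumption: the default of 0 leaves the reads untouched.
        return 1.0
    estimated_coverage = total_bases / reference_length
    if estimated_coverage <= 0:
        raise ValueError("no sequenced bases to down-sample")
    # Never up-sample: keep everything if coverage is already below target.
    return min(1.0, target_coverage / estimated_coverage)
```

For example, a sample sequenced to about 140x against a 10 Mb reference would be down-sampled by half to reach roughly 70x, while a 35x sample would be kept whole.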

fraser-combe · 2025-02-04T02:37:44Z

README.md

+| reference_strain | String | Reference strain used |
+| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming |
+| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming |
+| vcf | File | VCF file |


Final variant call format (VCF) file containing SNPs.

fraser-combe · 2025-02-04T02:38:12Z

README.md

+| reads_mapped | Int | Number of reads mapped |
+| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming |
+| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming |
+| reference_name | String | Name of the reference |


Name of the reference genome used.

fraser-combe · 2025-02-04T02:41:59Z

README.md

+
+| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
+|---|---|---|---|---|---|
+| mycosnp_tree | **vcf** | Array[File] | VCF files for analysis |  | Required |


Maybe add above inputs,

Compressed VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files should be generated from wf_mycosnp_variants.wdl.

fraser-combe · 2025-02-04T02:42:38Z

README.md

+
+| **Variable** | **Type** | **Description** |
+|---|---|---|
+| mycosnp_alignment | File | Alignment file |


Concatenated SNP alignment used for tree inference.

fraser-combe · 2025-02-04T02:43:22Z

README.md

+| mycosnp_rapidnj_tree | File | RapidNJ tree file |
+| mycosnp_tree_analysis_date | String | Date of the analysis |
+| mycosnp_tree_full_results | File | Full results file |
+| mycosnp_tree_vcf_csv | File | VCF to CSV file |


SNP variants formatted as a CSV table for external analysis

For the tree methods:

| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree. |
| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (maximum likelihood method). |
| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method). |

These changes have been incorporated into the most recent commit (ec553ca). Thank you for the suggestions.

xonq added 24 commits January 17, 2025 19:23

initialize README.md update with detailed workflow inputs and outputs for MycoSNP-WDL; Terra task delineation needs to be updated

480b16a

Update README.md to reflect changes in WDL workflows and inputs for MycoSNP

6a8809c

Update README.md title to MycoSNP-WDL Workflow Series

a29cea5

remove explicit Terra mention

5bc57e9

change out of searchable table

c40b02c

update table I/O to correspond with PR 7

5cf3b46

formatting

035a6fd

add internal links

f43ae3f

include blurbs about workflows

4b3298a

expand inputs and explicitly delineate that variant calling is an initial dependency

16ac560

include reference clades

0f146f9

delineate directory structure appropriately

2f073a3

add back the searchable table

827bc80

update mycosnp_tree tables to correspond with terra

c2f2a4b

update mycosnp_variants tables to correspond to Terra i/o

63f88f0

change release to v1.5

3ea8790

update function

9daffec

update input notes

54b17db

test new table inputs

885c534

update input delineation in tables

70535d7

formatting

c760aa0

expand on reference info

c069470

capitalize fasta

67ad891

conform to PHB formatting

b3f5128

xonq marked this pull request as ready for review January 31, 2025 22:41

xonq requested a review from a team as a code owner January 31, 2025 22:41

fraser-combe requested changes Feb 4, 2025

xonq added 3 commits February 4, 2025 15:59

add note on genome requirements for mycosnp_tree in README

80b4b12

incorporate Fraser's proposed changes for higher quality I/O delineation

ec553ca

doesnt fail anymore

d5f9776
