[MycoSNP-WDL] Update README.md to delineate workflow I/O and usage #6
base: main
… for MycoSNP-WDL; Terra task delineation needs to be updated
Some ideas on wording that may add more context for the reader. Otherwise great documentation!
- **ref_tar** optionally takes a gzipped tar archive (`.tar.gz`) with the same directory structure as the provided reference clades:

Recommend:

```
data/reference
├── B11221 # Prebuilt clade directory
├── Clade1
│ ├── bwa # BWA index for alignment
│ ├── dict # Picard dictionary
│ ├── fai # FASTA index file
│ ├── masked # Masked reference sequence
│ └── Clade1.fasta # Main reference FASTA
├── Clade2
├── Clade3
├── Clade4
├── Clade5
└── GCA_016772135 # Default reference
```
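
A minimal sketch of how a custom `ref_tar` archive could be assembled to mirror the Clade1 layout shown above. The helper names (`build_ref_tar`, `list_members`), the placeholder FASTA, and the choice to put the clade directory at the archive root are assumptions for illustration; the README does not state where the archive root should sit, and real index files would have to be generated with the actual tools.

```python
import tarfile
import tempfile
from pathlib import Path

# Subdirectories shown for Clade1 in the directory tree above.
INDEX_SUBDIRS = ("bwa", "dict", "fai", "masked")


def build_ref_tar(clade: str, workdir: Path) -> Path:
    """Create <workdir>/<clade>.tar.gz holding <clade>/{bwa,dict,fai,masked} and <clade>.fasta.

    Hypothetical helper: the files are placeholders, not real indexes.
    """
    clade_dir = workdir / clade
    for sub in INDEX_SUBDIRS:
        (clade_dir / sub).mkdir(parents=True, exist_ok=True)
    # Placeholder FASTA; a real archive would contain the actual reference sequence.
    (clade_dir / f"{clade}.fasta").write_text(">placeholder\nACGT\n")
    archive = workdir / f"{clade}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Assumption: the clade directory is the top-level entry of the archive.
        tar.add(clade_dir, arcname=clade)
    return archive


def list_members(archive: Path) -> list[str]:
    """Return the sorted member names of a gzipped tar archive."""
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for name in list_members(build_ref_tar("MyClade", Path(tmp))):
            print(name)
```

Listing the members before uploading is a cheap way to confirm the archive matches the expected layout.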

### wf_mycosnp_tree.wdl
`mycosnp_tree` reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs.
The tree workflow fails with fewer than 4 samples, so I think we should add this. In the log output I saw that IQ-TREE won't run if the file contains fewer than 4 samples.
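
A pre-submission check along these lines could catch the problem before the workflow is launched. This is only a sketch: `check_tree_inputs` and `MIN_TREE_SAMPLES` are hypothetical names, and the threshold of 4 is taken from the comment above rather than from the workflow source.

```python
MIN_TREE_SAMPLES = 4  # threshold reported in the review comment above


def check_tree_inputs(vcf_paths):
    """Return the number of distinct VCF inputs, or raise if there are too few for tree building."""
    n_samples = len(set(vcf_paths))
    if n_samples < MIN_TREE_SAMPLES:
        raise ValueError(
            f"mycosnp_tree needs at least {MIN_TREE_SAMPLES} samples, got {n_samples}"
        )
    return n_samples


if __name__ == "__main__":
    print(check_tree_inputs(["s1.vcf.gz", "s2.vcf.gz", "s3.vcf.gz", "s4.vcf.gz"]))
```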
README.md

#### Inputs

- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). Currently, this option will fail the workflow with "GCA_016772135" set as the reference - use "B11205" instead.
We can update this; I have successful runs using the default settings for variants and tree. Suggested wording: **reference** optionally takes a presupplied reference clade directory delineated here. The default reference GCA_016772135 is fully supported, but users may specify an alternative reference, such as B11205 or another clade-specific reference, if desired.
README.md
| mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required |
| mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional |
| mycosnp | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional |
| mycosnp | **debug** | Boolean | Keeps `.nextflow/` and `work/` directories | false | Optional |
If true, keeps .nextflow/ and work/ directories for debugging purposes.
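
On the **coverage** row in the same hunk: the description could be made concrete with a short worked example. The sketch below assumes the rate is the ratio of target coverage to current coverage, capped at 1, and that the default of 0 disables down-sampling; neither detail is confirmed by the README, and `downsample_rate` and the example numbers are hypothetical.

```python
def downsample_rate(target_coverage: int, reference_length: int, total_read_bases: int) -> float:
    """Fraction of reads to keep so that mean coverage is roughly target_coverage.

    Assumption: a target of 0 (the documented default) means no down-sampling.
    """
    if target_coverage <= 0:
        return 1.0
    if reference_length <= 0 or total_read_bases <= 0:
        raise ValueError("reference_length and total_read_bases must be positive")
    current_coverage = total_read_bases / reference_length
    # Never sample above 100% of the reads.
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    # Hypothetical numbers: 1.68 Gb of reads on a 12 Mb reference is 140x coverage.
    print(downsample_rate(70, 12_000_000, 1_680_000_000))
```

With these illustrative inputs, a target of 70x against 140x of sequenced bases keeps half of the reads.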
README.md
| reference_strain | String | Reference strain used |
| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming |
| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming |
| vcf | File | VCF file |
Final variant call format (VCF) file containing SNPs.
README.md
| reads_mapped | Int | Number of reads mapped |
| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming |
| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming |
| reference_name | String | Name of the reference |
Name of the reference genome used.
README.md

| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** |
|---|---|---|---|---|---|
| mycosnp_tree | **vcf** | Array[File] | VCF files for analysis | | Required |
Maybe add this above the inputs:
Compressed VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files should be generated from wf_mycosnp_variants.wdl.
README.md

| **Variable** | **Type** | **Description** |
|---|---|---|
| mycosnp_alignment | File | Alignment file |
Concatenated SNP alignment used for tree inference.
README.md
| mycosnp_rapidnj_tree | File | RapidNJ tree file |
| mycosnp_tree_analysis_date | String | Date of the analysis |
| mycosnp_tree_full_results | File | Full results file |
| mycosnp_tree_vcf_csv | File | VCF to CSV file |
SNP variants formatted as a CSV table for external analysis
For the tree methods:
| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree. |
| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (maximum likelihood method). |
| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method). |
These changes have been incorporated into the most recent commit (ec553ca). Thank you for the suggestions.
This PR updates the main repository README.md to delineate workflow I/O and usage.
🗑️ This dev branch should be deleted after merging to main.
🧠 Summary
Delineate workflow I/O and usage in the README
⚡ Impacted Workflows/Tasks
None
This PR may lead to different results in pre-existing outputs: No
This PR uses an element that could cause duplicate runs to have different results: No
🛠️ Changes
Delineate workflow I/O and usage in README.md
⚙️ Algorithm
n/a
➡️ Inputs
n/a
⬅️ Outputs
n/a
🧪 Testing
n/a
Suggested Scenarios for Reviewer to Test
n/a
🔬 Final Developer Checklist
workflows_overview
tables.
🎯 Reviewer Checklist