[MycoSNP-WDL] Update README.md to delineate workflow I/O and usage #6
Open: xonq wants to merge 27 commits into main from zmk-mycosnp-wdl-dev
Commits
480b16a initialize README.md update with detailed workflow inputs and outputs…
6a8809c Update README.md to reflect changes in WDL workflows and inputs for M…
a29cea5 Update README.md title to MycoSNP-WDL Workflow Series
5bc57e9 remove explicit Terra mention
c40b02c change out of searchable table
5cf3b46 update table I/O to correspond with PR 7
035a6fd formatting
f43ae3f add internal links
4b3298a include blurbs about workflows
16ac560 expand inputs and explicitly delineate that variant calling is an ini…
0f146f9 include reference clades
2f073a3 delineate directory structure appropriately
827bc80 add back the searchable table
c2f2a4b update mycosnp_tree tables to correspond with terra
63f88f0 update mycosnp_variants tables to correspond to Terra i/o
3ea8790 change release to v1.5
9daffec update function
54b17db update input notes
885c534 test new table inputs
70535d7 update input delineation in tables
c760aa0 formatting
c069470 expand on reference info
67ad891 capitalize fasta
b3f5128 conform to PHB formatting
80b4b12 add note on genome requirements for mycosnp_tree in README
ec553ca incorporate Fraser's proposed changes for higher quality I/O delineation
d5f9776 doesnt fail anymore
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,169 @@ | ||
# mycosnp-wdl | ||
A WDL wrapper of [CDCGov/mycosnp-nf for](https://github.com/CDCgov/mycosnp-nf) Terra.bio | ||
# MycoSNP-WDL Workflow Series | ||
|
||
## Quick Facts | ||
|
||
| **Workflow Type** | **Applicable Kingdom** | **Last Known Changes** | **Command-line Compatibility** | **Workflow Level** | | ||
|---|---|---|---|---| | ||
| mycosnp_variants | Fungi | v1.5 | Yes | Sample-level | | ||
| mycosnp_tree | Fungi | v1.5 | Yes | Set-level | | ||
|
||
|
||
## MycoSNP-WDL | ||
WDL wrappers of [CDCGov/mycosnp-nf](https://github.com/CDCgov/mycosnp-nf) designed for [Terra.bio](https://terra.bio) integration. These workflows conduct *Candidozyma (Candida) auris* [variant calling](#wf_mycosnp_variantswdl) and subsequent single nucleotide polymorphism (SNP) [phylogenetic tree reconstruction](#wf_mycosnp_treewdl). | ||
|
||
<br/> | ||
|
||
### wf_mycosnp_variants.wdl | ||
`mycosnp_variants` calls variants from input reads against the *C. auris* B11205 assembly ([GCA_016772135](https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_016772135/)) by default. Users can instead supply a different prebuilt *C. auris* clade [reference directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference), a reference FASTA, or a gzipped tar archive of a reference directory, as described below. | ||
|
||
Note that `mycosnp_tree` requires at least four samples whose variants were all called against the same reference in `mycosnp_variants`. | ||
|
||
#### Inputs | ||
|
||
- **reference** optionally takes a presupplied reference clade directory depicted [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). The default is `GCA_016772135`. | ||
- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed with BWA to generate a reference directory. | ||
- **ref_tar** optionally takes a gzipped tar archive (`.tar.gz`) with the same directory structure as the provided reference clades: | ||
|
||
``` | ||
data/reference | ||
├── B11221 # Prebuilt clade directory | ||
├── Clade1 | ||
│ ├── bwa | ||
| | ├── bwa # BWA index for alignment | ||
| | | ├── reference.amb | ||
| | | ├── reference.ann | ||
| | | ├── reference.bwt | ||
| | | ├── reference.pac | ||
| | | └── reference.sa | ||
│ ├── dict | ||
| | └── reference.dict # Picard dictionary | ||
│ ├── fai | ||
| | └── reference.fa.fai # FASTA index file | ||
│ ├── masked | ||
| | └── reference.fa # Masked reference sequence | ||
│ └── Clade1.fasta | ||
├── Clade2 | ||
├── Clade3 | ||
├── Clade4 | ||
├── Clade5 | ||
└── GCA_016772135 # Default reference | ||
``` | ||
|
||
- **strain** optionally delineates the strain name for VCF gene name annotation. MycoSNP currently only annotates with respect to the default strain, "B11205", so changing this option will simply bypass VCF annotation. | ||
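
The `ref_tar` layout listed above can be sanity-checked before upload. The following is a minimal sketch, not part of the workflow: `missing_reference_files` is a hypothetical helper, and it assumes the archive contains the five standard `bwa index` outputs (`.amb`, `.ann`, `.bwt`, `.pac`, `.sa`) plus the `dict`, `fai`, and `masked` files shown in the tree. Paths are matched by suffix, so the check does not depend on the name of the top-level directory.

```python
import tarfile

# Path suffixes expected somewhere inside the archive (assumption: taken from
# the directory listing above; the five BWA files are the standard outputs of
# `bwa index`).
REQUIRED_SUFFIXES = [
    "bwa/reference.amb",
    "bwa/reference.ann",
    "bwa/reference.bwt",
    "bwa/reference.pac",
    "bwa/reference.sa",
    "dict/reference.dict",
    "fai/reference.fa.fai",
    "masked/reference.fa",
]


def missing_reference_files(tar_path):
    """Return the required path suffixes that no regular file in the archive matches."""
    with tarfile.open(tar_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    return [
        suffix
        for suffix in REQUIRED_SUFFIXES
        if not any(name == suffix or name.endswith("/" + suffix) for name in names)
    ]
```

An empty list means every expected file was found; any returned suffix names a file to add before passing the archive as `ref_tar`.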
|
||
|
||
<div class="searchable-table" markdown="1"> | ||
|
||
| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | | ||
|---|---|---|---|---|---| | ||
| mycosnp_variants | **read1** | File | Illumina forward read file in FASTQ format (compression optional) | | Required | | ||
| mycosnp_variants | **read2** | File | Illumina reverse read file in FASTQ format (compression optional) | | Required | | ||
| mycosnp_variants | **samplename** | String | Name of sample to be analyzed | | Required | | ||
| mycosnp | **coverage** | Int | Coverage is used to calculate a down-sampling rate that results in the specified coverage. For example, if coverage is 70, then FASTQ files are down-sampled such that, when aligned to the reference, the result is approximately 70x coverage | 0 | Optional | | ||
| mycosnp | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional | | ||
| mycosnp | **debug** | Boolean | If true, keeps `.nextflow/` and `work/` directories | false | Optional | | ||
| mycosnp | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | | ||
| mycosnp | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | | ||
| mycosnp | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional | | ||
| mycosnp | **min_depth** | Int | Min depth for a base to be called as the consensus sequence, otherwise it will be called as an N; set to 0 to disable | 10 | Optional | | ||
| mycosnp | **reference** | String | Reference clade | "GCA_016772135" | Optional | | ||
| mycosnp | **sample_ploidy** | Int | Ploidy of sample (GATK) | 1 | Optional | | ||
| mycosnp | **strain** | String | Reference strain | "B11205" | Optional | | ||
| mycosnp_variants | **ref_fasta** | File | Reference FASTA file | | Optional | | ||
| mycosnp_variants | **ref_tar** | File | Gzipped tar archive (`.tar.gz`) of a reference directory | | Optional | | ||
| version_capture | **timezone** | String | Alternative timezone | | Optional | | ||
|
||
</div> | ||
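
For command-line runs, the inputs in the table above are usually supplied as a JSON file. The sketch below builds one in Python. The fully qualified key names (`mycosnp_variants.<input>` for workflow-level inputs and `mycosnp_variants.mycosnp.<input>` for task-level inputs) are an assumption based on the task names in the table and the usual `<workflow>.<call>.<input>` convention, so check them against the WDL before use. `variants_inputs` is a hypothetical helper, and the file names are placeholders.

```python
import json


def variants_inputs(samplename, read1, read2, reference=None, ref_fasta=None, ref_tar=None):
    """Build an inputs mapping for mycosnp_variants (key names are assumed, see above)."""
    inputs = {
        "mycosnp_variants.samplename": samplename,
        "mycosnp_variants.read1": read1,
        "mycosnp_variants.read2": read2,
    }
    if reference is not None:
        # Name of a prebuilt clade directory, e.g. "Clade1".
        inputs["mycosnp_variants.mycosnp.reference"] = reference
    if ref_fasta is not None:
        # The README states that the reference FASTA requires the suffix .fa.
        if not ref_fasta.endswith(".fa"):
            raise ValueError("ref_fasta must end in .fa")
        inputs["mycosnp_variants.ref_fasta"] = ref_fasta
    if ref_tar is not None:
        if not ref_tar.endswith(".tar.gz"):
            raise ValueError("ref_tar must end in .tar.gz")
        inputs["mycosnp_variants.ref_tar"] = ref_tar
    return inputs


if __name__ == "__main__":
    example = variants_inputs(
        "sample01", "sample01_R1.fastq.gz", "sample01_R2.fastq.gz", reference="Clade1"
    )
    print(json.dumps(example, indent=2))
```

Optional inputs that are left unset are omitted from the mapping, so the workflow defaults in the table apply.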
|
||
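
The `coverage` input is described as setting a down-sampling rate. As a rough illustration of the arithmetic only, the sketch below assumes the rate is the target coverage divided by the current coverage (total read bases over reference length), capped at 1, and that the default of 0 leaves reads untouched. MycoSNP's actual calculation may differ; `downsample_rate` is a hypothetical name and the genome size in the test values is a round illustrative figure.

```python
def downsample_rate(target_coverage, reference_length, total_read_bases):
    """Fraction of reads to keep so the expected coverage is about target_coverage."""
    if reference_length <= 0 or total_read_bases <= 0:
        raise ValueError("reference_length and total_read_bases must be positive")
    if target_coverage <= 0:
        # Assumption: a coverage of 0 (the default) means no down-sampling.
        return 1.0
    current_coverage = total_read_bases / reference_length
    # Never sample above 100%: shallow samples are kept whole.
    return min(1.0, target_coverage / current_coverage)
```

For example, a sample with twice the read bases needed for 70x coverage would keep about half of its reads when `coverage` is 70.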
#### Outputs | ||
|
||
<div class="searchable-table" markdown="1"> | ||
|
||
| **Variable** | **Type** | **Description** | | ||
|---|---|---| | ||
| analysis_date | String | Date of the analysis | | ||
| assembly_size | Int | Size of the assembly | | ||
| average_q_score_after_trimming | Float | Average quality score after trimming | | ||
| average_q_score_before_trimming | Float | Average quality score before trimming | | ||
| consensus_n_variant_min_depth | Int | Minimum depth for consensus N variant | | ||
| full_results | File | Full results file | | ||
| gc_after_trimming | Float | GC content after trimming | | ||
| gc_before_trimming | Float | GC content before trimming | | ||
| mean_coverage_depth | Float | Mean coverage depth | | ||
| multiqc | File | MultiQC report | | ||
| myco_bam | File | BAM file | | ||
| myco_bam_bai | File | BAM index file | | ||
| mycosnp_docker | String | Docker image used for MycoSNP | | ||
| mycosnp_variants_analysis_date | String | Date of the MycoSNP variants analysis | | ||
| mycosnp_variants_version | String | Version of the MycoSNP variants | | ||
| mycosnp_version | String | Version of MycoSNP | | ||
| number_n | Int | Number of N bases | | ||
| paired_reads_after_trimming | Int | Number of paired reads after trimming | | ||
| paired_reads_after_trimming_percent | String | Percentage of paired reads after trimming | | ||
| percent_reference_coverage | Float | Percentage of reference coverage | | ||
| reads_after_trimming | Int | Number of reads after trimming | | ||
| reads_after_trimming_percent | String | Percentage of reads after trimming | | ||
| reads_before_trimming | Int | Number of reads before trimming | | ||
| reads_mapped | Int | Number of reads mapped | | ||
| reference_length_coverage_after_trimming | Float | Reference length coverage after trimming | | ||
| reference_length_coverage_before_trimming | Float | Reference length coverage before trimming | | ||
| reference_name | String | Name of the reference genome used | | ||
| reference_strain | String | Reference strain used | | ||
| unpaired_reads_after_trimming | Int | Number of unpaired reads after trimming | | ||
| unpaired_reads_after_trimming_percent | String | Percentage of unpaired reads after trimming | | ||
| vcf | File | Compressed variant call format (VCF) file depicting SNPs | | ||
| vcf_index | File | Compressed index file for the VCF | | ||
|
||
</div> | ||
|
||
<br/> | ||
|
||
### wf_mycosnp_tree.wdl | ||
`mycosnp_tree` reconstructs an IQ-TREE SNP phylogenetic tree that incorporates representative genomes of Clade1-Clade5 *C. auris*. VCF data generated from [wf_mycosnp_variants.wdl](#wf_mycosnp_variantswdl) are used as inputs. | ||
|
||
Review comment: The tree will fail with fewer than 4 samples, so I think we should add this in. IQ-TREE won't run if fewer than 4 samples are in the file; I saw this in the log output. |
||
NOTE: At least four samples, including reference, are required | ||
|
||
#### Inputs | ||
|
||
- **reference** optionally takes a presupplied reference clade directory delineated [here](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference). | ||
- **ref_fasta** optionally takes a reference FASTA (requires suffix `.fa`) that will be indexed with BWA to generate a reference directory. | ||
- **strain** is passed to output but does not change workflow function. | ||
|
||
<div class="searchable-table" markdown="1"> | ||
|
||
| **Terra Task Name** | **Variable** | **Type** | **Description** | **Default Value** | **Terra Status** | | ||
|---|---|---|---|---|---| | ||
| mycosnp_tree | **vcf** | Array[File] | VCF files (.vcf.gz) containing SNP data for phylogenetic analysis. These files can be generated from `wf_mycosnp_variants.wdl` | | Required | | ||
| mycosnp_tree | **vcf_index** | Array[File] | Index files for the VCF files | | Required | | ||
| mycosnp_tree | **ref_fasta** | File | Reference FASTA input | | Optional | | ||
| mycosnptree | **cpu** | Int | Number of CPUs to allocate to the task | 8 | Optional | | ||
| mycosnptree | **disk_size** | Int | Amount of storage (in GB) to allocate to the task | 100 | Optional | | ||
| mycosnptree | **docker** | String | The Docker container to use for the task | "us-docker.pkg.dev/general-theiagen/theiagen/mycosnp:1.5" | Optional | | ||
| mycosnptree | **memory** | Int | Amount of memory/RAM (in GB) to allocate to the task | 64 | Optional | | ||
| mycosnptree | **reference** | String | Preexisting [reference directory](https://github.com/theiagen/mycosnp-wdl/tree/main/data/reference) | "GCA_016772135" | Optional | | ||
| mycosnptree | **strain** | String | mycosnp-nf reference strain name | "B11205" | Optional | | ||
| version_capture | **timezone** | String | Alternative timezone | | Optional | | ||
|
||
</div> | ||
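
Because tree reconstruction needs at least four samples, it can help to validate the VCF list before submitting `mycosnp_tree`. This is a minimal sketch with assumed input key names (`mycosnp_tree.vcf`, `mycosnp_tree.vcf_index`) and a hypothetical helper, `tree_inputs`. It counts only the supplied VCFs toward the minimum, which is the conservative reading of the requirement; lower `min_samples` if the reference counts toward the four in your setup.

```python
def tree_inputs(vcfs, vcf_indexes, min_samples=4):
    """Build an inputs mapping for mycosnp_tree after basic sanity checks."""
    if len(vcfs) != len(vcf_indexes):
        raise ValueError("each VCF needs exactly one index file")
    if len(vcfs) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(vcfs)}")
    # The inputs table specifies compressed VCFs (.vcf.gz).
    bad = [path for path in vcfs if not path.endswith(".vcf.gz")]
    if bad:
        raise ValueError(f"not .vcf.gz files: {bad}")
    return {
        "mycosnp_tree.vcf": list(vcfs),
        "mycosnp_tree.vcf_index": list(vcf_indexes),
    }
```

All VCFs passed here should come from `mycosnp_variants` runs against the same reference; the sketch cannot verify that from file names alone.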
|
||
#### Outputs | ||
|
||
<div class="searchable-table" markdown="1"> | ||
|
||
| **Variable** | **Type** | **Description** | | ||
|---|---|---| | ||
| mycosnp_alignment | File | Concatenated SNP alignment file | | ||
| mycosnp_docker | String | Docker image used for MycoSNP | | ||
| mycosnp_fastree_tree | File | Phylogenetic tree inferred using FastTree (heuristic maximum likelihood) | | ||
| mycosnp_iqtree_tree | File | Phylogenetic tree inferred using IQ-TREE (high quality maximum likelihood) | | ||
| mycosnp_rapidnj_tree | File | Phylogenetic tree inferred using RapidNJ (neighbor-joining method) | | ||
| mycosnp_tree_analysis_date | String | Date of the analysis | | ||
| mycosnp_tree_full_results | File | Full results file | | ||
| mycosnp_tree_vcf_csv | File | SNP variants formatted as a CSV table | | ||
| mycosnp_tree_version | String | Version of the `mycosnp_tree` WDL workflow | | ||
| mycosnp_version | String | Version of MycoSNP | | ||
| mycosnptree_snpdists | File | SNP distances file | | ||
| reference_name | String | Name of the reference | | ||
| reference_strain | String | Reference strain used | | ||
|
||
</div> |
recommend,