From cceb828c0288f25fd33f85d10c33f12a2be61b4e Mon Sep 17 00:00:00 2001
From: Dan Fornika
Date: Wed, 30 Nov 2022 10:25:14 -0800
Subject: [PATCH] Add section on --skip_bracken

---
 README.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b32b7a3..af1283a 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,8 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir
 ```
 
+### Extracting reads by taxonomic ID
+
 Reads can be binned by taxonomic group, and extracted to separate output files using the `--extract_reads` flag.
 When using this flag, a threshold is applied on the percentage of reads assigned to the taxonomic group, below which reads are not extracted.
 The default threshold is 1%. It can be modified using the `--extract_reads_threshold` flag.
@@ -49,6 +51,24 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir
 ```
 
+### Skipping Bracken
+
+By default, [bracken](https://github.com/jenniferlu717/Bracken) is used to re-estimate the read abundances for each taxonomic group,
+at a specific taxonomic level (Genus, Species, etc.).
+
+If desired, bracken can be skipped with the `--skip_bracken` flag:
+
+```
+nextflow run BCCDC-PHL/taxon-abundance \
+  --fastq_input \
+  --skip_bracken \
+  --outdir
+```
+
+When the `--skip_bracken` flag is used, abundances will be calculated directly from the kraken2 report. Note that the abundance
+estimates directly from kraken2 reports may under-estimate the actual abundances. Detailed rationale for including bracken analysis
+can be found in the [bracken paper](https://peerj.com/articles/cs-104/).
+
 ## Outputs
 
 An output directory will be created for each sample. Within those directories,
@@ -138,4 +158,4 @@ For each pipeline invocation, each sample will produce a `provenance.yml` file w
 - timestamp_analysis_start: 2021-11-25T16:53:10.549863
 ```
 
-The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.
\ No newline at end of file
+The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.