From cceb828c0288f25fd33f85d10c33f12a2be61b4e Mon Sep 17 00:00:00 2001
From: Dan Fornika <dfornika@gmail.com>
Date: Wed, 30 Nov 2022 10:25:14 -0800
Subject: [PATCH] Add section on --skip_bracken

---
 README.md | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index b32b7a3..af1283a 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,8 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir </path/to/outdir> 
 ```
 
+### Extracting reads by taxonomic ID
+
 Reads can be binned by taxonomic group, and extracted to separate output files using the `--extract_reads` flag.
 When using this flag, a threshold is applied on the percentage of reads assigned to the taxonomic group, below which
 reads are not extracted. The default threshold is 1%. It can be modified using the `--extract_reads_threshold` flag.
@@ -49,6 +51,24 @@ nextflow run BCCDC-PHL/taxon-abundance \
   --outdir </path/to/outdir> 
 ```
 
+### Skipping Bracken
+
+By default, [Bracken](https://github.com/jenniferlu717/Bracken) is used to re-estimate the read abundances for each taxonomic group
+at a specific taxonomic level (genus, species, etc.).
+
+If desired, Bracken can be skipped with the `--skip_bracken` flag:
+
+```
+nextflow run BCCDC-PHL/taxon-abundance \
+  --fastq_input <fastq_input_dir> \
+  --skip_bracken \
+  --outdir </path/to/outdir>
+```
+
+When the `--skip_bracken` flag is used, abundances are calculated directly from the kraken2 report. Note that abundance
+estimates taken directly from kraken2 reports may underestimate the actual abundances. A detailed rationale for including the Bracken analysis
+can be found in the [Bracken paper](https://peerj.com/articles/cs-104/).
+
 ## Outputs
 
 An output directory will be created for each sample. Within those directories,
@@ -138,4 +158,4 @@ For each pipeline invocation, each sample will produce a `provenance.yml` file w
 - timestamp_analysis_start: 2021-11-25T16:53:10.549863
 ```
 
-The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.
\ No newline at end of file
+The filename of the provenance file includes a timestamp with format `YYYYMMDDHHMMSS` to ensure that re-analysis of the same sample will create a unique `provenance.yml` file.