Documentation

compmetagen · Apr 20, 2018 · 06e1879
1 parent 0073801
commit 06e1879

Showing 5 changed files with 30 additions and 48 deletions.
diff --git a/doc/source/commands/mergepairs.rst b/doc/source/commands/mergepairs.rst
@@ -6,7 +6,7 @@ mergepairs
     usage: micca mergepairs [-h] -i FILE [FILE ...] -o FILE [-r FILE]
                             [-l MINOVLEN] [-d MAXDIFFS] [-p PATTERN] [-e REPL]
                             [-s SEP] [-n] [--notmerged-fwd FILE]
-                            [--notmerged-rev FILE]
+                            [--notmerged-rev FILE] [-t THREADS]
 
     micca mergepairs merges paired-end sequence reads into one sequence.
 
@@ -75,6 +75,8 @@ mergepairs
                             not included in the merged sequence.
     --notmerged-fwd FILE  write not merged forward reads.
     --notmerged-rev FILE  write not merged reverse reads.
+    -t THREADS, --threads THREADS
+                            number of threads to use (1 to 256, default 1).
 
     Examples
 
@@ -88,4 +90,4 @@ mergepairs
     *_R2*.fastq):
 
         micca mergepairs -i *_R1*.fastq -o merged.fastq --notmerged-fwd \
-        notmerged_fwd.fastq --notmerged-rev notmerged_rev.fastq
+        notmerged_fwd.fastq --notmerged-rev notmerged_rev.fastq
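
The new `-t/--threads` option is documented with a range of 1 to 256 and a default of 1. A minimal sketch of how such a bounded integer option could be validated with `argparse` follows. The helper name `thread_count`, the constants, and the parser wiring are assumptions for illustration, not the code micca actually uses.

```python
import argparse

MIN_THREADS, MAX_THREADS = 1, 256  # range stated in the help text above


def thread_count(value):
    """Hypothetical argparse type: accept an integer in [1, 256]."""
    try:
        n = int(value)
    except ValueError:
        raise argparse.ArgumentTypeError("%r is not an integer" % value)
    if not MIN_THREADS <= n <= MAX_THREADS:
        raise argparse.ArgumentTypeError(
            "threads must be between %d and %d" % (MIN_THREADS, MAX_THREADS))
    return n


def build_parser():
    # Illustrative parser only; the real mergepairs command has more options.
    parser = argparse.ArgumentParser(prog="micca mergepairs")
    parser.add_argument("-t", "--threads", type=thread_count, default=1,
                        help="number of threads to use (1 to 256, default 1).")
    return parser
```

Rejecting out-of-range values at parse time keeps the error message next to the option that caused it, rather than surfacing later in the merging step.
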
diff --git a/doc/source/commands/otu.rst b/doc/source/commands/otu.rst
@@ -1,6 +1,8 @@
 otu
 ===
 
+See :doc:`/otu` for details.
+
 .. code-block:: console
 
     usage: micca otu [-h] -i FILE [-o DIR] [-r FILE]
@@ -11,7 +13,8 @@ otu
                     [--unoise-alpha UNOISE_ALPHA]
 
     micca otu assigns similar sequences (marker genes such as 16S rRNA and
-    the fungal ITS region) to operational taxonomic units (OTUs).
+    the fungal ITS region) to operational taxonomic units (OTUs) or sequence 
+    variants (SVs).
 
     Trimming the sequences to a fixed position before clustering is
     *strongly recommended* when they cover partial amplicons or if quality
@@ -23,24 +26,14 @@ otu
 
     micca otu provides the following protocols:
 
-    * de novo greedy clustering (denovo_greedy): sequences are clustered
-    without relying on an external reference database, using an
-    approach similar to the UPARSE pipeline (doi: 10.1038/nmeth.2604):
-    i) dereplication; ii) OTU picking greedy clustering; iii) chimera
-    filtering (UCHIME, optional) on the OTU representatives; iv) map
-    sequences to the representatives.
+    * de novo greedy clustering (denovo_greedy): useful for the
+    identification of 97% OTUs; 
 
     * de novo unoise (denovo_unoise): denoise Illumina sequences using
-    the UNOISE3 protocol: i) dereplication; ii) denoising; iii) chimera
-    filtering (UCHIME3, optional) on the ZOTUs (zero-radius OTUs) iv)
-    mapping sequences to ZOTUs.
+    the UNOISE3 protocol;
 
-    * de novo swarm (denovo_swarm): sequences are clustered without relying
-    on an external reference database, using swarm (doi:
-    10.7717/peerj.593, doi: 10.7717/peerj.1420,
-    https://github.com/torognes/swarm): i) predict sequence abundances of
-    each sequence by dereplication; ii) swarm clustering; iii) remove
-    chimeric sequences (de novo, optional) from the representatives.
+    * de novo swarm (denovo_swarm): a robust and fast clustering method 
+    (deprecated, it will be removed in version 1.8.0);
 
     * closed-reference clustering (closed_ref): sequences are clustered
     against an external reference database and reads that could not be
@@ -57,21 +50,15 @@ otu
 
     * otus.fasta: FASTA file containing the representative sequences (OTUs);
 
-    * otuids.txt: OTU ids to original sequence ids (tab-delimited text file)
-
-    * hits.txt: three-columns, TAB-separated file:
-
-    1. matching sequence
-    2. representative (seed)
-    3. identity (if available, else '*'), defined as:
+    * otuids.txt: OTU ids to original sequence ids (tab-delimited text
+    file);
 
-                matching columns
-        -------------------------------- ;
-        alignment length - terminal gaps
+    * hits.txt: three-columns, TAB-separated file with matching sequence,
+    representative (seed) and identity (if available, else '*');
 
     * otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
     'open_ref' when --rmchim is specified): FASTA file containing the
-    chimeric otus;
+    chimeric otus.
 
     optional arguments:
     -h, --help            show this help message and exit
@@ -147,9 +134,7 @@ otu
         micca otu -i input.fasta --method open_ref --threads 8 --id 0.97 \
         --ref greengenes_2013_05/rep_set/97_otus.fasta
 
-    De novo swarm clustering with the protocol suggested by the authors
-    using 4 threads (see https://github.com/torognes/swarm and
-    https://github.com/torognes/swarm/wiki):
+    De novo swarm clustering using 4 threads:
 
         micca otu -i input.fasta --method denovo_swarm --threads 4 \
         --swarm-fastidious --rmchim --minsize 1
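
The help text above describes hits.txt as a three-column, TAB-separated file holding the matching sequence, the representative (seed) and the identity, with '*' when no identity is available. A minimal sketch of reading such a file is shown below. The function names `parse_hits` and `identity` are hypothetical, the sample rows are made up, and the identity formula is the one given in the help text removed by this commit (matching columns divided by alignment length minus terminal gaps).

```python
import io


def identity(matching_columns, alignment_length, terminal_gaps):
    """Identity as defined in the removed help text:
    matching columns / (alignment length - terminal gaps)."""
    return matching_columns / float(alignment_length - terminal_gaps)


def parse_hits(handle):
    """Yield (sequence_id, seed_id, identity) from a hits.txt-like stream.

    Identity is returned as a float, or None when the column holds '*'.
    """
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        seq_id, seed_id, ident = line.split("\t")
        yield seq_id, seed_id, (None if ident == "*" else float(ident))


# Made-up example rows, not real micca output.
example = io.StringIO("read1\tOTU1\t98.5\nread2\tOTU1\t*\n")
rows = list(parse_hits(example))
```

Mapping '*' to `None` keeps the identity column numeric for downstream filtering while still marking rows where no value was reported.
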
diff --git a/doc/source/pairedend_97.rst b/doc/source/pairedend_97.rst
@@ -130,8 +130,8 @@ Quality filtering
 -----------------
 
 Producing high-quality OTUs requires high-quality reads. :doc:`/commands/filter`
-filters sequences according to the maximum allowed expected error (EE) rate %
-(see :doc:`/filtering`). We recommend values <=1%.
+filters sequences according to the maximum allowed expected error (EE) rate %.
+We recommend values <=1%.
 
 For paired-end reads, we recommend to merge pairs first, then quality filter
 using a maximum EE threshold with **no length truncation**.

diff --git a/doc/source/singleend.rst b/doc/source/singleend.rst
@@ -58,7 +58,7 @@ will be included into the sequence indentifiers.
 .. Note::
 
    In the case of overlapping paired-end reads go to :doc:`/pairedend_97` or 
-   :doc:`/denoiseing_illumina`.
+   :doc:`/denoising_illumina`.
 
 .. _singleend-primer_trimming:
 
@@ -105,7 +105,7 @@ Quality filtering
 
 Producing high-quality OTUs requires high-quality reads. The
 :doc:`/commands/filter` command filters sequences according to the maximum
-allowed expected error (EE) rate % (see :doc:`/filtering`). We recommend values
+allowed expected error (EE) rate %. We recommend values
 <=1%. Moreover, to obtain good results in clustering (see :doc:`/commands/otu`),
 reads should be **truncated at the same length** when they cover partial
 amplicons or if quality deteriorates towards the end (common when you have long

diff --git a/micca/cmds/otu.py b/micca/cmds/otu.py
@@ -50,7 +50,8 @@ def main(argv):
         * de novo unoise (denovo_unoise): denoise Illumina sequences using
           the UNOISE3 protocol;
 
-        * de novo swarm (denovo_swarm): a robust and fast clustering method;
+        * de novo swarm (denovo_swarm): a robust and fast clustering method 
+          (deprecated, it will be removed in version 1.8.0);
 
         * closed-reference clustering (closed_ref): sequences are clustered
           against an external reference database and reads that could not be
@@ -67,21 +68,15 @@ def main(argv):
 
         * otus.fasta: FASTA file containing the representative sequences (OTUs);
 
-        * otuids.txt: OTU ids to original sequence ids (tab-delimited text file)
+        * otuids.txt: OTU ids to original sequence ids (tab-delimited text
+          file);
 
-        * hits.txt: three-columns, TAB-separated file:
-
-          1. matching sequence
-          2. representative (seed)
-          3. identity (if available, else '*'), defined as:
-
-                      matching columns
-             -------------------------------- ;
-             alignment length - terminal gaps
+        * hits.txt: three-columns, TAB-separated file with matching sequence,
+          representative (seed) and identity (if available, else '*');
 
         * otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
           'open_ref' when --rmchim is specified): FASTA file containing the
-          chimeric otus;
+          chimeric otus.
     ''')
 
     epilog = textwrap.dedent('''\