Documentation
davidealbanese committed Apr 20, 2018
1 parent 0073801 commit 06e1879
Showing 5 changed files with 30 additions and 48 deletions.
6 changes: 4 additions & 2 deletions doc/source/commands/mergepairs.rst
@@ -6,7 +6,7 @@ mergepairs
usage: micca mergepairs [-h] -i FILE [FILE ...] -o FILE [-r FILE]
[-l MINOVLEN] [-d MAXDIFFS] [-p PATTERN] [-e REPL]
[-s SEP] [-n] [--notmerged-fwd FILE]
[--notmerged-rev FILE]
[--notmerged-rev FILE] [-t THREADS]
micca mergepairs merges paired-end sequence reads into one sequence.
@@ -75,6 +75,8 @@ mergepairs
not included in the merged sequence.
--notmerged-fwd FILE write not merged forward reads.
--notmerged-rev FILE write not merged reverse reads.
-t THREADS, --threads THREADS
number of threads to use (1 to 256, default 1).
Examples
@@ -88,4 +90,4 @@ mergepairs
*_R2*.fastq):
micca mergepairs -i *_R1*.fastq -o merged.fastq --notmerged-fwd \
notmerged_fwd.fastq --notmerged-rev notmerged_rev.fastq
notmerged_fwd.fastq --notmerged-rev notmerged_rev.fastq
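The new ``-t THREADS`` option documents a valid range of 1 to 256 with a default of 1. A minimal sketch of how such a bounded option could be declared with ``argparse`` follows; the names ``thread_count`` and ``build_parser`` are hypothetical and are not taken from the micca source, which may validate the value differently.

```python
import argparse


def thread_count(value):
    """Parse a thread count and enforce the documented 1..256 range."""
    n = int(value)
    if not 1 <= n <= 256:
        raise argparse.ArgumentTypeError("threads must be between 1 and 256")
    return n


def build_parser():
    # Hypothetical parser: it declares only the -t/--threads option
    # described in the help text, not the full mergepairs interface.
    parser = argparse.ArgumentParser(prog="mergepairs-sketch")
    parser.add_argument("-t", "--threads", type=thread_count, default=1,
                        metavar="THREADS",
                        help="number of threads to use (1 to 256, default 1).")
    return parser
```

With this sketch, omitting ``-t`` yields the default of 1, and out-of-range values make ``argparse`` exit with a usage error.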
45 changes: 15 additions & 30 deletions doc/source/commands/otu.rst
@@ -1,6 +1,8 @@
otu
===

See :doc:`/otu` for details.

.. code-block:: console
usage: micca otu [-h] -i FILE [-o DIR] [-r FILE]
@@ -11,7 +13,8 @@ otu
[--unoise-alpha UNOISE_ALPHA]
micca otu assigns similar sequences (marker genes such as 16S rRNA and
the fungal ITS region) to operational taxonomic units (OTUs).
the fungal ITS region) to operational taxonomic units (OTUs) or sequence
variants (SVs).
Trimming the sequences to a fixed position before clustering is
*strongly recommended* when they cover partial amplicons or if quality
@@ -23,24 +26,14 @@ otu
micca otu provides the following protocols:
* de novo greedy clustering (denovo_greedy): sequences are clustered
without relying on an external reference database, using an
approach similar to the UPARSE pipeline (doi: 10.1038/nmeth.2604):
i) dereplication; ii) OTU picking greedy clustering; iii) chimera
filtering (UCHIME, optional) on the OTU representatives; iv) map
sequences to the representatives.
* de novo greedy clustering (denovo_greedy): useful for the
identification of 97% OTUs;
* de novo unoise (denovo_unoise): denoise Illumina sequences using
the UNOISE3 protocol: i) dereplication; ii) denoising; iii) chimera
filtering (UCHIME3, optional) on the ZOTUs (zero-radius OTUs) iv)
mapping sequences to ZOTUs.
the UNOISE3 protocol;
* de novo swarm (denovo_swarm): sequences are clustered without relying
on an external reference database, using swarm (doi:
10.7717/peerj.593, doi: 10.7717/peerj.1420,
https://github.com/torognes/swarm): i) predict sequence abundances of
each sequence by dereplication; ii) swarm clustering; iii) remove
chimeric sequences (de novo, optional) from the representatives.
* de novo swarm (denovo_swarm): a robust and fast clustering method
(deprecated, it will be removed in version 1.8.0);
* closed-reference clustering (closed_ref): sequences are clustered
against an external reference database and reads that could not be
@@ -57,21 +50,15 @@ otu
* otus.fasta: FASTA file containing the representative sequences (OTUs);
* otuids.txt: OTU ids to original sequence ids (tab-delimited text file)
* hits.txt: three-columns, TAB-separated file:
1. matching sequence
2. representative (seed)
3. identity (if available, else '*'), defined as:
* otuids.txt: OTU ids to original sequence ids (tab-delimited text
file);
matching columns
-------------------------------- ;
alignment length - terminal gaps
* hits.txt: three-columns, TAB-separated file with matching sequence,
representative (seed) and identity (if available, else '*');
* otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
'open_ref' when --rmchim is specified): FASTA file containing the
chimeric otus;
chimeric otus.
optional arguments:
-h, --help show this help message and exit
@@ -147,9 +134,7 @@ otu
micca otu -i input.fasta --method open_ref --threads 8 --id 0.97 \
--ref greengenes_2013_05/rep_set/97_otus.fasta
De novo swarm clustering with the protocol suggested by the authors
using 4 threads (see https://github.com/torognes/swarm and
https://github.com/torognes/swarm/wiki):
De novo swarm clustering using 4 threads:
micca otu -i input.fasta --method denovo_swarm --threads 4 \
--swarm-fastidious --rmchim --minsize 1
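The help text removed in this commit defined the identity reported in hits.txt as matching columns divided by (alignment length minus terminal gaps). A minimal sketch of that formula for a pairwise alignment follows. It assumes both sequences are given as equal-length aligned strings that use ``-`` for gaps and the same letter case; the function name ``alignment_identity`` is hypothetical, and micca's own computation is not shown here.

```python
def alignment_identity(aligned_a, aligned_b, gap="-"):
    """Identity = matching columns / (alignment length - terminal gaps)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(aligned_a, aligned_b))

    # Skip terminal gap columns: leading and trailing columns in which
    # either sequence has a gap.
    start, end = 0, len(columns)
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1

    internal = columns[start:end]
    if not internal:
        return 0.0
    # A column matches when both residues are equal and neither is a gap.
    matches = sum(1 for a, b in internal if a == b and a != gap)
    return matches / len(internal)
```

For example, aligning ``--ACGTACGT`` against ``TTACG-ACG-`` leaves seven internal columns, six of which match, so the sketch returns 6/7.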
4 changes: 2 additions & 2 deletions doc/source/pairedend_97.rst
@@ -130,8 +130,8 @@ Quality filtering
-----------------

Producing high-quality OTUs requires high-quality reads. :doc:`/commands/filter`
filters sequences according to the maximum allowed expected error (EE) rate %
(see :doc:`/filtering`). We recommend values <=1%.
filters sequences according to the maximum allowed expected error (EE) rate %.
We recommend values <=1%.

For paired-end reads, we recommend to merge pairs first, then quality filter
using a maximum EE threshold with **no length truncation**.
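To make the EE rate threshold concrete, here is a minimal sketch. It assumes the usual definition of expected errors (the sum of per-base error probabilities, 10^(-Q/10) for Phred score Q), divided by the read length and expressed as a percentage. The function names are hypothetical, and the exact computation in micca filter may differ.

```python
def expected_error_rate(phred_scores):
    """Return the expected error rate (%) for a list of Phred quality scores."""
    if not phred_scores:
        raise ValueError("read has no quality scores")
    # Expected errors: sum of the per-base error probabilities.
    expected_errors = sum(10 ** (-q / 10) for q in phred_scores)
    return 100.0 * expected_errors / len(phred_scores)


def passes_filter(phred_scores, max_ee_rate=1.0):
    """Keep a read only if its EE rate (%) does not exceed the threshold."""
    return expected_error_rate(phred_scores) <= max_ee_rate
```

Under these assumptions, a read whose bases all have Q30 has an EE rate of about 0.1% and passes a 1% threshold, while a read of all Q10 bases has a rate of about 10% and is discarded.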
4 changes: 2 additions & 2 deletions doc/source/singleend.rst
@@ -58,7 +58,7 @@ will be included into the sequence identifiers.
.. Note::

In the case of overlapping paired-end reads go to :doc:`/pairedend_97` or
:doc:`/denoiseing_illumina`.
:doc:`/denoising_illumina`.

.. _singleend-primer_trimming:

@@ -105,7 +105,7 @@ Quality filtering

Producing high-quality OTUs requires high-quality reads. The
:doc:`/commands/filter` command filters sequences according to the maximum
allowed expected error (EE) rate % (see :doc:`/filtering`). We recommend values
allowed expected error (EE) rate %. We recommend values
<=1%. Moreover, to obtain good results in clustering (see :doc:`/commands/otu`),
reads should be **truncated at the same length** when they cover partial
amplicons or if quality deteriorates towards the end (common when you have long
19 changes: 7 additions & 12 deletions micca/cmds/otu.py
@@ -50,7 +50,8 @@ def main(argv):
* de novo unoise (denovo_unoise): denoise Illumina sequences using
the UNOISE3 protocol;
* de novo swarm (denovo_swarm): a robust and fast clustering method;
* de novo swarm (denovo_swarm): a robust and fast clustering method
(deprecated, it will be removed in version 1.8.0);
* closed-reference clustering (closed_ref): sequences are clustered
against an external reference database and reads that could not be
@@ -67,21 +68,15 @@ def main(argv):
* otus.fasta: FASTA file containing the representative sequences (OTUs);
* otuids.txt: OTU ids to original sequence ids (tab-delimited text file)
* otuids.txt: OTU ids to original sequence ids (tab-delimited text
file);
* hits.txt: three-columns, TAB-separated file:
1. matching sequence
2. representative (seed)
3. identity (if available, else '*'), defined as:
matching columns
-------------------------------- ;
alignment length - terminal gaps
* hits.txt: three-columns, TAB-separated file with matching sequence,
representative (seed) and identity (if available, else '*');
* otuschim.fasta (only for 'denovo_greedy', 'denovo_swarm' and
'open_ref' when --rmchim is specified): FASTA file containing the
chimeric otus;
chimeric otus.
''')

epilog = textwrap.dedent('''\