marbl/anianns

About

Ani Ann's: Ani augmented Annotation of satellite arrays.

Ani Ann's is an a priori satellite detection and annotation tool. It uses a matrix of Average Nucleotide Identity (ANI) values created by ModDotPlot to infer the location and type of satellites.

Note that Ani Ann's is under active development, and not all features are available yet.

Installation

git clone https://github.com/marbl/anianns.git
cd anianns

Although optional, setting up a virtual environment is recommended:

python -m venv venv
source venv/bin/activate

Once activated, you can install the required dependencies:

python -m pip install .

Usage

Currently, use of Ani Ann's is limited to detecting satellites and masking these regions. Classification of satellites, detection of Higher Order Repeats, and other metrics are not yet included.

Ani Ann's can be run with python src/anianns/anianns.py, or simply with the shortcut annotate:

annotate -h

Ani Ann's: Ani augmented Annotation of satellite arrays

options:
  -h, --help            show this help message and exit
  -f FASTA [FASTA ...], --fasta FASTA [FASTA ...]
                        Path to input fasta file(s).
  -b BAND, --band BAND  Max height in Mbp of band. (default: 8.0)
  -k KMER, --kmer KMER  k-mer length (default: 21)
  --overlap OVERLAP     Percent overlap. Must be < 0.5. (default: 0.1)
  --identity IDENTITY   Identity threshold. (default: 86)
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        Directory name for saving matrices and coordinate logs. Defaults to working directory. (default: None)
  -w WINDOW, --window WINDOW
                        Window size of ModDotPlot. (default: 2000)
  -m, --mask            Create a masked fasta file. (default: False)

To run both the repeat masker and the annotation tools, pass -m/--mask to annotate.
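For scripted runs, the options listed in the help text can be assembled programmatically. The following is a minimal sketch and not part of Ani Ann's: build_annotate_command is a hypothetical helper, chr1.fa and results are placeholder names, and the validation reflects only the limits stated in this README (overlap below 0.5, identity no lower than 50). It uses only the long flags shown in the help output.

```python
import shlex


def build_annotate_command(fasta_paths, output_dir=None, kmer=21, window=2000,
                           band=8.0, identity=86, overlap=0.1, mask=False):
    """Return the argument list for an `annotate` run (hypothetical helper)."""
    if not fasta_paths:
        raise ValueError("at least one FASTA path is required")
    if not overlap < 0.5:
        raise ValueError("overlap must be < 0.5")
    if not 50 <= identity <= 100:
        raise ValueError("identity must be between 50 and 100")

    # Defaults mirror the values printed by `annotate -h`.
    cmd = ["annotate", "--fasta", *fasta_paths,
           "--kmer", str(kmer), "--window", str(window), "--band", str(band),
           "--identity", str(identity), "--overlap", str(overlap)]
    if output_dir is not None:
        cmd += ["--output-dir", output_dir]
    if mask:
        cmd.append("--mask")
    return cmd


cmd = build_annotate_command(["chr1.fa"], output_dir="results", mask=True)
print(shlex.join(cmd))  # shell-quoted form of the command, for logging
```

Once Ani Ann's is installed, the returned list could be passed to subprocess.run(cmd, check=True) to launch the run.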

-f / --fasta <file>

Fasta files to input. Multifasta files are accepted.

-k / --kmer <int>

K-mer size to use. This should be large enough that k-mers are specific to their locus, but not so large that sensitivity is lost. Default: 21.
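The trade-off can be seen on a toy example: a single substitution disrupts every k-mer that spans it, so a larger k leaves fewer k-mers shared between two near-identical sequences. The sketch below illustrates general k-mer behaviour only; it is not taken from Ani Ann's or ModDotPlot, and the function names and sequences are made up for illustration.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(a, b, k):
    """Fraction of a's distinct k-mers that also occur in b."""
    ka, kb = kmers(a, k), kmers(b, k)
    return len(ka & kb) / len(ka) if ka else 0.0


ref = "ACGTTGCA"
alt = "ACGTAGCA"  # same as ref except for one substitution (T -> A) at index 4

for k in (2, 4):
    # The shared fraction drops as k grows, because more k-mers span the mismatch.
    print(k, shared_fraction(ref, alt, k))
```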

-o / --output-dir <string>

Name of output directory. Default is current working directory.

--identity <int>

Minimum sequence identity cutoff threshold when running ModDotPlot. Default is 86. While it is possible to go as low as 50% sequence identity, anything below 80% is not recommended.

-w / --window <int>

Dotplot window size, i.e. the number of bp contained within each pixel in a plot. Window size trades sensitivity for speed: a smaller window detects satellites more accurately, at the expense of runtime. Default is 2000.
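To get a feel for the runtime cost, the number of pixels can be estimated from the window size. This is a rough sketch under two assumptions that are not confirmed by this README: a partial final window is rounded up to one pixel, and the plot is square, so the number of cells grows with the square of the pixel count. pixels_for_length is a hypothetical helper.

```python
import math


def pixels_for_length(seq_length_bp, window_bp=2000):
    """Pixels along one axis if each pixel covers window_bp bases (rounded up)."""
    if window_bp <= 0:
        raise ValueError("window size must be positive")
    return math.ceil(seq_length_bp / window_bp)


# Halving the window doubles the pixels per axis and quadruples the cells.
for window in (500, 1000, 2000):
    n = pixels_for_length(8_000_000, window)
    print(f"window={window} bp: {n} pixels per axis, {n * n} cells")
```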

-b / --band <int * 1000000>

To save time, dotplots are created as multiple plots of a fixed size rather than a single plot spanning the entire sequence length. That size can be adjusted here (default: 8, i.e. 8 Mbp). Increasing it will improve detection of off-target satellites, at the expense of runtime.
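As an illustration of what splitting a sequence into band-sized plots might look like, the sketch below tiles a sequence into consecutive, non-overlapping intervals. This is an assumption made for the example: the README does not describe how Ani Ann's actually places its plots, and band_intervals is a hypothetical helper.

```python
def band_intervals(seq_length_bp, band_mbp=8.0):
    """Split [0, seq_length_bp) into consecutive intervals of at most band_mbp Mbp.

    Assumes simple non-overlapping tiling, which may differ from Ani Ann's.
    """
    band_bp = round(band_mbp * 1_000_000)
    if band_bp <= 0:
        raise ValueError("band size must be positive")
    return [(start, min(start + band_bp, seq_length_bp))
            for start in range(0, seq_length_bp, band_bp)]


# A 20 Mbp sequence with the default 8 Mbp band needs three plots.
for start, end in band_intervals(20_000_000):
    print(start, end)
```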

--bed <str>

Name of bed file to output to. Default is anianns-output.bed.
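The bed output can be summarised with a few lines of Python. The sketch below relies only on the standard BED convention (chromosome, 0-based start, and exclusive end in the first three whitespace-separated columns); any further columns Ani Ann's may write are not described in this README and are ignored. The function names and the example records are made up for illustration.

```python
def read_bed_intervals(lines):
    """Parse (chrom, start, end) tuples from BED lines, skipping headers and blanks."""
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def total_annotated_bp(intervals):
    """Sum interval lengths; overlapping intervals are counted more than once."""
    # BED intervals are half-open, so the length is simply end - start.
    return sum(end - start for _, start, end in intervals)


example = ["# made-up records", "chr1\t1000\t5000", "chr1\t9000\t12000\tsat"]
intervals = read_bed_intervals(example)
print(total_annotated_bp(intervals))
```

For a real run, the lines would come from the output file instead, for example by passing an open file handle for anianns-output.bed to read_bed_intervals.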

Questions

For bug reports or general usage questions, please raise a GitHub issue, or email alex dot sweeten at nih dot gov
