This pipeline automates the process of performing a reference based assembly on Oxford Nanopore (ONT) sequencing data. It also includes parameters to support the processing of amplicon sequencing data on ONT.
Medaka requires information about the pore, sequencing device, and basecaller. This information is specified to the tool through a 'model', which is a string of text in one of the following formats:
For flowcell versions < 10.4
{pore}_{device}_{basecaller accuracy}_{basecaller version} #Example: r941_min_hac_g507
or for flowcell versions >= 10.4
{pore}_{motor protein version}_{pore speed}_{basecaller accuracy}_{basecaller version} #Example: r1041_e82_400bps_hac_g615
The basecaller setting can contain one of three values based on the accuracy setting of the base caller:
- fast - Fast basecalling
- hac - High-accuracy basecalling
- sup - Super-accurate basecalling
As well, we have seen better variant calling accuracy by adding the variant value to the medaka model between the basecaller accuracy and basecaller version values (Example: r941_min_hac_variant_g507)
Finally, the basecaller version should be the highest listed by medaka that is less than or equal to the version of guppy used to base-call the data (Example: if guppy 6.0 was used, g507 should be used).
To see the complete list of medaka models used, enter the command medaka tools list_models
To install this pipeline, enter the following commands:
# Clone the repository
git clone https://github.com/rchapman2000/ont-ref-based-assembly.git
# Create a conda environment using the provided environment.yml file
conda env create -f environment.yml
# Activate the conda environment
conda activate ONT-RefBasedAssembly
If you already have the pipeline installed, you can update it using the following commands:
# Navigate to your installation directory
cd ont-ref-based-assembly
# Use git to pull the latest update
git pull
# Activate the conda environment and use the environment.yml file to download updates
conda activate ONT-RefBasedAssembly
conda env update --file environment.yml --prune
To run the pipeline, use the following command:
# You must either be in the same directory as the main.nf file or reference the file location.
nextflow run main.nf [OPTIONS] --input INPUT_DIR --output OUTPUT_DIR --reference --model MEDAKA_MODEL
The pipeline also supports the following optional arguments:
Option | Type | Description |
---|---|---|
--trim | None | If supplied, the pipeline will trim ONT adapters/barcodes from reads using Porechop. [Default = off] |
--minReadLen | Int | If supplied, the pipeline will perform length filtering using Nanofilt excluding reads shorter than this value. [Default = off] |
--maxReadLen | Int | If supplied, the pipeline will perform length filtering using Nanofilt excluding reads larger than this value. [Default = off] |
--noSecondaryAlignments | None | If supplied, the pipeline will filter secondary alignments. Secondary alignments are defined as alternative alignments of the reads besides the primary alignment. [Default = off] |
--noSupplementaryAlignments | None | If supplied, the pipeline will filter supplementary read alignments. Supplementary alignments are defined as those where a read is not mapped in a contiguous fashion (i.e. multiple separate alignments). [Default = off] |
--xgen | File | The pipeline will take a masterfile provided by IDT containing primer coordinates and perform primer clipping using the PrimerClip tool provided by IDT. (Cannot be used with the --primers option)[Default = disabled] |
--primers | File | The pipeline will take a bed file containing coordinates and perform primer clipping using Samtools. the start and end regions of reads that fall within these coordinates will be soft-clipped from the alignment. (Cannot be used with the --xgen option) [Default = disabled] |
--removeUnclipped | None | If supplied (and a primer file supplied), the pipeline will remove unclipped reads from the alignment. (Requires --primers option and not compatible with the --xgen option) [Default = off] |
--minClippedReadLen | Int | If supplied (and a primer file supplied), the pipeline will remove clipped reads shorter than this length from the alignment. (Requires --primers option and not compatible with the --xgen option) [Default = off] |
--minCov | Int | The minimum depth of coverage required for a position to be called. Sites with a depth less than this value will be reported as "N" [Default = 20] |
To view the list of options from the command line, use the following command:
nextflow run main.nf --help