Metagenomics Pipeline

A pipeline for metagenomic sequencing experiments. It takes paired-end .fastq.gz files as input and generates a detailed HTML report with result tables. After quality trimming and filtering of low-complexity sequences with fqtrim, high-quality reads are aligned to the human reference genome using Bowtie 2. Unaligned (non-human) reads are then passed to several metagenomics tools.
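The host-filtering step described above can be sketched as a small command builder. This is a minimal illustration, not the pipeline's actual invocation: the index prefix, file names, and thread count are hypothetical, and using `--un-conc-gz` (pairs that do not align concordantly) as the definition of "unaligned" is an assumption.

```python
import shlex


def bowtie2_host_filter_cmd(index_prefix, reads_1, reads_2, out_pattern, threads=8):
    """Build a Bowtie 2 command that writes non-concordantly aligned pairs to gzip files.

    All paths are placeholders; the real pipeline may use different options.
    """
    return [
        "bowtie2",
        "-p", str(threads),           # number of alignment threads
        "-x", index_prefix,           # prefix of the human reference index
        "-1", reads_1,                # trimmed mate 1
        "-2", reads_2,                # trimmed mate 2
        "--un-conc-gz", out_pattern,  # '%' is replaced by 1 or 2 for each mate
        "-S", "/dev/null",            # discard the SAM output of aligned reads
    ]


# Hypothetical sample and index names, for illustration only.
cmd = bowtie2_host_filter_cmd(
    "GRCh38",
    "sample_trimmed_1.fastq.gz",
    "sample_trimmed_2.fastq.gz",
    "sample_nonhuman_%.fastq.gz",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where Bowtie 2 and a matching index are available.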

First, taxonomic classification is performed with Kraken 2 and Centrifuge using different reference databases (standard, viral, EUPATHDB48). Next, reads are aligned against ~12,000 RefSeq virus genomes (as well as ~4,500 human-infecting virus strains related to the RefSeq viruses), and viral integration sites in the human genome are detected using Arriba and STAR. In the last step, de novo assembly is performed with MEGAHIT. Contigs > 1000 bp are automatically classified with Kraken 2 and Centrifuge. Output files can be used for manual downstream analyses such as BLAST or phylogenetic studies. All results are summarized in a comprehensive HTML report.
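For manual downstream work on the assembly, the contig length cutoff mentioned above can be reproduced with a few lines of Python. This is a sketch under the assumption that contigs are available as plain FASTA text; the function names and the example records are hypothetical and not part of the pipeline.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(lines, min_length=1000):
    """Keep contigs strictly longer than min_length bp (the '> 1000 bp' rule)."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) > min_length]


# Made-up contigs: one multi-line record of 1200 bp, one of exactly 1000 bp, one of 1001 bp.
example = [
    ">contig_a", "A" * 600, "C" * 600,
    ">contig_b", "G" * 1000,
    ">contig_c", "T" * 1001,
]
kept = filter_contigs(example)
print([(name, len(seq)) for name, seq in kept])
```

To apply it to a real assembly, open the contig FASTA file and pass the file object as `lines`; a contig of exactly 1000 bp is dropped because the cutoff is strict.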

The pipeline is adapted to the SLURM job scheduler for parallel processing of multiple samples. It requires 140 GB of memory (for the Centrifuge index with all non-redundant NCBI sequences) and an adjustable number of CPUs.
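As an illustration of these resource requirements, the sketch below renders one SLURM batch script per sample with the memory and CPU requests set through standard `#SBATCH` directives. The entry-point name `./run_pipeline.sh`, the job and log naming, and the sample IDs are hypothetical; the repository's own submission scripts may look different.

```python
import shlex


def render_sbatch_script(sample, pipeline_cmd="./run_pipeline.sh", cpus=16, mem_gb=140):
    """Return the text of an sbatch script for one sample.

    pipeline_cmd is a placeholder for whatever command starts the pipeline.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=metagenomics_{sample}",
        f"#SBATCH --cpus-per-task={cpus}",      # adjustable number of CPUs
        f"#SBATCH --mem={mem_gb}G",             # 140 GB for the Centrifuge index
        f"#SBATCH --output={sample}.slurm.log",
        "",
        f"{pipeline_cmd} {shlex.quote(sample)}",
    ]
    return "\n".join(lines) + "\n"


# One script per sample lets SLURM process the samples in parallel.
scripts = {s: render_sbatch_script(s, cpus=8) for s in ["sampleA", "sampleB"]}
print(scripts["sampleA"])
```

Each rendered script can be written to a file and submitted with `sbatch`.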

Prerequisites

The following tools need to be installed and available in your $PATH:

Additionally, the following reference data is required:

HTML report

report