gk-zhang edited this page Sep 1, 2017 · 11 revisions

MicrobDetect

MicrobDetect is a pipeline for identifying and profiling the microbiome content of whole genome shotgun sequencing data from clinical samples. Clinical samples, e.g. biopsy samples from the human stomach mucosa, may contain mostly human DNA and only a very low abundance of microbial DNA. Generic tools developed for profiling metagenomic whole genome shotgun sequencing data may not be suitable for this kind of clinical microbial analysis. We therefore developed the MicrobDetect pipeline.

Applications

MicrobDetect is developed to:

  • detect pathogens that may be present in clinical samples.
  • generate a microbial profile from whole genome sequencing data.
  • estimate the abundance of the microbiome at different taxonomic levels, from strain to phylum.
  • run multiple testing on paired samples, combined with metadata, to reveal microbes potentially related to outcome.
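To illustrate what estimating abundance "from strain to phylum" can mean in practice, here is a minimal Python sketch that sums strain-level abundances into totals at a higher rank. It is not MicrobDetect's own code: the function name `roll_up`, the `lineages` mapping and all taxon names are hypothetical placeholders chosen for the example.

```python
from collections import defaultdict

# Ranks from broadest to most specific (a common convention, assumed here).
RANKS = ("phylum", "class", "order", "family", "genus", "species", "strain")


def roll_up(strain_abundance, lineages, rank):
    """Sum strain-level abundances into totals at the requested rank.

    strain_abundance: {strain_name: abundance}
    lineages: {strain_name: {rank_name: taxon_name}} (hypothetical layout)
    """
    if rank not in RANKS:
        raise ValueError(f"unknown rank: {rank}")
    totals = defaultdict(float)
    for strain, abundance in strain_abundance.items():
        totals[lineages[strain][rank]] += abundance
    return dict(totals)


# Toy data with placeholder names, not real taxa or real measurements.
abundance = {"strain_A1": 0.25, "strain_A2": 0.25, "strain_B1": 0.5}
lineages = {
    "strain_A1": {"species": "species_A", "phylum": "phylum_X"},
    "strain_A2": {"species": "species_A", "phylum": "phylum_X"},
    "strain_B1": {"species": "species_B", "phylum": "phylum_Y"},
}
species_level = roll_up(abundance, lineages, "species")
phylum_level = roll_up(abundance, lineages, "phylum")
```

Because every strain belongs to exactly one taxon at each rank, the totals at any rank add up to the same overall abundance as the strain-level input.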

How to run

a) Download and install MicrobDetect:

$ git clone https://github.com/gk-zhang/MicrobDetect.git

MicrobDetect runs under Ubuntu/Linux and requires the following software tools to be installed on your system:

  • python
  • bowtie2
  • R
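Before running the pipeline it can help to confirm that these tools are visible on your `PATH`. The following is a small sketch using only the Python standard library; the helper name `missing_tools` is made up for this example, and the executable names are assumed to match the list above (on some systems the interpreter is installed as `python3` rather than `python`).

```python
import shutil

# Tool names taken from the requirements list above.
REQUIRED_TOOLS = ("python", "bowtie2", "R")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

An empty result only means that the executables exist; it does not check their versions.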

b) Download clinical whole genome sequencing samples:

As an example, we take four samples from the Human Microbiome Project, collected from subjects ID_763496533 and ID_763577454 at two different time points each, as PanPhlAn has done:

sampleID_subjectID

SRS013951_763496533 SRS019161_763496533

SRS014459_763577454 SRS015065_763577454
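Since the later analysis works on paired samples, it can be useful to group the labels above by subject. The sketch below splits each `sampleID_subjectID` label at the underscore; the function name `group_by_subject` is hypothetical and not part of MicrobDetect.

```python
from collections import defaultdict

# Labels copied from the listing above, in sampleID_subjectID form.
LABELS = [
    "SRS013951_763496533", "SRS019161_763496533",
    "SRS014459_763577454", "SRS015065_763577454",
]


def group_by_subject(labels):
    """Map each subject ID to the list of its sample IDs, in input order."""
    groups = defaultdict(list)
    for label in labels:
        sample_id, subject_id = label.split("_", 1)
        groups[subject_id].append(sample_id)
    return dict(groups)


pairs = group_by_subject(LABELS)
# pairs == {"763496533": ["SRS013951", "SRS019161"],
#           "763577454": ["SRS014459", "SRS015065"]}
```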

Download them using wget:

wget -P samples https://www.dropbox.com/sh/5vxicumnz3adiwi/AACyBh4CUJFJ1P6a-ypzHvlua/SRS013951.tar.bz2

wget -P samples https://www.dropbox.com/sh/5vxicumnz3adiwi/AAA4CVLuQVzg9-9ql4pAQxL0a/SRS014459.tar.bz2

wget -P samples https://www.dropbox.com/sh/5vxicumnz3adiwi/AABslonWx4_V7L9ugEbSN3XFa/SRS015065.tar.bz2

wget -P samples https://www.dropbox.com/sh/5vxicumnz3adiwi/AACanDqRyhaX0G6Dtf4NPTVwa/SRS019161.tar.bz2

c) Filter out the host reads

This step is optional in general, but it is essential in some cases, e.g. for clinical samples where host (human) DNA dominates.
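One common way to remove host reads is to align the sample against a host genome index with bowtie2 and keep only the read pairs that do not align. The sketch below builds such a command in Python; it is an illustration under stated assumptions, not MicrobDetect's own filtering script. It assumes paired-end FASTQ input, and the index name, file names and the helper `host_filter_command` are all hypothetical.

```python
import shlex


def host_filter_command(host_index, reads_1, reads_2, out_prefix, threads=4):
    """Build a bowtie2 command that keeps read pairs not aligning to the host.

    host_index is the basename of a bowtie2 index of the host genome
    (hypothetical here; it has to be built or downloaded separately).
    """
    return [
        "bowtie2",
        "-p", str(threads),                    # number of alignment threads
        "-x", host_index,                      # host genome index basename
        "-1", reads_1,                         # mate 1 FASTQ file
        "-2", reads_2,                         # mate 2 FASTQ file
        "--un-conc", f"{out_prefix}_%.fastq",  # pairs that fail to align concordantly;
                                               # bowtie2 replaces % with 1 or 2
        "-S", "/dev/null",                     # discard the host alignments themselves
    ]


# Hypothetical file names; the actual names inside the archives may differ.
cmd = host_filter_command(
    "host_index", "samples/SRS013951_1.fastq", "samples/SRS013951_2.fastq",
    "filtered/SRS013951",
)
print(shlex.join(cmd))  # run it with subprocess.run(cmd, check=True) once paths are real
```

Note that `--un-conc` also keeps pairs in which only one mate matches the host, so a stricter filter may be needed when host contamination is a concern.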

d) Prepare the microbial/pathogen reference sequences from the PATRIC database

e) Map the samples against the reference database
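If the mapping is done with bowtie2, its output is in SAM format, and a first look at the result is to count how many reads aligned to each reference sequence. The following sketch does this with the standard library only. It assumes uncompressed SAM text; the function name `count_mapped_reads` and the toy records are made up for illustration and do not come from MicrobDetect.

```python
from collections import Counter


def count_mapped_reads(sam_lines):
    """Count primary aligned records per reference sequence in SAM lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip blanks and header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, ref_name = int(fields[1]), fields[2]    # SAM columns 2 (FLAG) and 3 (RNAME)
        if flag & 0x4 or ref_name == "*":             # 0x4: read is unmapped
            continue
        if flag & 0x900:                              # 0x100 secondary, 0x800 supplementary
            continue
        counts[ref_name] += 1
    return counts


# Toy SAM records (not real data): two primary hits, one unmapped, one secondary.
example = [
    "@SQ\tSN:genomeA\tLN:1000",
    "r1\t0\tgenomeA\t10\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tgenomeB\t7\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t256\tgenomeA\t50\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
]
counts = count_mapped_reads(example)
```

For a real file, pass the open file handle, e.g. `count_mapped_reads(open("sample.sam"))`, where `sample.sam` stands for your own mapping output. With paired-end data each mate is a separate record, so the counts are per mate rather than per pair.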

f) Evaluate and estimate the microbial abundance

g) Profile the microbial abundance for all the samples

h) Downstream statistical analysis combining clinical data
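When many taxa are tested at once, the resulting p-values are usually corrected for multiple testing. The sketch below implements the Benjamini-Hochberg adjustment in plain Python as one common option. Whether MicrobDetect uses this particular correction is not stated here, so treat it as an assumption; the input p-values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-taxon p-values, e.g. from a paired test between time points.
raw = [0.01, 0.04, 0.03, 0.5]
adjusted = benjamini_hochberg(raw)
```

Taxa whose adjusted value falls below a chosen threshold (0.05 is a common choice) would then be reported as candidates for follow-up.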