Genome sequence matching demo

January 2015

Introduction

This document describes how to run a demo of genome sequence matching in the cloud.

Public cloud

The demo uses the DigitalOcean public cloud (www.digitalocean.com), chosen for its simplicity. Any other cloud can be used with minor adjustments to the cloud setup procedure.

Compilation

First download the source code:

git clone https://github.com/cloudozer/BWT.git

Then create a directory for the files containing genome data:

mkdir bwt_files

The demo uses the following files, which must be located in the bwt_files directory:

human_g1k_v37_decoy.fasta
human_g1k_v37_decoy.fasta.index
SRR770176_1.fastq
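
Before compiling, it can help to confirm that the data files are in place. The following is a minimal sketch, not part of the BWT repository: the helper name `missing_files` is hypothetical, and it assumes the script is run from the directory that contains bwt_files.

```python
from pathlib import Path

# File names taken from the list above; the directory matches the mkdir step.
REQUIRED_FILES = (
    "human_g1k_v37_decoy.fasta",
    "human_g1k_v37_decoy.fasta.index",
    "SRR770176_1.fastq",
)


def missing_files(directory="bwt_files", required=REQUIRED_FILES):
    """Return the names in `required` that are not regular files in `directory`."""
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from bwt_files:", ", ".join(missing))
    else:
        print("All genome data files are in place.")
```

The check only tests that each file exists; it does not validate the contents of the FASTA, index, or FASTQ files.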

Run

./rebar get-deps compile 

to compile the source code.

Cluster setup

To set up a cluster we need to run a script:

This script starts N small virtual servers. Each server holds three files containing the reference genome, the index, and the query sequences. After a node boots, an Erlang application (the worker) starts on it. A master node then sends each node a schedule specifying which parts of the genome are assigned to that worker.
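
To illustrate what such a schedule might look like, the sketch below splits a reference of a given length into N contiguous ranges, one per worker. This is an assumption made for illustration only: the function `make_schedule` is hypothetical, and the actual master may divide the genome differently (for example, by chromosome or with overlapping ranges).

```python
def make_schedule(genome_length, n_workers):
    """Split [0, genome_length) into n_workers contiguous half-open ranges.

    Returns a list of (worker_id, start, end) tuples. Range sizes differ by
    at most one base; the first `genome_length % n_workers` workers get the
    extra base.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(genome_length, n_workers)
    schedule = []
    start = 0
    for worker in range(n_workers):
        end = start + base + (1 if worker < extra else 0)
        schedule.append((worker, start, end))
        start = end
    return schedule


if __name__ == "__main__":
    for worker, start, end in make_schedule(10, 3):
        print(f"worker {worker}: reference positions [{start}, {end})")
```

A plain split like this would miss matches that straddle a boundary between two ranges, so a real schedule would likely need the ranges to overlap by at least the query length.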

Starting genome sequence matching in the cloud

Run

./start.sh

The script starts the sequence matching process using all available nodes in the cluster. When a worker finds a match, it sends a message to the master node containing the query sequence name, the position in the reference genome, and the output of the Smith-Waterman algorithm.
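
For readers unfamiliar with Smith-Waterman, the following is a minimal sketch of the local alignment scoring it performs. It is not the worker's Erlang implementation: the function name, the linear gap penalty, and the scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-2):
    """Return (best_score, end_position) of the best local alignment.

    end_position is the 1-based index in `reference` of the last aligned
    base, or 0 if no alignment scores above zero. Only two rows of the
    dynamic programming table are kept in memory.
    """
    best_score, best_end = 0, 0
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(query) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            same = query[i - 1] == reference[j - 1]
            diag = prev[j - 1] + (match if same else mismatch)
            # A cell never drops below zero: this is what makes the alignment local.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            if curr[j] > best_score:
                best_score, best_end = curr[j], j
        prev = curr
    return best_score, best_end


if __name__ == "__main__":
    score, end = smith_waterman("ACGT", "TTACGTTT")
    print(f"score={score}, alignment ends at reference position {end}")
```

In this sketch the score and end position stand in for the output a worker would report alongside the query sequence name; the format of the real message is not described here.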

Starting genome sequence matching locally

To run the demo on a laptop, run the following script:

./start_local.sh

Terminating sequence matching

Sequence matching finishes when all query sequences have been matched against the reference sequence. The process can be terminated earlier by pressing Ctrl-C, which terminates all workers. The cluster shuts down automatically if it is idle for 60 minutes.