January 2015
This document describes how to run a demo of genome sequence matching in the cloud.
The demo uses the Digital Ocean public cloud (www.digitalocean.com), chosen for its simplicity. Any other cloud can be used with minor adjustments to the cloud setup procedure.
First download the source code:
git clone https://github.com/cloudozer/BWT.git
Then create a directory for the files containing genome data:
mkdir bwt_files
The demo uses the following files, which must be located in the bwt_files directory:
human_g1k_v37_decoy.fasta
human_g1k_v37_decoy.fasta.index
SRR770176_1.fastq
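Before compiling, it may help to confirm that the three input files listed above are in place. The following is a minimal sketch and not part of the BWT repository; the helper name missing_files is hypothetical, and the script assumes it is run from the directory that contains bwt_files.

```python
from pathlib import Path

# File names are taken from the list above; "bwt_files" matches the mkdir step.
REQUIRED_FILES = (
    "human_g1k_v37_decoy.fasta",
    "human_g1k_v37_decoy.fasta.index",
    "SRR770176_1.fastq",
)


def missing_files(directory="bwt_files", required=REQUIRED_FILES):
    """Return the required file names that are absent from `directory`.

    Hypothetical helper for illustration only; it is not provided by the demo.
    """
    base = Path(directory)
    return [name for name in required if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from bwt_files:", ", ".join(missing))
    else:
        print("All demo input files are present.")
```

If the script reports missing files, copy them into bwt_files before continuing.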
Run
./rebar get-deps compile
to compile the source code.
To set up a cluster we need to run a script:
This script starts N small virtual servers. Every server holds three files: the reference genome, its index, and the query sequences. After a node boots, an Erlang application (the worker) starts on it. The master node then sends each node a schedule specifying which parts of the genome are assigned to that worker.
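To illustrate what such a schedule could look like, the sketch below splits a reference of a given length into contiguous ranges, one per worker. This is an assumption made for illustration: the demo itself is written in Erlang, and its actual partitioning scheme is not described here. The function name make_schedule and the worker names are hypothetical.

```python
def make_schedule(genome_length, workers):
    """Split [0, genome_length) into one contiguous half-open range per worker.

    Assumed scheme for illustration; the real master node may divide the
    genome differently (for example, by chromosome or with overlapping ranges).
    """
    if not workers:
        raise ValueError("at least one worker is required")
    base, extra = divmod(genome_length, len(workers))
    schedule = {}
    start = 0
    for i, worker in enumerate(workers):
        # The first `extra` workers take one additional position each,
        # so the ranges cover the whole genome without gaps.
        size = base + (1 if i < extra else 0)
        schedule[worker] = (start, start + size)
        start += size
    return schedule


if __name__ == "__main__":
    for worker, (start, end) in make_schedule(1000, ["worker1", "worker2", "worker3"]).items():
        print(worker, start, end)
```

A practical schedule would probably also need the ranges to overlap slightly, so that a query sequence spanning a boundary between two workers is not missed.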
Run
./start.sh
This script starts the sequence matching process, using all available nodes in the cluster. When a worker finds a match, it sends the master node a message containing the query sequence name, the position in the reference genome, and the output of the Smith-Waterman algorithm.
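For readers unfamiliar with Smith-Waterman, the sketch below shows a basic version of the local alignment score and a message with the three fields named above. It is not the demo's Erlang implementation: the scoring values (match +2, mismatch -1, gap -1), the message layout, and the function names are assumptions chosen for illustration.

```python
def smith_waterman(query, reference, match=2, mismatch=-1, gap=-1):
    """Return (best_score, end) for the best local alignment.

    `end` is the 1-based position in `reference` where that alignment ends.
    Uses a linear gap penalty and keeps only two rows of the scoring matrix.
    """
    previous = [0] * (len(reference) + 1)
    best_score, best_end = 0, 0
    for q in query:
        current = [0]
        for j, r in enumerate(reference, start=1):
            diagonal = previous[j - 1] + (match if q == r else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diagonal, previous[j] + gap, current[j - 1] + gap)
            current.append(score)
            if score > best_score:
                best_score, best_end = score, j
        previous = current
    return best_score, best_end


def match_message(name, query, reference):
    """Build a hypothetical worker-to-master message for one query sequence."""
    score, end = smith_waterman(query, reference)
    return {"query": name, "position": end, "score": score}


if __name__ == "__main__":
    print(match_message("read1", "ACGT", "TTACGTTT"))
```

In this toy run the query ACGT matches the reference exactly, so the score is four matches at +2 each, and the alignment ends at reference position 6.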
To run the demo locally on a laptop, run the following script:
./start_local.sh
The sequence matching process finishes when all query sequences have been matched against the reference sequence. It can be stopped earlier by pressing Ctrl-C, which terminates all workers. The cluster shuts down automatically after it has been idle for 60 minutes.