
Bioinformatics Software

Brian High
February 24, 2015

Languages, Environments and Tools

Installing Software

  • Free-standing ("binary") applications and utilities
    • Download from the developer (or use a package manager like Homebrew's brew)
    • These may be graphical or command-line programs
  • Scripts and packages
    • First install the language interpreter or environment
    • Then install any additional language modules, packages, or libraries needed
    • Package managers (biocLite, pip, cpan, etc.) may install dependencies for you
    • You often install and run these from a command-line "shell" like Bash
  • System Administration issues
    • You may need administrative ("superuser") rights to install
    • You may need to move files or modify environment variables like PATH
    • You may need to use git, svn, or hg to pull from repositories
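Editing PATH by hand is easy to get wrong, especially when a setup script runs more than once and keeps appending the same directory. Here is a minimal sketch, assuming a POSIX shell such as Bash, of a helper that adds a directory to the front of PATH only if it is not already listed. The function name `path_prepend` and the `~/biotools/bin` directory are hypothetical examples, not part of any tool.

```shell
# path_prepend DIR: put DIR at the front of PATH unless it is already listed.
# (Hypothetical helper; the name is illustrative, not a standard command.)
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;                      # already present: leave PATH alone
        *) PATH="$1${PATH:+:$PATH}" ;;    # prepend, avoiding a trailing ":"
    esac
    export PATH
}

# Example: make programs installed under ~/biotools/bin visible to this shell.
path_prepend "$HOME/biotools/bin"
```

To make the change last across sessions, both the function and the call can go in a shell startup file such as ~/.bashrc.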

Compiling Software

  • Requirements
    • Programs written in languages like C and C++ must be compiled before use
    • If you can't download a "binary" of the program, you will have to compile
    • Mac users will need a development environment like Xcode
    • Windows users may need a GNU environment like Cygwin or MinGW
    • These provide a compiler like GCC and build-automation tools like make
    • A package manager like MacPorts can automate the process
  • Compilation steps are usually run from a command-line "shell" like Bash
    • Usually these are listed in a README file (text, Markdown, or HTML)
    • They can be as simple as: ./configure, make, and sudo make install
  • make is a tool commonly used to automate compilation and installation
    • ./configure generates a Makefile for your system, and make carries out its rules
  • Tracking down and installing dependencies (libraries) may be tedious
    • Compile, fix errors, re-compile, fix errors, re-compile, etc.
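The ./configure, make, make install sequence can be wrapped in a small function so it is run the same way every time. This is a sketch that assumes an autoconf-style configure script, one that accepts the standard `--prefix` option; installing under a prefix in your home directory (here `$HOME/local`, an arbitrary choice) is one common way to avoid needing `sudo make install`. The helper name `build_from_source` is hypothetical.

```shell
# build_from_source SRCDIR [PREFIX]
# Runs ./configure --prefix=PREFIX && make && make install inside SRCDIR.
# Stops at the first failing step and returns its exit status.
# Hypothetical helper; assumes an autoconf-style ./configure script.
build_from_source() {
    srcdir=$1
    prefix=${2:-$HOME/local}
    (
        # Subshell: the cd does not change the caller's working directory.
        cd "$srcdir" || exit 1
        ./configure --prefix="$prefix" &&
        make &&
        make install
    )
}

# Example (hypothetical source directory):
# build_from_source ~/src/some-tool-1.0
# Then add "$HOME/local/bin" to PATH to run the installed program.
```

When a step fails, the usual loop described above applies: read the error, install the missing library, and run the function again.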

Examples from Research Papers: #1

What software tools will you need to reproduce results from these papers?

  1. Leek, J. T. & Storey, J. D. Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis. PLoS Genet. 3, e161 (2007).
  2. Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotechnol. 32, 896–902 (2014).
  3. Finak, G. et al. OpenCyto: an open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput. Biol. 10, e1003806 (2014).
  4. Zhong, Y. & Liu, Z. Gene expression deconvolution in linear space. Nat. Methods 9, 8–9; author reply 9 (2012).
  5. Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008–2017 (2012).

source("http://bioconductor.org/biocLite.R")
biocLite(c("sva", "RUVSeq", "openCyto", "csSAM", "DEXSeq"))
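biocLite() is normally typed into an interactive R session. If the installation should instead be scripted from the shell, for example in a class setup script, one approach is to build the R expression as a string and hand it to `Rscript -e`. This is a sketch assuming a POSIX shell and that R's `Rscript` front end is on PATH. The helper name `bioc_install_expr` is hypothetical.

```shell
# bioc_install_expr PKG...: print an R expression that sources biocLite.R
# and installs the given packages, suitable for passing to Rscript -e.
# (Hypothetical helper; biocLite() is the Bioconductor installer used here.)
bioc_install_expr() {
    pkgs=""
    for p in "$@"; do
        # Quote each name and join with ", " to form an R character vector.
        pkgs="${pkgs:+$pkgs, }\"$p\""
    done
    printf 'source("http://bioconductor.org/biocLite.R"); biocLite(c(%s))\n' "$pkgs"
}

# Example (requires R and network access):
# Rscript -e "$(bioc_install_expr sva RUVSeq openCyto csSAM DEXSeq)"
```

Keeping the package list in a script like this also documents exactly what was installed, which helps when reproducing an analysis later.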

Examples from Research Papers: #2

What software tools will you need to reproduce results from this paper?

This uses an R package. Do we install it with install.packages or biocLite?

Neither. The SingleCellAssay package is not yet in Bioconductor, so its README recommends installing it from GitHub with devtools:

install.packages('devtools')
library(devtools)
install_github('SingleCellAssay', 'RGLab')
# *or* if you don't have a working latex setup
install_github('SingleCellAssay', 'RGLab', build_vignettes=FALSE)
vignette('SingleCellAssay-intro')
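Which of the two install_github() calls to use depends on whether LaTeX works. A rough shell check can print the value to pass as `build_vignettes`. It assumes that finding `pdflatex` on PATH is a reasonable proxy for "a working latex setup", which may not be sufficient if LaTeX packages are missing. The function name `vignette_flag` is hypothetical.

```shell
# vignette_flag: print TRUE if pdflatex is on PATH, otherwise FALSE.
# Assumption: pdflatex being present approximates a working LaTeX setup.
vignette_flag() {
    if command -v pdflatex >/dev/null 2>&1; then
        echo TRUE
    else
        echo FALSE
    fi
}

# Example (requires R, devtools, and network access):
# Rscript -e "library(devtools); install_github('SingleCellAssay', 'RGLab', build_vignettes=$(vignette_flag))"
```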

Examples from Research Papers: #3

What software tools will you need to reproduce results from this paper?

This study used a prototype version of an R Bioconductor package, and some sample analysis code was provided. Because the sample code uses some packages that are no longer available in Bioconductor, you will need R 2.15.x (2.15.2 or later).

# Install "beta" derfinder and dependencies
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("Genominator", "limma", "GenomicFeatures", "rtracklayer"))
install.packages(c("RSQLite.extfuns", "HiddenMarkov", "proto", "locfdr", "devtools"))
library(devtools)
install_github('derfinder', 'alyssafrazee') # beta version
library(derfinder)

The sample code also loads some "rda" files that may not be provided in the GitHub repo, so check the repo for any open issues. Be sure to read the README as well, especially the "reproducing the manuscript's results" section. You will also need samtools. The entire process takes several hours and a few GB of RAM.
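Since the run takes hours and the sample code only works with R 2.15.x, it is worth confirming the R version before starting. This small sketch assumes a POSIX shell and accepts only 2.15.2 or a later 2.15.x release, matching the range given above. `r_version_ok` is a hypothetical helper. `getRversion()` is a base R function.

```shell
# r_version_ok VERSION: succeed only for R 2.15.2 or a later 2.15.x release.
r_version_ok() {
    case "$1" in
        2.15.[2-9]|2.15.[1-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (requires R on PATH):
# ver=$(Rscript -e 'cat(as.character(getRversion()))')
# r_version_ok "$ver" || echo "Need R 2.15.x (2.15.2 or later), found $ver" >&2
```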

Examples from Research Papers: #4

What software tools will you need to reproduce results from this paper?

This paper presents a "pipeline". How do we get it to work?

mkdir -p ~/src && cd ~/src/ && export BSMOOTH_HOME=~/src/bsmooth-align
git clone https://github.com/BenLangmead/bsmooth-align.git
cd $BSMOOTH_HOME/merman/ && make

Running make gives several compiler errors in merman.cpp on Bio-Linux 8 / Ubuntu 14.04 LTS with GCC 4.8.2, and also on OS X Mavericks (10.9) with Xcode's GCC 4.2.1. It does, however, compile correctly with an older version of GCC (4.1.2) on a Red Hat Linux 5.11 system. All test systems were 64-bit.

You will also need Bowtie2.
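Because merman only built cleanly with an older GCC, one option on a machine that has several compilers installed side by side is to pick an older one explicitly. This is a sketch under several assumptions. The compiler names (`g++-4.4`, `g++-4.1`) are examples that may not exist on your system. Passing `CXX=` on the make command line overrides the Makefile's own setting, but that only helps if merman's Makefile actually compiles with `$(CXX)`, so check the Makefile first. `first_available` is a hypothetical helper.

```shell
# first_available CMD...: print the first of the given commands found on PATH;
# fail (status 1) if none are found.
first_available() {
    for c in "$@"; do
        if command -v "$c" >/dev/null 2>&1; then
            echo "$c"
            return 0
        fi
    done
    return 1
}

# Example (compiler names are assumptions; run inside $BSMOOTH_HOME/merman):
# CXX=$(first_available g++-4.4 g++-4.1 g++) && make CXX="$CXX"
```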

Examples from Research Papers: #5

What software tools will you need to reproduce results from this paper?

From within what environment do we use viSNE? How do we access it?

viSNE runs within cyt. cyt requires:

For a fee, you can also run viSNE on Cytobank (a web service).

What about our RSEM example?

The requirements are listed at the top of the article. How would we install them?

Regarding data for the example, Raphael posted three essential files in a Dropbox folder:

What are they? Where did they come from? Why not just get them from the source?

cd ./RSEM_test/Reference_Genome/
../../using_rsem_prep_input.sh

That script will download the three files from UCSC. A little extra processing is done to extract, convert, or rename the files.

Converting the annotation to GTF will not work on Windows, even in an environment like Cygwin, because a required dependency (genePredToGtf) is not available there. How else can you get that file?
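One possible route, on a Linux or macOS machine where UCSC's genePredToGtf utility is installed, is sketched below under assumptions. UCSC's downloadable refGene.txt table begins with a `bin` column that genePredToGtf does not expect, so that column is dropped with `cut` first. The file names are assumptions, and the `genePredToGtf file ...` form (which reads the input as a file rather than a database table) should be checked against the tool's own usage message.

```shell
# strip_bin_column: drop the first tab-separated field (UCSC's "bin" column)
# from each line of a refGene-style table read on stdin.
strip_bin_column() {
    cut -f 2-
}

# Example (file names are assumptions; genePredToGtf must be installed):
# gunzip -c refGene.txt.gz | strip_bin_column > refGene.genePred
# genePredToGtf file refGene.genePred refGene.gtf
```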

Example: Trinity and RSEM Test

Assuming you have already installed bowtie and bowtie2, you can run this shell script to compile and test Trinity and RSEM.

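#!/bin/sh

# Test Trinity and RSEM (bowtie and bowtie2 must already be installed)
mkdir -p "$HOME/biotools/"
cd "$HOME/biotools/"

# Fetch and build RSEM, including its EBSeq add-on
git clone 'https://github.com/bli25wisc/RSEM.git'
cd ./RSEM/
make
make ebseq
# Put the RSEM executables on PATH for the rest of this script
export PATH="$PATH:$HOME/biotools/RSEM"
cd ../

# Fetch and build Trinity, then run its bundled test assembly
git clone 'https://github.com/trinityrnaseq/trinityrnaseq.git'
cd ./trinityrnaseq/
make clean
make
cd ./sample_data/test_Trinity_Assembly
./runMe.sh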

You should see a lot of verbose output. Did the test run okay?
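Scrolling back through pages of output is a slow way to answer that question. A sketch of an alternative saves the verbose output to a log file and reports the command's exit status. It assumes that runMe.sh exits with a non-zero status when something fails, which is worth confirming before relying on it. `run_logged` is a hypothetical helper.

```shell
# run_logged LOGFILE CMD [ARGS...]: run CMD with stdout and stderr saved to
# LOGFILE, then print PASSED or FAILED based on its exit status.
# Returns CMD's exit status so callers can react to failures.
run_logged() {
    log=$1
    shift
    if "$@" >"$log" 2>&1; then
        echo "PASSED: $* (log: $log)"
    else
        status=$?    # exit status of the failed command
        echo "FAILED with status $status: $* (see $log)"
        return "$status"
    fi
}

# Example, from inside sample_data/test_Trinity_Assembly:
# run_logged runMe.log ./runMe.sh
```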

Example: Bowtie, RSEM, and Detonate

We have another example script which tests:

Detonate requires Blat.

The script assumes bowtie is already installed; the other tools are downloaded and compiled, and in each case the compile command is simply make.

Read the script's comments to learn about other dependencies for compiling.

./detonate_test.sh
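Since the script assumes bowtie is already present, a quick prerequisite check before a long download-and-compile run can save time. This is a sketch assuming a POSIX shell. The list of commands in the example is an assumption; extend it with whatever the script's comments say it needs. `require_cmds` is a hypothetical helper.

```shell
# require_cmds CMD...: report every listed command missing from PATH
# (on stderr) and fail if any are missing.
require_cmds() {
    missing=0
    for c in "$@"; do
        if ! command -v "$c" >/dev/null 2>&1; then
            echo "missing: $c" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (the command list is an assumption):
# require_cmds bowtie make gcc && ./detonate_test.sh
```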

Example: Rsubread, limma, and edgeR

For a case study of using Rsubread, limma, and edgeR in a Bioconductor R pipeline to analyze RNA-seq data, see rsubread_test.md, which is based on the work of:

Wei Shi (shi at wehi dot edu dot au), Yang Liao, and Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute, Melbourne, Australia

Requirements:

  • Rsubread should be version 1.12.1 or later.
  • R should be version 3.0.2 or later.
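Both requirements are "version X or later" checks, which plain string comparison gets wrong (as strings, "1.12.10" sorts before "1.12.9"). Here is a sketch of a numeric, field-by-field comparison in a POSIX shell. It assumes purely numeric dotted versions such as 3.0.2 or 1.12.1, and suffixes like "-rc1" are not handled. `version_ge` is a hypothetical helper. `getRversion()` and `packageVersion()` are standard R functions.

```shell
# version_ge A B: succeed if dotted version A >= B, comparing numeric fields
# left to right; a missing field counts as 0 (so 3.0 is treated as 3.0.0).
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}; y=${b%%.*}     # leading field of each version
        x=${x:-0}; y=${y:-0}
        [ "$x" -gt "$y" ] && return 0
        [ "$x" -lt "$y" ] && return 1
        # Drop the field just compared (and its dot) from each string.
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0    # all fields equal
}

# Example (requires R; packageVersion() errors if Rsubread is not installed):
# r_ver=$(Rscript -e 'cat(as.character(getRversion()))')
# rs_ver=$(Rscript -e 'cat(as.character(packageVersion("Rsubread")))')
# version_ge "$r_ver" 3.0.2 && version_ge "$rs_ver" 1.12.1 && echo "requirements met"
```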