Brian High
February 24, 2015
- Programming Languages
- Development Environments
- Operating Environments
- Other software, tools, websites and databases
- Free-standing ("binary") applications and utilities
- Download from developer (or use package manager like brew)
- These may be graphical or command-line programs
- Scripts and packages
- System Administration issues
- You may need administrative ("superuser") rights to install
- You may need to move files or modify environment variables like `PATH`
- You may need to use `git`, `svn`, or `hg` to pull from repositories
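The two administration chores above (extending `PATH` and confirming that a version-control client is available) can be sketched in a few lines of shell. This is only an illustration: the function names `path_append` and `check_vcs_tools` and the directory `$HOME/biotools/bin` are hypothetical, and the `PATH` change lasts only for the current shell session.

```sh
#!/bin/sh
# Append a directory to PATH for this session, unless it is already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                          # already on PATH, nothing to do
        *) PATH="$PATH:$1"; export PATH ;;
    esac
}

# Report which of the version-control clients mentioned above are installed.
check_vcs_tools() {
    for tool in git svn hg; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "$tool: $(command -v "$tool")"
        else
            echo "$tool: not found"
        fi
    done
}

path_append "$HOME/biotools/bin"   # hypothetical install location
check_vcs_tools
```

To make the change permanent you would normally add the equivalent `export` line to a shell startup file, which depends on your shell and system.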
- Requirements
- Programs written in languages like C and C++ must be compiled before use
- If you can't download a "binary" of the program, you will have to compile
- Mac users will need a development environment like Xcode
- Windows users may need a GNU environment like Cygwin or MinGW
- These include a compiler like GCC and automation tools like make
- A package manager like MacPorts can automate the process
- Compilation steps are usually run from a command-line "shell" like Bash
- Usually these are listed in a README file (text, markdown, or HTML)
- Can be as simple as: `./configure`, `make`, and `sudo make install`
- `make` is a tool commonly used to automate compilation and installation
- `./configure` prepares the Makefile and `make` processes it
- Tracking down and installing dependencies (libraries) may be tedious
- Compile, fix errors, re-compile, fix errors, re-compile, etc.
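The configure, make, install sequence described above can be wrapped in a small helper. The sketch below assumes the source tree follows that common convention; the function name `build_from_source`, the `DRY_RUN` switch, and the default prefix `$HOME/.local` are all hypothetical choices for illustration. Always check the project's README first, since many projects deviate from this pattern.

```sh
#!/bin/sh
# Build and install a source tree that follows the usual convention.
# Usage: build_from_source SRC_DIR [PREFIX]
# Set DRY_RUN=1 to print the commands instead of running them.
build_from_source() {
    src_dir=$1
    prefix=${2:-$HOME/.local}     # hypothetical per-user prefix (no sudo needed)
    run=${DRY_RUN:+echo}          # "echo" when DRY_RUN is set, empty otherwise
    (
        cd "$src_dir" || exit 1
        # Not every project ships a configure script; skip it when absent.
        if [ -x ./configure ]; then
            $run ./configure --prefix="$prefix" || exit 1
        fi
        $run make || exit 1        # stop here on compiler errors
        $run make install
    )
}

# Example (hypothetical path):
# DRY_RUN=1 build_from_source "$HOME/src/some-tool"
```

Installing under a per-user prefix avoids the need for superuser rights, but only when the project honors `--prefix`; a project without a `configure` script may install to a system location and still need `sudo`.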
What software tools will you need to reproduce results from these papers?
- Leek, J. T. & Storey, J. D. Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis. PLoS Genet 3, e161 (2007).
- Risso, D., Ngai, J., Speed, T. P. & Dudoit, S. Normalization of RNA-seq data using factor analysis of control genes or samples. Nature Biotechnology 32, 896–902 (2014).
- Finak, G. et al. OpenCyto: an open source infrastructure for scalable, robust, reproducible, and automated, end-to-end flow cytometry data analysis. PLoS Comput. Biol. 10, e1003806 (2014).
- Zhong, Y. & Liu, Z. Gene expression deconvolution in linear space. Nat. Methods 9, 8–9; author reply 9 (2012).
- Anders, S., Reyes, A. & Huber, W. Detecting differential usage of exons from RNA-seq data. Genome Res. 22, 2008-2017 (2012).
biocLite(c("sva", "RUVSeq", "openCyto", "csSAM", "DEXSeq"))
What software tools will you need to reproduce results from this paper?
- McDavid, A. et al. Data exploration, quality control and testing in single-cell qPCR-based gene expression experiments. Bioinformatics 29, 461-467 (2013).
This uses an R package. Do we install it with `install.packages` or `biocLite`?
Neither, since the SingleCellAssay package is not yet in Bioconductor. Instead, the README recommends:
install.packages('devtools')
library(devtools)
install_github('SingleCellAssay', 'RGLab')
# *or* if you don't have a working latex setup
install_github('SingleCellAssay', 'RGLab', build_vignettes=FALSE)
vignette('SingleCellAssay-intro')
What software tools will you need to reproduce results from this paper?
- Frazee, A. C., Sabunciyan, S., Hansen, K. D., Irizarry, R. A. & Leek, J. T. Differential expression analysis of RNA-seq data at single-base resolution. Biostatistics kxt053 (2014). doi:10.1093/biostatistics/kxt053.
This study used a prototype version of an R Bioconductor package. Some sample analysis code was provided. Since the sample code uses some packages no longer available in Bioconductor, you will need to use R version 2.15.x (2.15.2 or above).
# Install "beta" derfinder and dependencies
source("http://bioconductor.org/biocLite.R")
biocLite()
biocLite(c("Genominator", "limma", "GenomicFeatures", "rtracklayer"))
install.packages(c("RSQLite.extfuns", "HiddenMarkov", "proto", "locfdr", "devtools"))
library(devtools)
install_github('derfinder', 'alyssafrazee') # beta version
library(derfinder)
Also, the sample code loads some "rda" files that may not be provided in the GitHub repo. Check for any open issues, and be sure to read the README, especially the "reproducing the manuscript's results" section. You will also need samtools. The entire process takes several hours and a few GB of RAM.
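Before committing several hours to this analysis, it may help to check the two prerequisites named above: an R release in the 2.15 series at 2.15.2 or later, and samtools. The following is a minimal sketch; the function name `r_version_ok` is hypothetical, and it assumes `Rscript` is the R you intend to use.

```sh
#!/bin/sh
# Succeed only for version strings of the form 2.15.N with N >= 2.
r_version_ok() {
    case $1 in
        2.15.*) patch=${1#2.15.} ;;   # keep what follows "2.15."
        *) return 1 ;;                # any other series is rejected
    esac
    # A non-numeric patch level makes the test fail rather than pass.
    [ "$patch" -ge 2 ] 2>/dev/null
}

# Ask the installed R (if any) for its version and check it.
if command -v Rscript >/dev/null 2>&1; then
    current=$(Rscript -e 'cat(as.character(getRversion()))')
    if r_version_ok "$current"; then
        echo "R $current is in the supported range"
    else
        echo "R $current found; the sample code needs 2.15.2 or later in the 2.15 series"
    fi
else
    echo "Rscript not found on PATH"
fi

command -v samtools >/dev/null 2>&1 || echo "samtools not found on PATH"
```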
What software tools will you need to reproduce results from this paper?
- Hansen, K. D., Langmead, B. & Irizarry, R. A. BSmooth: from whole genome bisulfite sequencing reads to differentially methylated regions. Genome Biol. (2012).
This paper presents a "pipeline". How do we get it to work?
mkdir -p ~/src && cd ~/src/ && export BSMOOTH_HOME=~/src/bsmooth-align
git clone https://github.com/BenLangmead/bsmooth-align.git
cd $BSMOOTH_HOME/merman/ && make
This gives several compiler errors in `merman.cpp` when compiled on Bio-Linux 8 / Ubuntu 14.04 LTS using GCC 4.8.2, and also on OS X Mavericks (10.9) using Xcode with GCC 4.2.1. It did compile correctly, however, with an older version of GCC (4.1.2) on a Red Hat Linux 5.11 system. All test systems were 64-bit.
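Because the outcome here depended on the compiler version and operating system, it is worth recording both whenever a build fails, for example to include in an issue report. A minimal sketch follows; the function name `report_build_env` and the log file names are hypothetical.

```sh
#!/bin/sh
# Print the system and compiler details relevant to a failed build.
report_build_env() {
    echo "system: $(uname -s) $(uname -r) ($(uname -m))"
    for compiler in gcc g++ cc; do
        if command -v "$compiler" >/dev/null 2>&1; then
            # -dumpversion is understood by GCC and Clang drivers.
            echo "$compiler: $("$compiler" -dumpversion 2>/dev/null || echo unknown)"
        else
            echo "$compiler: not found"
        fi
    done
}

# Example: save the environment report, then keep the full build output.
# report_build_env > build-env.txt
# make 2>&1 | tee build.log
```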
You will also need Bowtie2.
What software tools will you need to reproduce results from this paper?
- Amir, E.-A. D. et al. viSNE enables visualization of high dimensional single-cell data and reveals phenotypic heterogeneity of leukemia. Nature Biotechnology 31, 545–552 (2013).
From within what environment do we use viSNE? How do we access it?
viSNE runs within `cyt`. `cyt` requires:
- MATLAB 2010b or higher on Windows or Mac OS X
- Parallel Computing Toolbox
For a fee, you can also run viSNE on Cytobank (a website).
The requirements are listed at the top of the article. How would we install them?
Regarding data for the example, Raphael posted three essential files in a Dropbox folder:
What are they? Where did they come from? Why not just get them from the source?
cd ./RSEM_test/Reference_Genome/
../../using_rsem_prep_input.sh
That script will download the three files from UCSC. A little extra processing is done to extract, convert, or rename the files.
The conversion of the GTF will not work on Windows, even using an environment like Cygwin, as some dependencies (namely, `genePredToGtf`) will not be met. How else can you get that file?
Assuming you have already installed bowtie and bowtie2, you can run this shell script to compile and test Trinity and RSEM.
#!/bin/sh
# Test Trinity and RSEM
mkdir -p ~/biotools/
cd ~/biotools/
git clone 'https://github.com/bli25wisc/RSEM.git'
cd ./RSEM/
make
make ebseq
export PATH="$PATH:$HOME/biotools/RSEM"
cd ../
git clone 'https://github.com/trinityrnaseq/trinityrnaseq.git'
cd ./trinityrnaseq/
make clean
make
cd ./sample_data/test_Trinity_Assembly
./runMe.sh
You should see a lot of verbose output. Did the test run okay?
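With so much output scrolling past, one way to answer that question is to save the output to a log and look at the command's exit status. The sketch below assumes the test script exits with a non-zero status on failure, which you should confirm for the tool in question. The function name `run_and_report` and the log file name are hypothetical.

```sh
#!/bin/sh
# Run a command, save all of its output to a log file, and report the result.
# Usage: run_and_report LOGFILE COMMAND [ARGS...]
run_and_report() {
    log=$1
    shift
    if "$@" >"$log" 2>&1; then
        echo "PASS: $* (log: $log)"
    else
        status=$?                  # exit status of the command that failed
        echo "FAIL (exit $status): $* (log: $log)"
        return "$status"
    fi
}

# Example, run from the Trinity test directory used above:
# run_and_report trinity_test.log ./runMe.sh
```

Even after a "PASS", skim the end of the log, since some tools print errors without returning a failing exit status.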
We have another example script which tests:
Detonate requires Blat.
The script assumes bowtie is already installed. The rest are downloaded and
compiled. In each case, the compile command is simply `make`.
Read the script's comments to learn about other dependencies for compiling.
./detonate_test.sh
For a case study of using Rsubread, limma, and edgeR in a Bioconductor R pipeline to analyze RNA-seq data, see: rsubread_test.md, based on the work of:
Wei Shi (shi at wehi dot edu dot au), Yang Liao and Gordon K Smyth, Bioinformatics Division, Walter and Eliza Hall Institute, Melbourne, Australia
Requirements:
- The Rsubread package should be version 1.12.1 or later.
- You should run R version 3.0.2 or later.