Merge pull request #9 from GeneDx/add-deletions
Add deletions
rebecca810 authored Sep 16, 2020
2 parents 2dd9126 + f85f721 commit 0d4bb69
Showing 22 changed files with 666 additions and 192 deletions.
53 changes: 27 additions & 26 deletions Dockerfile
@@ -1,43 +1,44 @@
# source Image
FROM ubuntu:18.04
FROM ubuntu:20.04

# set noninteractive mode
ENV DEBIAN_FRONTEND noninteractive

# apt-get update and install global requirements
RUN apt-get clean all && \
apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
build-essential \
libbz2-dev \
libcurl4-openssl-dev \
liblzma-dev \
libncurses5-dev \
libnss-sss \
libssl-dev \
r-base \
zlib1g-dev
apt-get update && \
apt-get upgrade -y && \
apt-get install -y \
autoconf \
autogen \
build-essential \
curl \
libbz2-dev \
libcurl4-openssl-dev \
libhts3 \
libhts-dev \
liblzma-dev \
libncurses5-dev \
libnss-sss \
libssl-dev \
libxml2-dev \
ncbi-blast+ \
r-base \
r-bioc-biostrings \
r-bioc-rsamtools \
r-cran-biocmanager \
r-cran-devtools \
r-cran-stringr \
r-cran-optparse \
zlib1g-dev

# apt-get clean and remove cached source lists
RUN apt-get clean && \
rm -rf /var/lib/apt/lists/*

# install global r requirements
RUN echo "r <- getOption('repos'); r['CRAN'] <- 'http://cran.us.r-project.org'; options(repos = r);" > ~/.Rprofile
RUN Rscript -e "install.packages('optparse')"
RUN Rscript -e "install.packages('stringr')"
RUN Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('Biostrings')"

# install htslib
RUN curl -LO https://github.com/samtools/htslib/releases/download/1.10.2/htslib-1.10.2.tar.bz2 && \
tar xfj htslib-1.10.2.tar.bz2 && \
cd htslib-1.10.2 && \
autoheader && \
autoconf && \
./configure && \
make && \
make install
RUN Rscript -e "library(devtools); install_github('mhahsler/rBLAST')"

# install scramble
COPY . /app
42 changes: 0 additions & 42 deletions Dockerfile.centos

This file was deleted.

108 changes: 62 additions & 46 deletions README.md
@@ -12,54 +12,42 @@ clusters. For how to build see the build section.
Build
-----

Install dependencies (Debian/Ubuntu):
Install dependencies (Ubuntu 20.04):

$ apt-get update
$ apt-get install -y \
apt-get update
apt-get install -y \
autoconf \
autogen \
build-essential \
curl \
libbz2-dev \
libcurl4-openssl-dev \
libhts-dev \
liblzma-dev \
libncurses5-dev \
libnss-sss \
libssl-dev \
libhts-dev \
libxml2-dev \
ncbi-blast+ \
r-base \
r-bioc-biostrings \
r-bioc-rsamtools \
r-cran-biocmanager \
r-cran-devtools \
r-cran-stringr \
r-cran-optparse \
zlib1g-dev

Install dependencies (Centos):

$ yum install -y \
epel-release && \
$ yum install -y \
autoconf \
bzip2 \
bzip2-devel \
libcurl-devel \
libcrypto-devel \
openssl-devel \
R \
xz-devel \
zlib-devel

Install R packages dependencies:

$ Rscript -e "install.packages('optparse')"
$ Rscript -e "install.packages('stringr')"
$ Rscript -e "source('https://bioconductor.org/biocLite.R'); biocLite('Biostrings')"
Rscript -e "library(devtools); install_github('mhahsler/rBLAST')"

To build the cluster_identifier (estimated install time <5 minutes):

$ cd cluster_identifier/src
$ make

That should be it. It will create an executable named `build/cluster_identifier`.

Building requires `HTSlib` and a few other dev packages (installation instructions for Debian/Ubuntu/Centos above).
Please edit `cluster_identifier/src/Makefile` if `HTSlib` is not installed in the default location.

/usr/local/include/htslib/*.h
/usr/local/lib/libhts.a

Running
-------
@@ -69,15 +57,18 @@ sequences. Second, `SCRAMble-MEIs.R` analyzes the cluster file for likely MEIs.
To run SCRAMble cluster_identifier:

$ /path/to/scramble/cluster_identifier/src/build/cluster_identifier \
/path/to/install_dir/scramble/validation/test.bam > /path/to/output/clusters.txt
/path/to/install_dir/scramble/validation/test.bam > /path/to/output/test.clusters.txt

To run SCRAMble-MEIs (with default settings):
To run SCRAMble-MEIs and SCRAMble-dels (with default settings):

$ Rscript --vanilla /path/to/scramble/cluster_analysis/bin/SCRAMble-MEIs.R \
--out-name /path/to/output/out.txt \
--cluster-file /path/to/output/clusters.txt \
$ Rscript --vanilla /path/to/scramble/cluster_analysis/bin/SCRAMble.R \
--out-name /path/to/output/test \
--cluster-file /path/to/output/test.clusters.txt \
--install-dir /path/to/scramble/cluster_analysis/bin \
--mei-refs /path/to/scramble/cluster_analysis/resources/MEI_consensus_seqs.fa
--mei-refs /path/to/scramble/cluster_analysis/resources/MEI_consensus_seqs.fa \
--ref /path/to/scramble/validation/test.fa \
--eval-meis \
--eval-dels
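
The invocation above can also be assembled programmatically, for example when running SCRAMble over many samples. The following is a minimal Python sketch and is not part of SCRAMble: the helper name `build_scramble_command` and its parameters are hypothetical, and only the flags shown in the command above are used. It assumes `--ref` may be omitted when no reference FASTA is available, which the README suggests for MEI calling but does not state for deletion calling.

```python
import shlex
from pathlib import PurePosixPath


def build_scramble_command(scramble_dir, cluster_file, out_prefix,
                           ref_fasta=None, eval_meis=True, eval_dels=True):
    """Assemble the Rscript argument list for SCRAMble.R (hypothetical helper)."""
    analysis = PurePosixPath(scramble_dir) / "cluster_analysis"
    cmd = [
        "Rscript", "--vanilla", str(analysis / "bin" / "SCRAMble.R"),
        "--out-name", str(out_prefix),
        "--cluster-file", str(cluster_file),
        "--install-dir", str(analysis / "bin"),
        "--mei-refs", str(analysis / "resources" / "MEI_consensus_seqs.fa"),
    ]
    # Assumption: the reference FASTA is only passed when one is supplied.
    if ref_fasta is not None:
        cmd += ["--ref", str(ref_fasta)]
    if eval_meis:
        cmd.append("--eval-meis")
    if eval_dels:
        cmd.append("--eval-dels")
    return cmd


if __name__ == "__main__":
    command = build_scramble_command(
        "/path/to/scramble",
        "/path/to/output/test.clusters.txt",
        "/path/to/output/test",
        ref_fasta="/path/to/scramble/validation/test.fa",
    )
    # Print a shell-quoted form; pass `command` to subprocess.run to execute it.
    print(shlex.join(command))
```

Returning a list rather than a single string keeps the arguments safe to hand to `subprocess.run` without shell quoting.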

Running with Docker
-------------------
@@ -88,12 +79,16 @@ SCRAMble is also distributed with a `Dockerfile`. Running SCRAMble using `docker
$ docker build -t scramble:latest .
$ docker run -it --rm scramble:latest bash
# cluster_identifier \
/app/validation/test.bam > clusters.txt
# Rscript --vanilla /app/cluster_analysis/bin/SCRAMble-MEIs.R \
--out-name ${PWD}/out.txt \
--cluster-file ${PWD}/clusters.txt \
/app/validation/test.bam > /app/validation/test.clusters.txt
# Rscript --vanilla /app/cluster_analysis/bin/SCRAMble.R \
--out-name ${PWD}/test \
--cluster-file /app/validation/test.clusters.txt \
--install-dir /app/cluster_analysis/bin \
--mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa
--mei-refs /app/cluster_analysis/resources/MEI_consensus_seqs.fa \
--ref /app/validation/test.fa \
--eval-dels \
--eval-meis


Output
------
@@ -108,7 +103,7 @@ The columns are as follows:
| 4. | Clipped read consensus |
| 5. | Anchored read consensus |

The output of SCRAMble-MEIs.R is a tab delimited text file with MEI calls. If no MEIs are present an output file will still be produced with only the header.
Calling `SCRAMble.R` with `--eval-meis` produces a tab-delimited file. If a reference `.fa` file is provided, a VCF is produced as well. The `<out-name>_MEIs.txt` output is a tab-delimited text file with MEI calls. If no MEIs are present, an output file containing only the header is still produced.
The columns are as follows:

| | | |
@@ -130,13 +125,34 @@ The columns are as follows:
| 15. | TSD | Target site duplication sequence if polyA clipped read cluster found |
| 16. | TSD_length | Length of target site duplication if polyA clipped read cluster found |

R Dependencies
--------------
SCRAMBLE-MEIs.R was developed on R version 3.1.1 (2014-07-10) and uses the following libraries:

[1] Biostrings_2.34.1 XVector_0.6.0 IRanges_2.0.1
[4] S4Vectors_0.4.0 BiocGenerics_0.12.1 stringr_0.6.2
[7] optparse_1.4.4
Calling `SCRAMble.R` with `--eval-dels` produces a VCF and a tab-delimited file. The `<out-name>_PredictedDeletions.txt` output is a tab-delimited text file with deletion calls. If no deletions are present, an output file containing only the header is still produced.
The columns are as follows:

| | | |
| ---: | ----------------------------- | -------------------------------------------------------------------------------------------- |
| 1. | CONTIG | Chromosome |
| 2. | DEL.START | Deletion start coordinate (0-based) |
| 3. | DEL.END | Deletion end coordinate (0-based) |
| 4. | REF.ANCHOR.BASE | Reference base at deletion start |
| 5. | DEL.LENGTH | Deletion length |
| 6. | RIGHT.CLUSTER | Name of right cluster |
| 7. | RIGHT.CLUSTER.COUNTS | Number of supporting reads in right cluster |
| 8. | LEFT.CLUSTER | Name of left cluster |
| 9. | LEFT.CLUSTER.COUNTS | Number of supporting reads in left cluster |
| 10. | LEN.RIGHT.ALIGNMENT | Length of right-clipped consensus sequence involved in alignment |
| 11. | SCORE.RIGHT.ALIGNMENT | BLAST alignment bitscore for right-clipped consensus |
| 12. | PCT.COV.RIGHT.ALIGNMENT | Percent length of right-clipped consensus involved in alignment |
| 13. | PCT.IDENTITY.RIGHT.ALIGNMENT | Percent identity of right-clipped consensus in alignment |
| 14. | LEN.LEFT.ALIGNMENT | Length of left-clipped consensus sequence involved in alignment |
| 15. | SCORE.LEFT.ALIGNMENT | BLAST alignment bitscore for left-clipped consensus |
| 16. | PCT.COV.LEFT.ALIGNMENT | Percent length of left-clipped consensus involved in alignment |
| 17. | PCT.IDENTITY.LEFT.ALIGNMENT | Percent identity of left-clipped consensus in alignment |
| 18. | INS.SIZE | Length of insert within deleted sequence (for two-end deletions only) |
| 19. | INS.SEQ | Inserted sequence (for two-end deletions only) |
| 20. | RIGHT.CLIPPED.SEQ | Clipped consensus sequence for right-clipped cluster |
| 21. | LEFT.CLIPPED.SEQ | Clipped consensus sequence for left-clipped cluster |



Disclaimers
72 changes: 0 additions & 72 deletions cluster_analysis/bin/SCRAMble-MEIs.R

This file was deleted.
