Install VEP

Installing VEP

NOTE:

VEP is regularly upgraded, and the versions used below are just an example.

If you used a DB dump (see Install-from-database-dump), make sure you install the same VEP version that the dump was annotated with, otherwise you will have to re-annotate all of the variants.

Requirements

sudo apt-get install -y tabix samtools mysql-server mysql-client libmysqlclient-dev perlbrew git curl libdb-dev libgd-dev pkg-config

If you don't plan to use Perlbrew and want to use the system Perl (not recommended), also install cpanminus:

sudo apt-get install -y cpanminus
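Before moving on, it can help to confirm that the command-line tools used on this page are on your PATH. This is a minimal sketch; `missing_commands` is a hypothetical helper (not part of VariantGrid or VEP), and the command list in the usage example is an assumption you should adjust:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each named command that is not on the PATH.
# Returns non-zero if anything is missing.
missing_commands() {
    local cmd status=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            status=1
        fi
    done
    return "$status"
}
```

For example, `missing_commands tabix bgzip samtools mysql git curl perlbrew` prints any tools that still need installing and returns non-zero if there are any.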

Perlbrew

Perlbrew installs Perl and its modules into your home directory rather than system-wide. I recommend it because it reduces conflicts with system packages and makes it easy to blow away your installation and start again.

See the Perlbrew section of the VEP installation instructions.

su variantgrid # or whatever user you run VG as
perlbrew init # Needed, otherwise the install below fails with 'Failed to download'
perlbrew install -j 5 --as 5.30.2 --thread --64all -Duseshrplib perl-5.30.2 --notest
perlbrew switch 5.30.2
perlbrew install-cpanm

If you use Perlbrew, set settings.ANNOTATION_VEP_PERLBREW_RUNNER_SCRIPT; VariantGrid then runs this script to switch to the appropriate Perlbrew Perl before calling VEP.
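VariantGrid's own runner script isn't reproduced here, but a script of this kind might look like the sketch below. It assumes (hypothetically) that the runner receives the VEP command line as its arguments, that Perlbrew uses its default root `~/perl5/perlbrew`, and that the Perl alias is the `5.30.2` created above; `run_with_perlbrew` and `VEP_PERL_VERSION` are illustrative names only:

```shell
#!/usr/bin/env bash
# Hypothetical Perlbrew runner: activate the Perlbrew Perl used for VEP,
# then run the command passed as arguments (assumed to be the VEP command line).

run_with_perlbrew() {
    # Perlbrew's default root unless PERLBREW_ROOT is already set
    export PERLBREW_ROOT="${PERLBREW_ROOT:-$HOME/perl5/perlbrew}"
    # VEP_PERL_VERSION is an illustrative override; defaults to the alias created above
    local version="${VEP_PERL_VERSION:-5.30.2}"
    # etc/bashrc defines the `perlbrew` shell function that makes `perlbrew use` work in scripts
    . "${PERLBREW_ROOT}/etc/bashrc" || return 1
    perlbrew use "$version" || return 1
    "$@"
}

# Only run when given a command, so the file can also be sourced
if [ "$#" -gt 0 ]; then
    run_with_perlbrew "$@"
fi
```

Using `perlbrew use` (rather than `switch`) changes the Perl only for this process, so the runner doesn't alter the default Perl for the user's other shells.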

Perl dependencies

# Run after switching to your Perlbrew Perl, or as root if you use the system Perl

export PERL_MM_USE_DEFAULT=1 # CPAN automatic yes
cpanm Archive::Zip Archive::Extract DBD::mysql DBI Set::IntervalTree PerlIO::gzip Try::Tiny Role::Tiny::With GD Bio::Perl Test::Warnings

DO NOT install any other Perl libraries by hand. Use the versions provided by VEP.

For BigWig support, follow the BigWig instructions in the VEP installation documentation:

# First download and build the Kent source library as described there
cpanm Bio::DB::BigFile # When prompted, enter the path to the Kent src directory

VEP

See the VEP Download and Install documentation. In research we currently use Ensembl annotation (SA Path will use RefSeq).

Because systems share drives and may run different VEP versions, we put VEP_VERSION explicitly in the install path:

export VEP_VERSION=112
export VEP_VERSION_BASE_DIR=/data/annotation/VEP/vep_code/${VEP_VERSION}
export PLUGINS_DIR=${VEP_VERSION_BASE_DIR}/plugins
mkdir -p ${PLUGINS_DIR}
cd ${VEP_VERSION_BASE_DIR}
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git checkout release/${VEP_VERSION}
# Install program/libraries
export VEP_CACHE=/data/annotation/VEP/vep_cache
# Run "a" (API) and "p" (plugins) as separate steps: if the tests fail, the plugins won't be installed
perl INSTALL.pl --AUTO a --PLUGINS all --CACHEDIR ${VEP_CACHE} --PLUGINSDIR ${PLUGINS_DIR}
perl INSTALL.pl --AUTO p --PLUGINS all --CACHEDIR ${VEP_CACHE} --PLUGINSDIR ${PLUGINS_DIR}

Then download the FASTA files:

# You may need to replace 'homo_sapiens' with 'homo_sapiens_refseq'
perl INSTALL.pl --AUTO f --ASSEMBLY GRCh37 --SPECIES homo_sapiens --CACHEDIR ${VEP_CACHE}
perl INSTALL.pl --AUTO f --ASSEMBLY GRCh38 --SPECIES homo_sapiens --CACHEDIR ${VEP_CACHE}

# Go to each assembly's directory in the cache (where the FASTA was downloaded), then recompress with bgzip and index:

# GRCh37
gzip -d Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz && bgzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa && samtools faidx Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
# GRCh38
gzip -d Homo_sapiens.GRCh38.dna.toplevel.fa.gz && bgzip Homo_sapiens.GRCh38.dna.toplevel.fa && samtools faidx Homo_sapiens.GRCh38.dna.toplevel.fa.gz
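The recompression step only needs to run on files that are not already BGZF-compressed: bgzip output is still valid gzip, but `samtools faidx` needs BGZF to index a compressed FASTA. A minimal sketch with two hypothetical helpers: `is_bgzf` inspects the gzip header for the BGZF extra field, and `bgzip_and_index_fasta` recompresses only when needed and then indexes. It assumes `bgzip` and `samtools` are on the PATH:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with a BGZF (bgzip) block header.
# BGZF is gzip with FLG.FEXTRA set (bytes 0-3 = 1f 8b 08 04) and a "BC"
# extra subfield identifier at bytes 12-13 (hex 42 43).
is_bgzf() {
    local hex
    hex=$(head -c 14 "$1" | od -An -tx1 | tr -d ' \n')
    # 8 hex chars of magic/flags, 16 chars we don't care about, then "BC"
    case "$hex" in
        1f8b0804????????????????4243*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: make sure a .fa.gz is bgzip-compressed, then index it.
bgzip_and_index_fasta() {
    local fasta_gz="$1"
    if ! is_bgzf "$fasta_gz"; then
        # pipefail (in a subshell) so a gzip error isn't hidden by bgzip succeeding
        ( set -o pipefail; gzip -dc "$fasta_gz" | bgzip -c > "${fasta_gz}.tmp" ) || return 1
        mv "${fasta_gz}.tmp" "$fasta_gz" || return 1
    fi
    # Writes <file>.fai and <file>.gzi next to the FASTA
    samtools faidx "$fasta_gz"
}
```

For example, `bgzip_and_index_fasta Homo_sapiens.GRCh38.dna.toplevel.fa.gz` is safe to re-run: a file that was already converted is only re-indexed.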

VEP Plugins

# If your plugins are from a release earlier than 110, replace MaveDB.pm with the bug-fixed version
rm ${PLUGINS_DIR}/MaveDB.pm
wget -P ${PLUGINS_DIR} https://raw.githubusercontent.com/Ensembl/VEP_plugins/main/MaveDB.pm

Copy the Grantham VEP Plugin (thanks to Duarte Molha):

cp ${VARIANTGRID_DIR}/annotation/annotation_data/generate_annotation/Grantham.pm ${PLUGINS_DIR}

Downloading Annotation Data

If you have access to our servers, copy the data from sacgf.ersa.edu.au:/data/sacgf/reference/VEP or from /data/annotation/VEP on an existing server.

The VariantGrid git repo has scripts in annotation/annotation_data/vep_install that download the data for a particular genome build. At a minimum, you need:

echo "VEP Cache"
# The cache release must match your VEP version (release 97 here is only an example)
wget ftp://ftp.ensembl.org/pub/release-97/variation/indexed_vep_cache/homo_sapiens_vep_97_GRCh37.tar.gz
# tar xvfz homo_sapiens_vep_97_GRCh37.tar.gz
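Because VEP expects a cache from its own release by default, it is worth checking what the cache directory actually contains. A minimal sketch, assuming the usual layout where a cache unpacks to `<cache dir>/<species>/<release>_<assembly>` (for example `homo_sapiens/112_GRCh38`); `has_vep_cache` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the cache holds data for this VEP release and assembly.
# Species defaults to homo_sapiens; pass homo_sapiens_refseq for a RefSeq cache.
has_vep_cache() {
    local cache_dir="$1" version="$2" assembly="$3" species="${4:-homo_sapiens}"
    [ -d "${cache_dir}/${species}/${version}_${assembly}" ]
}
```

For example, `has_vep_cache "${VEP_CACHE}" "${VEP_VERSION}" GRCh37 || echo "No VEP ${VEP_VERSION} cache for GRCh37"`.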

The data files for gnomAD are very large (about 500G for GRCh37 and 800G for GRCh38), so we download them per chromosome and pre-process them to shrink them down to around 4G each. See the gnomad_data.py script.

Plugin data is created following the instructions on the VEP plugins documentation page.

The scripts I ran are stored in plugin_data.sh.

Setup Annotation Data

The annotation data has to be laid out to match the default_settings.ANNOTATION variable.

The easiest way to get it going is to keep the directory structure and config from our existing servers, then override the ANNOTATION_VEP_BASE_DIR variable, as all VEP annotation paths are relative to this directory.

Testing

python3 manage.py vep_version --genome-build=GRCh37
