Install VEP
VEP is regularly upgraded, and the versions used below are just an example.
If you used a DB dump - see Install-from-database-dump - make sure you use the VEP version of the dump, otherwise you will have to re-annotate all of the variants.
sudo apt-get install -y tabix mysql-server mysql-client libmysqlclient-dev perlbrew git curl libdb-dev libgd-dev pkg-config
If you don't plan on using Perlbrew and want to use system Perl (not recommended), then:
apt-get install cpanminus
Perlbrew installs Perl modules into your home directory rather than system-wide. I recommend it because it reduces conflicts and makes it easier to blow away your installation and start again.
See PerlBrew section of VEP install
su variantgrid # or whatever user you run VG as
perlbrew init # Needed for the next line's install to work, otherwise we get 'Failed to download'
perlbrew install -j 5 --as 5.30.2 --thread --64all -Duseshrplib perl-5.30.2 --notest
perlbrew switch 5.30.2
perlbrew install-cpanm
If you use Perlbrew, set settings.ANNOTATION_VEP_PERLBREW_RUNNER_SCRIPT to a script that switches to the appropriate Perlbrew Perl before calling VEP.
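A minimal sketch of what such a runner script might contain. It assumes Perlbrew lives in its default location (~/perl5/perlbrew) and that the script receives the VEP command line as its arguments; both are assumptions, so check how your VariantGrid version invokes the script. The function name and the PERLBREW_VERSION variable are hypothetical.

```shell
#!/bin/bash
# Hypothetical Perlbrew runner: activate a Perlbrew Perl, then run the
# command passed as arguments (assumed here to be the VEP command line).

run_with_perlbrew() {
    # Assumption: default Perlbrew location unless PERLBREW_ROOT is set.
    local root="${PERLBREW_ROOT:-${HOME}/perl5/perlbrew}"
    # Assumption: same Perl version as installed in the steps above.
    local version="${PERLBREW_VERSION:-5.30.2}"

    if [ ! -f "${root}/etc/bashrc" ]; then
        echo "Perlbrew bashrc not found under ${root}" >&2
        return 1
    fi

    # Load the perlbrew shell function, select the Perl, then run the command.
    . "${root}/etc/bashrc"
    perlbrew use "${version}" || return 1
    "$@"
}
```

Saved as a standalone script, the last line would be `run_with_perlbrew "$@"` so that whatever command is passed in runs under the selected Perl.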
# Run after switching in Perlbrew or running as root to use system Perl
export PERL_MM_USE_DEFAULT=1 # CPAN automatic yes
cpanm Archive::Zip Archive::Extract DBD::mysql DBI Set::IntervalTree PerlIO::gzip Try::Tiny Role::Tiny::With GD Bio::Perl Test::Warnings
DO NOT install any other Perl libraries by hand. Use the versions provided by VEP.
Follow these instructions for BigWig support
# download and build kent library as per instructions above
cpanm Bio::DB::BigFile # Enter kent src here
See VEP Download/Install. We use Ensembl in research at the moment (SA Path will use RefSeq).
Because systems use shared drives and may be on different VEP versions, we include VEP_VERSION explicitly in the path.
export VEP_VERSION=112
export VEP_VERSION_BASE_DIR=/data/annotation/VEP/vep_code/${VEP_VERSION}
export PLUGINS_DIR=${VEP_VERSION_BASE_DIR}/plugins
mkdir -p ${PLUGINS_DIR}
cd ${VEP_VERSION_BASE_DIR}
git clone https://github.com/Ensembl/ensembl-vep.git
cd ensembl-vep
git checkout release/${VEP_VERSION}
# Install program/libraries
export VEP_CACHE=/data/annotation/VEP/vep_cache
# Run "a" and "p" as separate operations, because if the tests fail the plugins won't be installed
perl INSTALL.pl --AUTO a --PLUGINS all --CACHEDIR ${VEP_CACHE} --PLUGINSDIR ${PLUGINS_DIR}
perl INSTALL.pl --AUTO p --PLUGINS all --CACHEDIR ${VEP_CACHE} --PLUGINSDIR ${PLUGINS_DIR}
Then download the FASTA files:
# You may need to replace 'homo_sapiens' with 'homo_sapiens_refseq'
perl INSTALL.pl --AUTO f --ASSEMBLY GRCh37 --SPECIES homo_sapiens --CACHEDIR ${VEP_CACHE}
perl INSTALL.pl --AUTO f --ASSEMBLY GRCh38 --SPECIES homo_sapiens --CACHEDIR ${VEP_CACHE}
# Go to the cache directories then run the following:
# GRCh37
gzip -d Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz;bgzip Homo_sapiens.GRCh37.75.dna.primary_assembly.fa;samtools faidx Homo_sapiens.GRCh37.75.dna.primary_assembly.fa.gz
# GRCh38
gzip -d Homo_sapiens.GRCh38.dna.toplevel.fa.gz;bgzip Homo_sapiens.GRCh38.dna.toplevel.fa;samtools faidx Homo_sapiens.GRCh38.dna.toplevel.fa.gz
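After recompressing and indexing, it can be worth confirming that the index files were written. Running `samtools faidx` on a bgzipped FASTA produces a `.fai` and a `.gzi` file next to it. The sketch below only checks that those files exist and are non-empty; the function name is hypothetical and it does not validate the index contents.

```shell
# Report whether a bgzipped FASTA has the index files that samtools faidx
# writes alongside it: FILE.fa.gz.fai and FILE.fa.gz.gzi.
fasta_is_indexed() {
    local fasta="$1"
    local f
    for f in "${fasta}" "${fasta}.fai" "${fasta}.gzi"; do
        # -s is true only for files that exist and are non-empty
        if [ ! -s "${f}" ]; then
            echo "missing or empty: ${f}" >&2
            return 1
        fi
    done
    return 0
}
```

For example, from within the GRCh38 cache directory: `fasta_is_indexed Homo_sapiens.GRCh38.dna.toplevel.fa.gz && echo "index OK"`.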
# If you have plugins earlier than v110 - replace with bugfixed version
rm MaveDB.pm
wget https://raw.githubusercontent.com/Ensembl/VEP_plugins/main/MaveDB.pm
Copy the Grantham VEP Plugin (thanks to Duarte Molha):
cp ${VARIANTGRID_DIR}/annotation/annotation_data/generate_annotation/Grantham.pm ${PLUGINS_DIR}
If you have access to the servers, copy from:
sacgf.ersa.edu.au:/data/sacgf/reference/VEP
or from /data/annotation/VEP on another server
There are some scripts in the VariantGrid git repo (annotation/annotation_data/vep_install) to download the data for a particular genome build. The minimum you need is:
echo "VEP Cache"
wget ftp://ftp.ensembl.org/pub/release-97/variation/indexed_vep_cache/homo_sapiens_vep_97_GRCh37.tar.gz
# tar xvfz homo_sapiens_vep_97_GRCh37.tar.gz
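Once extracted, an indexed VEP cache tarball unpacks into a `SPECIES/VERSION_ASSEMBLY` directory under the cache root (for the file above, `homo_sapiens/97_GRCh37`). A small sketch for checking that the expected directory is present; the function names are hypothetical, and the cache root is whatever you used as `VEP_CACHE`.

```shell
# Build the directory a VEP cache is expected to occupy.
# usage: vep_cache_subdir CACHE_ROOT SPECIES VERSION ASSEMBLY
vep_cache_subdir() {
    echo "${1}/${2}/${3}_${4}"
}

# Succeed only if that directory exists.
vep_cache_present() {
    [ -d "$(vep_cache_subdir "$@")" ]
}
```

For example: `vep_cache_present "${VEP_CACHE}" homo_sapiens 97 GRCh37 || echo "cache not extracted yet"`.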
The data files for gnomAD are very large (500G for GRCh37 and 800G for GRCh38), so we download them per-chromosome and pre-process them to shrink them down to around 4G each. See the gnomad_data.py script.
Plugin data is created as per instructions on VEP Plugin page.
The scripts I ran are stored in plugin_data.sh
The annotation data has to be laid out to match the default_settings.ANNOTATION variable.
The easiest way to get it going is to preserve the directory structure and config from our existing servers, and then override the ANNOTATION_VEP_BASE_DIR variable, as all VEP annotation paths are relative to this dir.
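Because every annotation path is resolved relative to that base directory, a quick way to spot layout problems is to test a list of relative paths against it. The sketch below is an illustration only: the function name is hypothetical, and the relative paths you pass in should come from your own settings rather than from this example.

```shell
# Print each relative path that is missing under the base directory.
# Returns non-zero if anything is missing.
# usage: check_annotation_layout BASE_DIR REL_PATH...
check_annotation_layout() {
    local base="$1"
    shift
    local missing=0
    local rel
    for rel in "$@"; do
        if [ ! -e "${base}/${rel}" ]; then
            echo "${rel}"
            missing=1
        fi
    done
    return "${missing}"
}
```

Called as `check_annotation_layout /data/annotation/VEP path/one path/two` (placeholder paths), it lists whichever entries are absent, which is usually quicker than waiting for a VEP run to fail on a missing file.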
python3 manage.py vep_version --genome-build=GRCh37
See also VEP troubleshooting