A reproduction of the TopologyNet-BP algorithm from *TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions* by Zixuan Cang and Guo-Wei Wei. Paper link
This repository is part of Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology. The inference code for PATH can be found in the OSPREY3 package. The training code for PATH can be found here.
All of the code in this repository is executed on an x86 machine running Ubuntu 20.
- Install git lfs.
- Clone this repository.
- Prerequisites: my software environment is managed with conda.
- Install conda or mamba (you only need one of the two).
- Run `conda env create -f tnet2017.yml` or `mamba env create -f tnet2017.yml` to create the `tnet2017` environment. Activate this environment with `conda activate tnet2017`.
- Finally, run `pip install -r requirements.txt` to install additional dependencies for this project that are not available through conda.
- Construct persistent homology features: start with `ph/README.md` to create a feature vector for each protein-ligand complex using persistent homology, following the method described by Cang and Wei.
- Neural network: start with `ml/README.md` to use the TopologyNet neural network architecture.
I execute these additional scripts on a compute cluster managed by SLURM. To keep track of the statuses of jobs and their results, I use a Redis database and a custom MapReduce-style system.
I first explain my custom MapReduce-style system. It consists of two scripts, `job_wrapper.py` and `dispatch_jobs.py`, a SLURM scheduler, and a Redis database. If you are running these scripts on a SLURM cluster, you will need to modify the headers of the temporary shell scripts (see below) to fit the configuration of your cluster. If you are running them on a compute cluster with a different job scheduler, further changes will be needed according to how compute jobs are submitted on your cluster.
- Each task is associated with a set of sequentially numbered keys starting from a prefix, which is reflected in the `KEY_PREFIX` variable in `dispatch_jobs.py`. `dispatch_jobs.py` creates an entry in the database for each key, containing information about the job and the fields {`started`, `finished`, `error`}, all set to `False`. It then submits the jobs by creating temporary shell scripts that execute `python job_wrapper.py --key {k}` and submitting these shell scripts to the SLURM scheduler. `job_wrapper.py` contains the instructions that are executed once the scheduler allocates the job.
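
To make the flow concrete, here is a minimal sketch of the dispatch step. It is not the actual contents of `dispatch_jobs.py`: it assumes the `redis-py` client, and the job count, SLURM header, and connection settings are placeholders. Only `KEY_PREFIX`, the `DB` constant, the status fields, and the `python job_wrapper.py --key {k}` command come from the description above.

```python
# Minimal sketch of the dispatch step, not the exact contents of dispatch_jobs.py.
import subprocess
import tempfile

import redis

KEY_PREFIX = "tnet2017:job:"  # prefix for the sequentially numbered keys
DB = {"host": "localhost", "port": 6379, "password": "topology"}  # must match redis.conf
N_JOBS = 4  # illustrative

r = redis.Redis(**DB)

for i in range(N_JOBS):
    key = f"{KEY_PREFIX}{i}"
    # One entry per task, with all status flags initialized to False
    r.hset(key, mapping={"started": "False", "finished": "False", "error": "False"})

    # Temporary shell script; the #SBATCH header must match your cluster's configuration
    script = f"""#!/bin/bash
#SBATCH --job-name=tnet_{i}
#SBATCH --time=01:00:00
python job_wrapper.py --key {key}
"""
    with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
        f.write(script)
        path = f.name
    subprocess.run(["sbatch", path], check=True)
```

In this scheme, `job_wrapper.py` would flip `started` to `True` when it begins and `finished` (or `error`) when it ends, so the state of every job can be queried from Redis at any time.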
As mentioned, a Redis database is used for managing jobs submitted to the SLURM cluster. To set up this database:
- Build and install Redis by following https://redis.io/docs/getting-started/installation/install-redis-from-source/.
- Optionally, add the `src` folder of Redis to your PATH.
- Create a `redis.conf` file somewhere and set a default password by putting, e.g., `requirepass topology` in that file.
- Start the Redis server on a host with your `redis.conf` and adjust the `DB` constant in `dispatch_jobs.py` accordingly.
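
Once the server is up, a quick way to confirm that the host and password in `DB` are correct is to ping it from Python (a sketch assuming the `redis-py` client; the hostname is a placeholder):

```python
import redis

# Connection details must match your redis.conf and the DB constant in dispatch_jobs.py.
r = redis.Redis(host="your-redis-host", port=6379, password="topology")
print(r.ping())  # True if the server is reachable and the password is accepted
```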
The two additional sets of scripts are:

- Perturbation analysis: an analysis of how much each atom contributes to the prediction of TNet-BP, performed by perturbing the atoms and observing the change in the binding affinity predicted by TNet-BP. See `perturbations/README.md`.
- Support vector machine (SVM): an SVM regressor trained directly on the feature vector constructed with persistent homology, with feature selection similar to that of PATH (Predicting Affinity Through Homology). See the `svm` folder and the sketch below.
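
The sketch below illustrates the general shape of that SVM baseline. It assumes scikit-learn and persistent-homology features already saved to disk; the file names, the univariate feature-selection step, and the SVR hyperparameters are illustrative rather than the exact code in the `svm` folder.

```python
# Illustrative sketch of the SVM regression baseline, not the exact code in the svm folder.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

X = np.load("features.npy")           # persistent-homology feature vectors (placeholder file)
y = np.load("binding_affinity.npy")   # experimental binding affinities (placeholder file)

model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=200),  # keep a subset of informative features
    SVR(kernel="rbf", C=10.0),
)
model.fit(X, y)
```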
The perturbation analysis turned out to be fruitless, and the SVM in the style of PATH did not perform as well as PATH.
See CITING_OSPREY.txt.
A note on the commit history: some commits start with "R:". These commits contain intermediate exploratory results.