(Same code structure as GitHub. If you wish to modify the code or change inputs, just copy the capsule into a new capsule that you own on Code Ocean!)
(should take 5-10 minutes with a proper system setup)

```bash
git clone https://github.com/timkartar/DeepPBS
```
Python dependencies for DeepPBS are listed in `deeppbs_linux.yml`. We recommend installation via the conda package management tool. If you do not have conda, please refer to the conda installation instructions here.
```bash
cd DeepPBS
conda env create -f deeppbs_linux.yml
conda activate deeppbs
```
Use the `deeppbs_linux_cpu_only.yml` file for a CPU-only installation.
```bash
pip install -e .
```
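To sanity-check the editable install, you can try importing the package from Python. This is a minimal check, assuming the package is importable as `deeppbs` (the import name is inferred from the repository name, so treat it as an assumption):

```bash
# Assumption: the editable install exposes an importable "deeppbs" package
python -c "import deeppbs; print('deeppbs imported from', deeppbs.__file__)"
```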
The preprocessing scripts depend on 3DNA and Curves. We have provided the required packages in `dependencies/bin`, and `run/process/proc_source.sh` shows how to source them. However, please refer to `x3dna-v2.3-linux-64bit/x3dna-v2.3/license.txt` for fair usage of this version of the 3DNA software.
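For example, sourcing the provided script before running any preprocessing might look like this (a sketch; the exact variables it exports are defined in `proc_source.sh` itself):

```bash
cd run/process
# Put the bundled 3DNA and Curves executables on PATH for this shell
source proc_source.sh
```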
Note: The installation has been tested on Linux systems with CUDA 11.3 and CUDA 11.6; you may have to adjust the PyTorch version number based on your system.
UPDATE (Feb 29, 2024): The latest version on GitHub is tested on CUDA 12.2, PyTorch 2.3, and PyG 2.5. The `.yml` file has been updated accordingly.
The project was developed on PyG 2.0.1. Future versions of PyG are backwards compatible as of now, but we cannot guarantee stability on all versions. For more information, refer to the installation pages for PyTorch and PyG.
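To confirm which versions your environment actually resolved, a quick check such as the following can help (it only assumes `torch` and `torch_geometric` are installed):

```bash
python -c "import torch, torch_geometric; print('torch', torch.__version__, '| PyG', torch_geometric.__version__, '| CUDA', torch.version.cuda)"
```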
An example pipeline for processing and prediction is as follows:

```bash
cd run/process/
```

- Put your PDB files containing the biological assemblies of interest into the `pdb` directory
- Run:

```bash
ls pdb > input.txt
./process_and_predict.sh
```

(you can parallelize the steps in this script through multiple job submissions)

This will process the list of PDBs and put the processed `.npz` files into the `npz` directory.
Note: You can parallelize this script, but in that case make sure you create a separate working directory for each job; otherwise, temporary files generated during processing may conflict. A sketch of this pattern follows below.
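A minimal sketch of one way to parallelize, assuming each job gets its own copy of the working directory (the chunk size and directory names here are illustrative, not part of DeepPBS):

```bash
# Run from run/: split the input list and give each chunk its own copy of process/
split -l 50 process/input.txt process/chunk_
for chunk in process/chunk_*; do
    job="process_$(basename "$chunk")"
    cp -r process "$job"                        # separate working directory per job
    mv "$chunk" "$job/input.txt"
    (cd "$job" && ./process_and_predict.sh) &   # or submit to your job scheduler
done
wait
```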
Then it will make predictions using the DeepPBS ensemble and put the predictions in the `output` directory (in `run/process`).

Combined pre-processing and inference time for one biological assembly is on the order of seconds (e.g., about 15-20 seconds for PDB ID 5x6g).
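Once the script finishes, a quick way to inspect what was produced (a sketch; the exact file names depend on your inputs, and `npz/5x6g.npz` is an assumed example):

```bash
ls npz output
# Peek at the arrays stored in one processed file
python -c "import numpy as np; d = np.load('npz/5x6g.npz'); print(d.files)"
```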
```bash
cd run/process
./vis_interpret.sh <pdb_name>   # pdb_name without the .pdb extension
```

For example: `./vis_interpret.sh 5x6g`
This will compute and store the perturbation outcomes and other required information in `run/plot_scripts/interpret_output`.
- You need a PyMOL executable for this step! Once installed, you can run the following:
- `pymol` (opens the PyMOL GUI)
- `pip install matplotlib` (in the PyMOL GUI command prompt)
- Close the PyMOL GUI
- `pymol ../plot_scripts/vis_interpret.py ../plot_scripts/ 5x6g.pdb` (run from the terminal)
This will open a PyMOL session for the visualization (screenshot below) and save a `.psw` file in `run/plot_scripts/interpret_output`.
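If you prefer not to go through the GUI, PyMOL can generally execute a script headless via its `-c` (no GUI) and `-q` (quiet) flags; whether `vis_interpret.py` runs correctly without the GUI is an assumption here:

```bash
# Assumption: vis_interpret.py does not strictly require the PyMOL GUI
pymol -cq ../plot_scripts/vis_interpret.py ../plot_scripts/ 5x6g.pdb
```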
Simulation trajectory snapshots in PDB format can be processed in a similar manner:
- The full set of PDB chain IDs (which have DNA in the biological assembly) and corresponding PWM IDs is available, organized by cluster, in `data/jaspar_h11mo_cluster_wise_dna_containing_dataset.npy` (note: not all of these pass the processing criteria)
- Processed npz files which can serve as input to DeepPBS: Here
- All analyzed Exd-Scr simulation frames in PDB format: Here
- The cross-validation set is listed in `run/folds/valid*.txt` and the benchmark set in `run/folds/id.txt`
- Simulation parameters: Here
- List of protein sequence IDs used for application on predicted structures: Here
- Custom scripts and intermediate files used for data gathering from JASPAR, HOCOMOCO and PDB: Here
- Custom scripts and processed outputs for comparison with mutagenesis data: Here
- Custom scripts and generated data for application on predicted structures: Here
Download data availability item 2 above and place it somewhere on your system, then configure the path in `run/config.json` (`"data_dir"`). Also configure the `"output_path"` as you wish.
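As a sketch, the relevant fields in `run/config.json` might look like the following (the paths are placeholders; your `config.json` will contain other keys that should be left intact):

```json
{
    "data_dir": "/path/to/downloaded/npz_data",
    "output_path": "/path/to/training_output"
}
```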
Run `./submit_cross.sh`. This will submit 5 cross-validation models to train simultaneously. Modify this script according to your needs.
Parts of the pre-processing code were contributed by Jared Sagendorf from previous projects in the Rohs Lab: DNAProDB (https://github.com/jaredsagendorf/dnaprodb-back) and GEMENAI (https://github.com/jaredsagendorf/gemenai).