Showing 13 changed files with 2,616 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,159 @@ | ||
# Custom | ||
/data | ||
/env | ||
/wandb | ||
.DS_Store | ||
# *.ipynb | ||
|
||
# Byte-compiled / optimized / DLL files | ||
__pycache__/ | ||
*.py[cod] | ||
*$py.class | ||
|
||
# C extensions | ||
*.so | ||
|
||
# Distribution / packaging | ||
.Python | ||
build/ | ||
develop-eggs/ | ||
dist/ | ||
downloads/ | ||
eggs/ | ||
.eggs/ | ||
lib/ | ||
lib64/ | ||
parts/ | ||
sdist/ | ||
var/ | ||
wheels/ | ||
share/python-wheels/ | ||
*.egg-info/ | ||
.installed.cfg | ||
*.egg | ||
MANIFEST | ||
|
||
# PyInstaller | ||
# Usually these files are written by a python script from a template | ||
# before PyInstaller builds the exe, so as to inject date/other infos into it. | ||
*.manifest | ||
*.spec | ||
|
||
# Installer logs | ||
pip-log.txt | ||
pip-delete-this-directory.txt | ||
|
||
# Unit test / coverage reports | ||
htmlcov/ | ||
.tox/ | ||
.nox/ | ||
.coverage | ||
.coverage.* | ||
.cache | ||
nosetests.xml | ||
coverage.xml | ||
*.cover | ||
*.py,cover | ||
.hypothesis/ | ||
.pytest_cache/ | ||
cover/ | ||
|
||
# Translations | ||
*.mo | ||
*.pot | ||
|
||
# Django stuff: | ||
*.log | ||
local_settings.py | ||
db.sqlite3 | ||
db.sqlite3-journal | ||
|
||
# Flask stuff: | ||
instance/ | ||
.webassets-cache | ||
|
||
# Scrapy stuff: | ||
.scrapy | ||
|
||
# Sphinx documentation | ||
docs/_build/ | ||
|
||
# PyBuilder | ||
.pybuilder/ | ||
target/ | ||
|
||
# Jupyter Notebook | ||
.ipynb_checkpoints | ||
|
||
# IPython | ||
profile_default/ | ||
ipython_config.py | ||
|
||
# pyenv | ||
# For a library or package, you might want to ignore these files since the code is | ||
# intended to run in multiple environments; otherwise, check them in: | ||
# .python-version | ||
|
||
# pipenv | ||
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. | ||
# However, in case of collaboration, if having platform-specific dependencies or dependencies | ||
# having no cross-platform support, pipenv may install dependencies that don't work, or not | ||
# install all needed dependencies. | ||
#Pipfile.lock | ||
|
||
# poetry | ||
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. | ||
# This is especially recommended for binary packages to ensure reproducibility, and is more | ||
# commonly ignored for libraries. | ||
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control | ||
#poetry.lock | ||
|
||
# PEP 582; used by e.g. github.com/David-OConnor/pyflow | ||
__pypackages__/ | ||
|
||
# Celery stuff | ||
celerybeat-schedule | ||
celerybeat.pid | ||
|
||
# SageMath parsed files | ||
*.sage.py | ||
|
||
# Environments | ||
.env | ||
.venv | ||
env/ | ||
venv/ | ||
ENV/ | ||
env.bak/ | ||
venv.bak/ | ||
|
||
# Spyder project settings | ||
.spyderproject | ||
.spyproject | ||
|
||
# Rope project settings | ||
.ropeproject | ||
|
||
# mkdocs documentation | ||
/site | ||
|
||
# mypy | ||
.mypy_cache/ | ||
.dmypy.json | ||
dmypy.json | ||
|
||
# Pyre type checker | ||
.pyre/ | ||
|
||
# pytype static type analyzer | ||
.pytype/ | ||
|
||
# Cython debug symbols | ||
cython_debug/ | ||
|
||
# PyCharm | ||
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can | ||
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore | ||
# and can be added to the global gitignore or merged into this file. For a more nuclear | ||
# option (not recommended) you can uncomment the following to ignore the entire idea folder. | ||
.idea/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,103 @@ | ||
# geometric-rna-design | ||
gRNAde: Geometric RNA Design Pipeline | ||
# 💣 gRNAde: Geometric RNA Design | ||
|
||
**gRNAde** is a geometric deep learning pipeline for 3D RNA inverse design conditioned on *multiple* backbone conformations. | ||
gRNAde explicitly accounts for RNA conformational flexibility via a novel **multi-Graph Neural Network** architecture which independently encodes a set of conformers via message passing. | ||
|
||
![](fig/grnade_pipeline.png) | ||
|
||
Check out the accompanying paper ['Multi-State RNA Design with Geometric Multi-Graph Neural Networks'](https://arxiv.org/abs/TODO), which introduces gRNAde. | ||
> Chaitanya K. Joshi, Arian R. Jamasb, Ramon Viñas, Charles Harris, Simon Mathis, and Pietro Liò. Multi-State RNA Design with Geometric Multi-Graph Neural Networks. *arXiv preprint, 2023.* | ||
> | ||
>[PDF](https://arxiv.org/pdf/TODO) | | ||
❗️**Note:** gRNAde is under active development. | ||
|
||
|
||
## Directory Structure and Usage | ||
|
||
``` | ||
. | ||
├── README.md | ||
| | ||
├── data # Data files directory | ||
├── notebooks # Jupyter notebooks directory | ||
├── configs # Configuration files directory | ||
| | ||
├── main.py # Main script for launching experiments | ||
| | ||
└── src | ||
├── models.py # Multi-GNN encoder layers and model | ||
├── train # Helper functions for training and evaluation | ||
├── data.py # RNA inverse design dataset | ||
├── data_utils.py # Helper functions for data preparation | ||
└── featurisation.py # Input featurisation helpers | ||
``` | ||
|
||
|
||
|
||
## Installation | ||
|
||
Our experiments used Python 3.8.16 and CUDA 11.3 on NVIDIA Quadro RTX 8000 GPUs. | ||
|
||
```sh | ||
# Create new conda environment | ||
conda create --prefix ./env python=3.8 | ||
conda activate ./env | ||
|
||
# Install PyTorch (Check CUDA version for GPU!) | ||
# Option 1: CPU | ||
# conda install pytorch==1.12.0 -c pytorch | ||
# | ||
# Option 2: GPU, CUDA 11.3 | ||
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch | ||
|
||
# Install dependencies | ||
conda install matplotlib pandas networkx | ||
pip install biopython wandb pyyaml ipdb | ||
conda install jupyterlab -c conda-forge | ||
conda install -c bioconda cd-hit | ||
|
||
# Install PyG (Check CPU/GPU/MacOS) | ||
# Option 1: CPU, MacOS | ||
# pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-1.12.0+cpu.html | ||
# pip install torch-geometric | ||
# | ||
# Option 2: GPU, CUDA 11.3 | ||
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv -f https://data.pyg.org/whl/torch-1.12.1+cu113.html | ||
pip install torch-geometric | ||
# | ||
# Option 3: | ||
# conda install pyg -c pyg # CPU/GPU, but may not work on MacOS | ||
``` | ||
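After running the steps above, a quick sanity check of the environment can save a failed first run. The following is a minimal sketch, not part of the repository: the helper names `installed_versions` and `check_environment` are hypothetical, and it assumes the packages are registered under their usual distribution names (`torch`, `torch-geometric`, and so on), which may not hold for every conda build.

```python
from importlib import metadata

# Distribution name -> expected version prefix (None = no pin in the README).
# "1.12" covers both the CPU (1.12.0) and GPU (1.12.1) install options.
EXPECTED = {
    "torch": "1.12",
    "torch-geometric": None,
    "torch-scatter": None,
    "torch-sparse": None,
    "torch-cluster": None,
    "biopython": None,
    "wandb": None,
}


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def check_environment(expected=EXPECTED):
    """Return a list of human-readable problems (empty if all looks fine)."""
    problems = []
    found = installed_versions(expected)
    for name, prefix in expected.items():
        version = found[name]
        if version is None:
            problems.append(f"{name}: not installed")
        elif prefix is not None and not version.startswith(prefix):
            problems.append(f"{name}: found {version}, expected {prefix}.x")
    return problems


for line in check_environment() or ["environment looks consistent"]:
    print(line)
```

The check reads package metadata only, so it does not import `torch` and says nothing about whether CUDA is usable at runtime.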
|
||
|
||
|
||
## Downloading Data | ||
|
||
We created a machine learning-ready dataset for RNA inverse design using [RNASolo](https://rnasolo.cs.put.poznan.pl) structures at resolution ≤3.0 Å. | ||
Download and extract the raw `.pdb` files into the `data/raw/` directory using the following script. | ||
Running `main.py` for the first time will process the raw data and save the processed samples as a `.pt` file. | ||
|
||
```sh | ||
mkdir -p data/raw | ||
cd data/raw | ||
curl -O https://rnasolo.cs.put.poznan.pl/media/files/zipped/bunches/pdb/all_member_pdb_3_0__3_280.zip | ||
unzip all_member_pdb_3_0__3_280.zip | ||
rm all_member_pdb_3_0__3_280.zip | ||
``` | ||
|
||
Manual download link: https://rnasolo.cs.put.poznan.pl/archive. | ||
To create the download manually, select: 3D (PDB) + all molecules + all members + resolution ≤3.0 Å. | ||
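For platforms without `curl` and `unzip`, the same download steps can be sketched in Python using only the standard library. This is an illustrative equivalent of the shell script above, not code from the repository; the function names `download` and `extract_and_remove` are hypothetical, and the URL is the one given in the script.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same archive as in the shell script above.
ARCHIVE_URL = (
    "https://rnasolo.cs.put.poznan.pl/media/files/zipped/bunches/pdb/"
    "all_member_pdb_3_0__3_280.zip"
)


def download(url, dest):
    """Download `url` to the file `dest`, creating parent directories."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(dest))
    return dest


def extract_and_remove(archive, out_dir):
    """Extract a zip archive into `out_dir`, delete it, return member names."""
    archive, out_dir = Path(archive), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(out_dir)
        names = zf.namelist()
    archive.unlink()  # mirrors the `rm` step of the shell script
    return names
```

A typical call, run from the repository root, would be `extract_and_remove(download(ARCHIVE_URL, "data/raw/rnasolo.zip"), "data/raw")`, where the local file name `rnasolo.zip` is arbitrary.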
|
||
|
||
|
||
## Citation | ||
|
||
``` | ||
@article{joshi2023multi, | ||
title={Multi-State RNA Design with Geometric Multi-Graph Neural Networks}, | ||
author={Joshi, Chaitanya K. and Jamasb, Arian R. and Viñas, Ramon and Harris, Charles and Mathis, Simon and Liò, Pietro}, | ||
journal={arXiv preprint}, | ||
year={2023}, | ||
} | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,104 @@ | ||
# Misc configurations | ||
gpu: | ||
value: 0 | ||
desc: GPU ID | ||
seed: | ||
value: 42 | ||
desc: Random seed for reproducibility | ||
save: | ||
value: True | ||
desc: Whether to save current and best model checkpoint | ||
|
||
# Data configurations | ||
data_path: | ||
value: "./data/" | ||
desc: Data directory (preprocessed and raw) | ||
process_raw: | ||
value: True | ||
desc: Whether to process datasets from raw .pdb files | ||
save_processed: | ||
value: True | ||
desc: Whether to save processed datasets | ||
radius: | ||
value: 4.5 | ||
desc: Radius for determining local neighborhoods in Angstrom (currently not used) | ||
top_k: | ||
value: 10 | ||
desc: Number of k-nearest neighbors | ||
num_rbf: | ||
value: 16 | ||
desc: Number of radial basis functions to featurise distances | ||
num_posenc: | ||
value: 16 | ||
desc: Number of positional encodings to featurise edges | ||
num_conformers: | ||
value: 3 | ||
desc: Number of conformations sampled per sequence | ||
|
||
# Splitting configurations | ||
eval_size: | ||
value: 200 | ||
desc: Number of samples in val/test sets | ||
split: | ||
value: 'rmsd' | ||
desc: Type of data split (random/rmsd/struct/seq_identity) | ||
|
||
# Model configurations | ||
model: | ||
value: 'MultiGVPGNN' | ||
desc: Model architecture | ||
node_in_dim: | ||
value: [1, 4] | ||
desc: Input dimensions for node features (scalar channels, vector channels) | ||
node_h_dim: | ||
value: [128, 16] | ||
desc: Hidden dimensions for node features (scalar channels, vector channels) | ||
edge_in_dim: | ||
value: [32, 1] | ||
desc: Input dimensions for edge features (scalar channels, vector channels) | ||
edge_h_dim: | ||
value: [32, 1] | ||
desc: Hidden dimensions for edge features (scalar channels, vector channels) | ||
num_layers: | ||
value: 3 | ||
desc: Number of layers for encoder/decoder | ||
drop_rate: | ||
value: 0.1 | ||
desc: Dropout rate | ||
out_dim: | ||
value: 4 | ||
desc: Output dimension (4 bases for RNA) | ||
|
||
# Training configurations | ||
epochs: | ||
value: 100 | ||
desc: Number of training epochs | ||
lr: | ||
value: 0.001 | ||
desc: Learning rate | ||
batch_size: | ||
value: 8 | ||
desc: Batch size for dataloaders (currently not used) | ||
max_nodes: | ||
value: 5000 | ||
desc: Maximum number of nodes in batch | ||
num_workers: | ||
value: 8 | ||
desc: Number of workers for dataloaders | ||
val_every: | ||
value: 5 | ||
desc: Interval of training epochs after which validation is performed | ||
|
||
# Evaluation configurations | ||
model_path: | ||
value: '' | ||
desc: Path to model checkpoint (for testing) | ||
test_perplexity: | ||
value: False | ||
desc: Whether to test perplexity | ||
test_recovery: | ||
value: False | ||
desc: Whether to test recovery | ||
n_samples: | ||
value: 100 | ||
desc: Number of samples for testing recovery |
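Every entry in this config pairs a `value` with a `desc`, a layout that resembles the one Weights & Biases accepts for YAML config files. How `main.py` actually consumes the file is not shown in this diff, so the following is only a sketch of how such a nested mapping could be flattened into plain settings. The function name `flatten_config` is hypothetical, and the `raw` dictionary stands in for what a YAML loader such as `yaml.safe_load` would return for a few of the entries above.

```python
def flatten_config(raw):
    """Split {'key': {'value': v, 'desc': d}} into ({'key': v}, {'key': d})."""
    values, descriptions = {}, {}
    for key, entry in raw.items():
        if not isinstance(entry, dict) or "value" not in entry:
            raise ValueError(f"config entry {key!r} has no 'value' field")
        values[key] = entry["value"]
        descriptions[key] = entry.get("desc", "")
    return values, descriptions


# A few entries copied from the config above, as a parsed mapping.
raw = {
    "seed": {"value": 42, "desc": "Random seed for reproducibility"},
    "top_k": {"value": 10, "desc": "Number of k-nearest neighbors"},
    "node_h_dim": {
        "value": [128, 16],
        "desc": "Hidden dimensions for node features (scalar channels, vector channels)",
    },
}

values, descriptions = flatten_config(raw)
scalar_dim, vector_dim = values["node_h_dim"]  # (scalar, vector) channel counts
print(values["seed"], values["top_k"], scalar_dim, vector_dim)
```

Keeping the descriptions in a separate dictionary makes it easy to reuse them elsewhere, for example as help strings for command-line overrides.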