Deep generative models such as generative adversarial networks, variational autoencoders, and autoregressive models are rapidly growing in popularity for the discovery of new molecules and materials. In this work, we introduce MOlecular SEtS (MOSES), a benchmarking platform to support research on machine learning for drug discovery. MOSES implements several popular molecular generation models and includes a set of metrics that evaluate the diversity and quality of generated molecules. MOSES aims to standardize research on molecular generation and to facilitate the sharing and comparison of new models. Additionally, we provide a large-scale comparison of existing state-of-the-art models and discuss current challenges for generative models that might prove fertile ground for new research. Our platform and source code are freely available at https://github.com/molecularsets/moses.
For more details, please refer to the paper.
If you use MOSES in your research, please cite us as:
@article{polykovskiy2018molecular,
title={{M}olecular {S}ets ({MOSES}): {A} {B}enchmarking {P}latform for {M}olecular {G}eneration {M}odels},
author={Polykovskiy, Daniil and Zhebrak, Alexander and Sanchez-Lengeling, Benjamin and Golovanov, Sergey and Tatanov, Oktai and Belyaev, Stanislav and Kurbanov, Rauf and Artamonov, Aleksey and Aladinskiy, Vladimir and Veselov, Mark and Kadurin, Artur and Nikolenko, Sergey and Aspuru-Guzik, Alan and Zhavoronkov, Alex},
journal={arXiv preprint arXiv:1811.12823},
year={2018}
}
We propose a benchmarking dataset refined from the ZINC database.
The set is based on the ZINC Clean Leads collection, which contains 4,591,276 molecules in total, filtered by molecular weight in the range from 250 to 350 Daltons, number of rotatable bonds not greater than 7, and XlogP less than or equal to 3.5. We removed molecules containing charged atoms, atoms other than C, N, S, O, F, Cl, Br, and H, or cycles longer than 8 atoms. The remaining molecules were also filtered with medicinal chemistry filters (MCFs) and PAINS filters.
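The property and composition criteria above can be expressed as a single predicate. The sketch below is illustrative only: it assumes the descriptors (molecular weight, rotatable bonds, XlogP, element set, charge flag, largest ring size) were already computed elsewhere, for example with RDKit; the `MolProps` container is hypothetical; the 250 to 350 Da bounds are treated as inclusive (an assumption); and the MCF and PAINS substructure filters are not included.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Elements allowed in the dataset, as listed above.
ALLOWED_ELEMENTS = frozenset({"C", "N", "S", "O", "F", "Cl", "Br", "H"})


@dataclass(frozen=True)
class MolProps:
    """Hypothetical container for descriptors computed elsewhere (e.g., with RDKit)."""
    mol_weight: float          # Daltons
    rotatable_bonds: int
    xlogp: float
    elements: FrozenSet[str]   # element symbols present in the molecule
    has_charged_atoms: bool
    largest_ring_size: int     # 0 for acyclic molecules


def passes_dataset_filters(p: MolProps) -> bool:
    """Apply the property and composition criteria (MCF/PAINS filters not included)."""
    return (
        250 <= p.mol_weight <= 350          # assumed inclusive bounds
        and p.rotatable_bonds <= 7
        and p.xlogp <= 3.5
        and not p.has_charged_atoms
        and p.elements <= ALLOWED_ELEMENTS  # subset check: no other elements
        and p.largest_ring_size <= 8        # drop cycles longer than 8 atoms
    )
```

Keeping the descriptors in a plain container separates the chemistry toolkit from the filtering logic, so the thresholds can be checked without any cheminformatics dependency.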
The resulting dataset contains 1,936,962 molecular structures. For experiments, we split it into training, test, and scaffold test sets containing around 1.6M, 176k, and 176k molecules, respectively. The scaffold test set contains molecules with unique Bemis-Murcko scaffolds that are not present in the training and test sets. We use this set to assess how well a model can generate previously unobserved scaffolds.
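One way to build such a split is to group molecules by scaffold and move whole groups into the scaffold test set, which guarantees that none of its scaffolds occur in the training or test sets. The sketch below is a hypothetical procedure, not necessarily how MOSES produced its split. `scaffold_fn` is a caller-supplied function; with RDKit it could be `MurckoScaffold.MurckoScaffoldSmiles(smiles=...)`, and the fractions and seed are illustrative parameters.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def scaffold_split(
    smiles: Sequence[str],
    scaffold_fn: Callable[[str], str],
    test_frac: float,
    scaffold_test_frac: float,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (train, test, scaffold_test); scaffold_test scaffolds never occur in train/test."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for s in smiles:
        groups[scaffold_fn(s)].append(s)

    rng = random.Random(seed)
    scaffolds = sorted(groups)  # sort first so the shuffle is reproducible for a given seed
    rng.shuffle(scaffolds)

    # Move whole scaffold groups into the scaffold test set until it is large enough.
    target = scaffold_test_frac * len(smiles)
    scaffold_test: List[str] = []
    held_out = set()
    for scaffold in scaffolds:
        if len(scaffold_test) >= target:
            break
        scaffold_test.extend(groups[scaffold])
        held_out.add(scaffold)

    # Randomly split the remaining molecules into train and test.
    rest = [s for sc in scaffolds if sc not in held_out for s in groups[sc]]
    rng.shuffle(rest)
    n_test = round(test_frac * len(smiles))
    return rest[n_test:], rest[:n_test], scaffold_test
```

Because the scaffold test set is filled group by group, its final size can overshoot the target by up to one group.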
MOSES includes the following baseline models:

- Character-level Recurrent Neural Network (CharRNN)
- Variational Autoencoder (VAE)
- Adversarial Autoencoder (AAE)
- Junction Tree Variational Autoencoder (JTN-VAE)
Besides standard uniqueness and validity metrics, MOSES provides other metrics to assess the overall quality of generated molecules. Fragment similarity (Frag) and scaffold similarity (Scaf) are cosine similarities between the vectors of fragment or scaffold frequencies, respectively, of the generated and test sets. Nearest neighbor similarity (SNN) is the average Tanimoto similarity of each generated molecule to its nearest molecule in the test set. Internal diversity (IntDiv) is one minus the average pairwise similarity of generated molecules, so higher values indicate a more diverse set. Fréchet ChemNet Distance (FCD) measures the difference between the distributions of last-layer activations of ChemNet for the generated and test sets. Filters is the fraction of generated molecules that pass the filters applied during dataset construction. Novelty is the fraction of unique valid generated molecules not present in the training set.
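The definitions above can be made concrete in plain Python. This is a minimal sketch, not the MOSES implementation: it assumes each fingerprint is a set of "on" bit indices computed elsewhere (MOSES itself uses RDKit fingerprints), that fragments and scaffolds (for example, BRICS fragments and Bemis-Murcko scaffolds) are already extracted as strings, and that an invalid molecule is marked with `None`. The IntDiv formula follows the MOSES paper as we understand it, averaging over all ordered pairs including self-pairs.

```python
from collections import Counter
from math import sqrt
from typing import Hashable, Iterable, List, Optional, Sequence, Set

# Assumption: a fingerprint is the set of "on" bit indices, computed elsewhere.
Fingerprint = Set[int]


def validity(parsed: Sequence[Optional[str]]) -> float:
    """Fraction of generated strings that parsed; None marks an invalid molecule."""
    return sum(m is not None for m in parsed) / len(parsed)


def unique_at_k(valid: Sequence[str], k: int) -> float:
    """Fraction of unique molecules among the first k valid ones."""
    sample = list(valid[:k])
    return len(set(sample)) / len(sample)


def novelty(valid: Iterable[str], train: Iterable[str]) -> float:
    """Fraction of unique valid generated molecules absent from the training set."""
    unique = set(valid)
    return len(unique - set(train)) / len(unique)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity of two bit sets (0.0 for two empty sets, by convention here)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def snn(gen: List[Fingerprint], ref: List[Fingerprint]) -> float:
    """Mean similarity of each generated molecule to its nearest reference molecule."""
    return sum(max(tanimoto(g, r) for r in ref) for g in gen) / len(gen)


def internal_diversity(fps: List[Fingerprint], p: int = 1) -> float:
    """IntDiv_p = 1 - (mean of T(a, b)**p over all ordered pairs, self-pairs included)**(1/p)."""
    n = len(fps)
    total = sum(tanimoto(a, b) ** p for a in fps for b in fps)
    return 1.0 - (total / (n * n)) ** (1.0 / p)


def cosine_similarity(gen_items: Iterable[Hashable], ref_items: Iterable[Hashable]) -> float:
    """Cosine similarity of frequency vectors, as used for Frag and Scaf."""
    g, r = Counter(gen_items), Counter(ref_items)
    dot = sum(g[key] * r[key] for key in g.keys() & r.keys())
    norm = sqrt(sum(v * v for v in g.values())) * sqrt(sum(v * v for v in r.values()))
    return dot / norm if norm else 0.0
```

With `p=2`, `internal_diversity` corresponds to the IntDiv2 column in the results table. The pairwise loop is quadratic in the number of molecules, which is fine for a sketch but slow for tens of thousands of samples.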
Model | Valid (↑) | Unique@1k (↑) | Unique@10k (↑) | FCD Test (↓) | FCD TestSF (↓) | SNN Test (↑) | SNN TestSF (↑) | Frag Test (↑) | Frag TestSF (↑) | Scaf Test (↑) | Scaf TestSF (↑) | IntDiv (↑) | IntDiv2 (↑) | Filters (↑) | Novelty (↑) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Train | 1.0 | 1.0 | 1.0 | 0.008 | 0.476 | 0.642 | 0.586 | 1.0 | 0.999 | 0.991 | 0.0 | 0.857 | 0.851 | 1.0 | 1.0 |
AAE | 0.937±0.034 | 1.0±0.0 | 0.997±0.002 | 0.556±0.203 | 1.057±0.237 | 0.608±0.004 | 0.568±0.005 | 0.991±0.005 | 0.99±0.004 | 0.902±0.037 | 0.079±0.009 | 0.856±0.003 | 0.85±0.003 | 0.996±0.001 | 0.793±0.028 |
CharRNN | 0.975±0.026 | 1.0±0.0 | 0.999±0.0 | 0.073±0.025 | 0.52±0.038 | 0.601±0.021 | 0.565±0.014 | 1.0±0.0 | 0.998±0.0 | 0.924±0.006 | 0.11±0.008 | 0.856±0.0 | 0.85±0.0 | 0.994±0.003 | 0.842±0.051 |
JTN-VAE | 1.0 | 1.0 | 0.999 | 0.422 | 0.996 | 0.556 | 0.527 | 0.996 | 0.995 | 0.892 | 0.1 | 0.851 | 0.845 | 0.978 | 0.915 |
VAE | 0.977±0.001 | 1.0±0.0 | 0.998±0.001 | 0.099±0.013 | 0.567±0.034 | 0.626±0.0 | 0.578±0.001 | 0.999±0.0 | 0.998±0.0 | 0.939±0.002 | 0.059±0.01 | 0.856±0.0 | 0.85±0.0 | 0.997±0.0 | 0.695±0.007 |

Test and TestSF denote metrics computed against the test set and the scaffold test set, respectively.
To compare molecular properties, we computed the Fréchet distance between the distributions of each property in the generated and test sets. Below, we provide plots for lipophilicity (logP), synthetic accessibility (SA), quantitative estimation of drug-likeness (QED), natural product-likeness (NP), and molecular weight.
Property distribution plots (images not included): logP, SA, NP, QED, and molecular weight.
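For a single property, the Fréchet distance between Gaussians fitted to the two samples has a closed form: for N(μ₁, σ₁²) and N(μ₂, σ₂²), d² = (μ₁ − μ₂)² + (σ₁ − σ₂)². The sketch below assumes the property values (e.g., logP) were already computed for both sets and uses population standard deviations. It returns the squared distance, the convention used by FCD in its multivariate form; whether the plots report d or d² is not stated here, so treat that choice as an assumption.

```python
from statistics import fmean, pstdev
from typing import Sequence


def frechet_distance_sq_1d(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Fréchet distance between 1D Gaussians fitted to samples x and y."""
    mu_x, mu_y = fmean(x), fmean(y)
    sd_x, sd_y = pstdev(x), pstdev(y)
    # Closed form for univariate Gaussians: (mu_x - mu_y)^2 + (sd_x - sd_y)^2.
    return (mu_x - mu_y) ** 2 + (sd_x - sd_y) ** 2
```

Fitting Gaussians keeps the comparison cheap, but it only sees the mean and spread of each distribution, so two differently shaped distributions with matching moments score as identical.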
The simplest way to install MOSES (models and metrics) is to install RDKit (`conda install -yq -c rdkit rdkit`) and then install MOSES (`molsets`) from pip (`pip install molsets`).

If you are using Ubuntu, you should also install `libxrender1` and `libxext6` for RDKit: `sudo apt-get install libxrender1 libxext6`.
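After installation, a quick check that the packages are importable can save a confusing failure later. This sketch only uses the standard library; it assumes the `molsets` pip package is imported as `moses`, which you should confirm against your installed version.

```python
import importlib.util
from typing import Dict, Iterable


def check_installation(modules: Iterable[str] = ("rdkit", "moses")) -> Dict[str, bool]:
    """Report which top-level modules can be found, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_installation().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

`find_spec` locates a module without executing it, so the check stays fast even for heavy packages such as RDKit.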
Alternatively, you can use Docker:

- Install docker and nvidia-docker.
- Pull an existing image (4.1 GB to download) from DockerHub:
docker pull molecularsets/moses
or clone the repository and build it manually:
git clone https://github.com/molecularsets/moses.git
nvidia-docker image build --tag molecularsets/moses moses/
- Create a container:
nvidia-docker run -it --name moses --network="host" --shm-size 10G molecularsets/moses
- The dataset and source code are available inside the Docker container at `/moses`. To open a shell in the running container (named `moses` above), use:
docker exec -it moses bash
Alternatively, install dependencies and MOSES manually:
- Clone the repository:
git lfs install
git clone https://github.com/molecularsets/moses.git
- Install RDKit for metrics calculation.
- Install MOSES from the repository root:
python setup.py install
To benchmark your own model:

- Install MOSES as described above.
- Calculate metrics for the molecules generated by your trained model:
python scripts/eval.py --ref_path <reference dataset> --gen_path <generated dataset>
- Add both the generated samples and the metrics to your repository.
You can run the entire pipeline with:
python scripts/run.py
This will split the dataset, train the models, generate new molecules, and calculate the metrics. Evaluation results will be saved in `metrics.csv`.
You can specify the GPU device index as `cuda:n` (or `cpu` for CPU) and/or the model by running:
python scripts/run.py --device cuda:1 --model aae
For more details, run `python scripts/run.py --help`.
You can reproduce evaluation of all models with several seeds by running:
sh scripts/run_all_models.sh
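Runs with several seeds are summarized in the results table as mean±std. The sketch below shows that aggregation for a list of per-seed metric dictionaries; how you load those dictionaries (for example, from each run's `metrics.csv`) is left open, and using the sample standard deviation rather than the population one is an assumption.

```python
from statistics import mean, stdev
from typing import Dict, List


def summarize_runs(runs: List[Dict[str, float]], digits: int = 3) -> Dict[str, str]:
    """Format each metric as 'mean±std' across seeds, rounded like the results table."""
    summary: Dict[str, str] = {}
    for name in runs[0]:
        values = [run[name] for run in runs]
        # Sample standard deviation; defined as 0.0 for a single run.
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[name] = f"{round(mean(values), digits)}±{round(spread, digits)}"
    return summary
```

For example, two runs with validity 0.9 and 1.0 summarize to roughly `0.95±0.071`.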
To train a single model, run:
python scripts/train.py <model name> \
--train_load <train dataset> \
--model_save <path to model> \
--config_save <path to config> \
--vocab_save <path to vocabulary>
To get a list of supported models, run `python scripts/train.py --help`.
For details on a specific model, run `python scripts/train.py <model name> --help`.
To generate molecules with a trained model, run:
python scripts/sample.py <model name> \
--model_load <path to model> \
--vocab_load <path to vocabulary> \
--config_load <path to config> \
--n_samples <number of samples> \
--gen_save <path to generated dataset>
To get a list of supported models, run `python scripts/sample.py --help`.
For details on a specific model, run `python scripts/sample.py <model name> --help`.
To calculate metrics for a generated dataset, run:
python scripts/eval.py \
--ref_path <reference dataset> \
--gen_path <generated dataset>
For more details, run `python scripts/eval.py --help`.
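The three scripts can be chained so that each step's outputs feed the next. The sketch below builds the command lines using only the flags shown above and runs them with `subprocess`. The output file names (`<model>_model.pt` and so on) are hypothetical placeholders, and it assumes the commands are run from the repository root so that the `scripts/` paths resolve.

```python
import subprocess
from pathlib import Path
from typing import List


def moses_commands(model: str, train_path: str, ref_path: str,
                   out_dir: str, n_samples: int) -> List[List[str]]:
    """Build train, sample, and eval command lines with the documented flags."""
    out = Path(out_dir)
    # Hypothetical file names; any paths accepted by the scripts would do.
    model_path = str(out / f"{model}_model.pt")
    config_path = str(out / f"{model}_config.pt")
    vocab_path = str(out / f"{model}_vocab.pt")
    gen_path = str(out / f"{model}_generated.csv")

    train = ["python", "scripts/train.py", model,
             "--train_load", train_path,
             "--model_save", model_path,
             "--config_save", config_path,
             "--vocab_save", vocab_path]
    sample = ["python", "scripts/sample.py", model,
              "--model_load", model_path,
              "--vocab_load", vocab_path,
              "--config_load", config_path,
              "--n_samples", str(n_samples),
              "--gen_save", gen_path]
    evaluate = ["python", "scripts/eval.py",
                "--ref_path", ref_path,
                "--gen_path", gen_path]
    return [train, sample, evaluate]


def run_pipeline(commands: List[List[str]]) -> None:
    """Run each step in order, stopping at the first failing command."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Building the argument lists separately from running them makes it easy to print or log the exact commands before launching a long training job.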