akensert/molcraft

Generative deep learning for molecules using transformers.

Transformers with TensorFlow and Keras, focused on molecule generation and chemistry predictions.

Note

In progress.

Highlights

molcraft aims to implement efficient models, samplers, and [soon] reinforcement learning for SMILES generation and optimization.

  • Models / Layers
    • Implement key-value caching for efficient autoregression
  • Samplers
    • Sample next tokens from Models
    • Can generate a batch of sequences in parallel, non-eagerly
    • Can generate a batch of sequences from initial (seed) sequences of varying lengths (see the seeded example under Code Examples)
  • Tokenizers
    • Tokenize data input for Models
    • Can be adapted to data via tokenizer.adapt(ds) to build the vocabulary
    • Can be added as a layer to a keras.Sequential model
    • Can both tokenize and detokenize data (see the sketch after this list)
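
A minimal sketch of the tokenizer round-trip described above. The adapt and call usage mirrors the Code Examples section below; the detokenize method name is an assumption based on the "tokenize and detokenize" bullet:

from molcraft import tokenizers

smiles = ['CCO', 'c1ccccc1', 'CC(=O)O']

# Build the vocabulary from the data, then map SMILES strings to token IDs
tokenizer = tokenizers.SMILESTokenizer(add_bos=True, add_eos=True)
tokenizer.adapt(smiles)
token_ids = tokenizer(smiles)

# Map token IDs back to SMILES strings ('detokenize' is an assumed name)
smiles_out = tokenizer.detokenize(token_ids)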

Code Examples

import tensorflow as tf
import keras
import random

from molcraft import tokenizers
from molcraft import models
from molcraft import samplers 

filename = './data/zinc250K.txt'  # replace with the actual path to your SMILES file (one SMILES per line)

with open(filename, 'r') as fh:
    smiles = fh.read().splitlines()

random.shuffle(smiles)

# Adapt tokenizer (create vocabulary)
tokenizer = tokenizers.SMILESTokenizer(add_bos=True, add_eos=True)
tokenizer.adapt(smiles)

# Build dataset (input pipeline)
ds = tf.data.Dataset.from_tensor_slices(smiles)
ds = ds.shuffle(8192)
ds = ds.batch(256)
ds = ds.map(tokenizer)
# Shift by one token: inputs are tokens [:-1], targets are tokens [1:]
ds = ds.map(lambda x: (x[:, :-1], x[:, 1:]))
ds = ds.prefetch(tf.data.AUTOTUNE)

# Build, compile, and fit model
model = models.TransformerDecoder(
    num_layers=4,
    num_heads=8,
    embedding_dim=512,
    intermediate_dim=1024,
    vocabulary_size=tokenizer.vocabulary_size,
    sequence_length=tokenizer.sequence_length,
    dropout=0,
)
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=3e-4), 
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True)
)
model.fit(ds, epochs=1)

# Generate 32 novel SMILES with sampler
sampler = samplers.TopKSampler(model, tokenizer)
smiles = sampler.sample([''] * 32)
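
The sampler can also be seeded with partial SMILES of varying lengths, as listed under Highlights. A minimal sketch, assuming sample accepts arbitrary seed strings just as it accepts empty ones above:

# Continue generation from partial SMILES of varying lengths;
# each seed is completed into a full sequence
seeds = ['CC(=O)', 'c1ccccc1', 'N']
completions = sampler.sample(seeds)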

Installation

Note

The project is under development, and is therefore incomplete and subject to breaking changes.

For GPU users:

git clone git@github.com:akensert/molcraft.git
cd molcraft
pip install -e .[gpu]

For CPU users:

git clone git@github.com:akensert/molcraft.git
cd molcraft
pip install -e .
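
To verify the installation, a quick import check using the modules from the example above:

python -c "from molcraft import models, samplers, tokenizers"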
