This repository contains the PyTorch implementation of the paper "Empowering Active Learning for 3D Molecular Graphs with Geometric Graph Isomorphism", R. Subedi, L. Wei, W. Gao, S. Chakraborty, Y. Liu, NeurIPS 2024.
In this paper, we present a principled Active Learning (AL) paradigm for 3D molecular learning. We propose a set of new 3D graph isometries for obtaining geometric representations of 3D molecular graphs. We formulate a selection criterion based on uncertainty (estimated with a Bayesian Geometric Graph Neural Network (BGNN)) and diversity (measured with our proposed representations of 3D geometric graphs), and pose active sampling as a Quadratic Programming (QP) problem. Experiments on 3D molecular datasets (QM9 and MD17) demonstrate the effectiveness of our method compared to the baselines.
Part of the codebase is adapted from the DIG repository.
@inproceedings{
subedi2024empowering,
title={Empowering Active Learning for 3D Molecular Graphs with Geometric Graph Isomorphism},
author={Subedi, Ronast and Wei, Lu and Gao, Wenhan and Chakraborty, Shayok and Liu, Yi},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=He2GCHeRML}
}
├── al - code for the active sampling methods
│
├── dataset - data for training (data must be saved in .pt format)
│
├── datasets - code for preparing the datasets
│
├── models - code for the baseline models and our BGNN
│
├── utils - general utility functions
│
├── runs - for saving the results; the folder name must match selection_method in the run.sh file
│   │
│   ├── random - for the random method
│   │   └── run1
│   │       └── init_set.npy - initial labeled indices (prepared by the user; see the sketch after this tree)
│   │
│   └── unc_div - for our method
│       └── run1
│           ├── tensor.pt - similarity matrix (computed with compute_sim_mat.py)
│           └── init_set.npy - initial labeled indices
│
├── run.sh - for configuring the training hyperparameters
│
├── compute_sim_mat.py - for computing the similarity matrix (our method and SOAP)
│       * save the resulting tensor.pt file in the appropriate folder inside runs/ *
│       * use the compute_uncertainty_diversity_gpu function for faster computation *
│
└── train.py
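The init_set.npy files are not provided; below is a minimal sketch of how one might create one and place the similarity matrix next to it, assuming init_set.npy holds a 1-D integer array of indices into the training split. The pool size and initial budget are hypothetical, and the tensor saved here is only a placeholder for the matrix actually produced by compute_sim_mat.py.

```python
import numpy as np
import torch

# Assumption: init_set.npy stores a 1-D array of integer indices into the
# training split that form the initial labeled pool.
rng = np.random.default_rng(0)
pool_size = 100_000        # hypothetical number of training samples
init_size = 1_000          # hypothetical initial labeling budget
init_set = rng.choice(pool_size, size=init_size, replace=False)
np.save("runs/unc_div/run1/init_set.npy", init_set)

# Assumption: compute_sim_mat.py produces a square torch tensor; it must be
# saved as tensor.pt in the same run folder. A random placeholder is used here;
# the real shape depends on compute_sim_mat.py.
sim_mat = torch.rand(8, 8)
torch.save(sim_mat, "runs/unc_div/run1/tensor.pt")
```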
- Prepare the data in .pt format in [train, valid, test] order (see the sketch below).
- Use a Linux environment to run the experiments.
- Use conda for package management.
- Update the parameters in the run.sh file as necessary.
- The best results are saved under the corresponding method's folder inside runs/.
chmod +x run.sh
./run.sh
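
The expected contents of the .pt data files are not documented in this README; the following is a minimal sketch, assuming train.py loads a single .pt file that holds the three splits in [train, valid, test] order. The file name and placeholder tensors are hypothetical.

```python
import torch

# Placeholder splits; in practice these are the processed molecular graphs.
train_data = torch.arange(8)   # hypothetical training split
valid_data = torch.arange(2)   # hypothetical validation split
test_data = torch.arange(2)    # hypothetical test split

# Assumption: the loaders expect a single .pt file holding the three splits
# in [train, valid, test] order; the file name here is hypothetical.
torch.save([train_data, valid_data, test_data], "dataset/qm9_splits.pt")

# Loading restores the splits in the same order.
train_data, valid_data, test_data = torch.load("dataset/qm9_splits.pt")
```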
- For a GPU implementation of the QP solver, use the compute_uncertainty_diversity_gpu function in the selection_methods.py file (see the sketch below for the general form of the QP).
- For the GPU implementation, see the osqp/cuosqp repository for further reference.
- See the dholzmueller/bmdal_reg repository to use the BatchBALD selection method.
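
For orientation, here is a minimal sketch of the kind of QP the selection step solves with OSQP, assuming the common form that trades off a linear uncertainty reward against a quadratic diversity penalty built from the similarity matrix. The variable names, the trade-off weight lam, the budget constraint, and the final top-k rounding are all assumptions; the exact objective and constraints in selection_methods.py may differ.

```python
import numpy as np
import scipy.sparse as sp
import osqp

n, budget = 200, 16                 # hypothetical pool size and per-round budget
feat = np.random.rand(n, 8)
S = feat @ feat.T                   # placeholder PSD similarity matrix (stands in for tensor.pt)
uncertainty = np.random.rand(n)     # placeholder BGNN uncertainty scores

# Assumed QP form:  minimize  0.5 * lam * w^T S w - u^T w
#                   subject to  sum(w) = budget,  0 <= w <= 1
lam = 1.0
P = sp.csc_matrix(lam * S)
q = -uncertainty
A = sp.vstack([sp.csc_matrix(np.ones((1, n))), sp.identity(n)], format="csc")
l = np.concatenate(([budget], np.zeros(n)))
u = np.concatenate(([budget], np.ones(n)))

solver = osqp.OSQP()
solver.setup(P=P, q=q, A=A, l=l, u=u, verbose=False)
res = solver.solve()

# Relaxed weights; pick the top-`budget` entries as the next batch to label.
selected = np.argsort(-res.x)[:budget]
print(selected)
```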