Releases: CDDLeiden/QSPRpred
v1.3.1
Change Log
From v1.3.0 to v1.3.1
Fixes
- Fix model weights not being re-initialized during DNN training
- Feature values are converted to `np.float32`, and any resulting `np.inf` values are then converted to `nan`, in `DescriptorsCalculator.__call__`.
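Why infinities can appear at all: a value outside the float32 range (roughly ±3.4e38) becomes infinite when cast down. The sketch below emulates that cast with the standard library's `struct` module and then replaces infinities with NaN. It is only an illustration of the behaviour described above, not the code used in `DescriptorsCalculator.__call__`; the helper names `to_float32` and `clean_features` are hypothetical.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a Python float to float32 precision (hypothetical helper)."""
    try:
        return struct.unpack("f", struct.pack("f", value))[0]
    except OverflowError:
        # Outside the float32 range: a float32 cast would yield +/-infinity.
        return math.copysign(math.inf, value)


def clean_features(values):
    """Cast every value to float32, then replace infinities with NaN."""
    cast = [to_float32(v) for v in values]
    return [math.nan if math.isinf(v) else v for v in cast]


features = [0.5, 1e40, -1e40, 2.0]
cleaned = clean_features(features)
print(cleaned)
```

Downstream code can then treat overflowed descriptors like any other missing value.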
Changes
- `QSPRDataset.prepareDataset` replaced the `standardize` and `sanitize` arguments with a single `standardizer` argument.
  - Accepted values are either `chembl`, `old`, or a function that reads and standardizes SMILES.
  - `None` is now also supported to allow skipping SMILES standardization.
  - SMILES standardization now runs in parallel, but if the input function is not picklable, it runs on a single core instead.
- `QSPRModel.predictMols` now accepts the parameters `smiles_standardizer`, `n_jobs` and `fill_value`.
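The single-core fallback for non-picklable standardizers can be pictured roughly as follows. This is a minimal sketch under stated assumptions, not QSPRpred's implementation: the function names `is_picklable` and `standardize_smiles` are hypothetical, only the callable and `None` cases are covered (not the `chembl` and `old` options), and the standardizer used in the example just strips whitespace.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def is_picklable(func) -> bool:
    """Return True if the callable can be sent to worker processes."""
    try:
        pickle.dumps(func)
        return True
    except Exception:
        return False


def standardize_smiles(smiles, standardizer=None, n_jobs=1):
    """Apply a standardizer to each SMILES string (hypothetical helper)."""
    if standardizer is None:
        # None means: skip standardization entirely.
        return list(smiles)
    if n_jobs > 1 and is_picklable(standardizer):
        # Picklable callables can be distributed over several processes.
        with ProcessPoolExecutor(max_workers=n_jobs) as pool:
            return list(pool.map(standardizer, smiles))
    # Lambdas and other non-picklable callables run on a single core.
    return [standardizer(s) for s in smiles]


raw = [" CCO", "c1ccccc1 "]
unchanged = standardize_smiles(raw, standardizer=None)
stripped = standardize_smiles(raw, standardizer=lambda s: s.strip(), n_jobs=4)
print(unchanged, stripped)
```

A module-level function is picklable, while a lambda is not, so the second call above takes the serial path even though `n_jobs=4` was requested.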
v1.3.0
Change Log
From v1.2.0 to v1.3.0
Fixes
- Problems with PaDEL descriptors and fingerprints on Linux were fixed
Changes
- `QSPRModel` metadata now contains two extra entries:
  - `model_class` - the fully qualified class name of the model
  - `version` - the version of QSPRpred used to save the model
  - this change is not compatible with older files, but you can manually add these two entries and it should work fine in the newer version
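Adding the two entries to an older metadata file by hand could look something like the sketch below. Only the key names `model_class` and `version` come from the notes above. That they sit at the top level of the JSON, the helper name `add_model_metadata`, and the placeholder values are all assumptions, so inspect a metadata file saved by v1.3.0 before relying on this.

```python
import json


def add_model_metadata(meta: dict, model_class: str, version: str) -> dict:
    """Return a copy of the metadata with the two new entries filled in."""
    updated = dict(meta)
    # setdefault keeps any value that is already present in the file.
    updated.setdefault("model_class", model_class)
    updated.setdefault("version", version)
    return updated


# Stand-in for the parsed content of an old "*_meta.json" file.
old_meta = {"name": "any_name"}

# "package.module.ClassName" is a placeholder, not a real QSPRpred class path.
new_meta = add_model_metadata(old_meta, "package.module.ClassName", "1.3.0")
print(json.dumps(new_meta, indent=2))
```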
New Features
- The `QSPRModel.fromFile()` method can now instantiate a model from a file directly without knowing the underlying model type. It simply uses the class path stored in the model metadata file.
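Resolving a class from a stored, fully qualified class path is a standard Python pattern. The sketch below shows the general idea with `importlib`; it is not the actual body of `QSPRModel.fromFile()`. The helper name `resolve_class` is hypothetical, and a standard-library class stands in for a model class so the example runs without QSPRpred installed.

```python
import importlib


def resolve_class(class_path: str):
    """Import the module part of the path and return the named class."""
    module_name, _, class_name = class_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Stand-in metadata: a real file would hold the model's own class path.
meta = {"model_class": "collections.OrderedDict"}

cls = resolve_class(meta["model_class"])
instance = cls()
print(cls.__name__)
```

Because the class is looked up at load time, the caller does not need to know in advance which model subclass was saved.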
v1.2.0
Change Log
From v1.1.0 to v1.2.0
Fixes
- Fix issue with Mordred descriptor
- Descriptor sets now process a list of molecules instead of just one at a time (prevents performance issues if multiple sets are calculated in parallel)
- Empty descriptor values are no longer imputed with 0 automatically, but are left as `NaN` or `None` instead
Changes
- Some features not specific to machine learning were extracted from `QSPRDataset` to a new class called `MoleculeTable`
  - `MoleculeTable` mainly holds data about molecules, including their descriptors, scaffolds, bioactivities and other data
    - this class also now manages settings for parallelization and chunking in the constructor rather than on a per-method basis
    - this class will be used as the base class for other data set classes that need molecule data, but have to perform their own transformations to do their job
  - `QSPRDataset` derives from `MoleculeTable`; it describes the training and test sets for modelling and also handles data preparation
- `QSPRDataset` now handles saving of its metadata and other related files (i.e. standardizers and other data transformers) with one method (`save`) -> names of the files start with a chosen prefix, which is a name given to the data set
- The `SKLearnStandardizer` was added for scaler fitting, applying, saving and loading
  - The standardization of features is now possible with the `feature_standardizer` argument of `QSPRDataset.prepareDataset` by supplying an instance of `SKLearnStandardizer` or directly a `StandardScaler` or any other standardizer from `sklearn.preprocessing` with the `BaseEstimator` interface
  - standardization is now also done separately for training and test sets in cross-validation
- The `DescriptorSet` interface was updated and all built-in descriptors were adapted to reflect this change.
  - The presence of the `descriptors` property getter and setter is now enforced.
  - When called, the `DescriptorSet` implementations now strictly return lists.
  - Conversion to a descriptor data frame is now handled exclusively in `DescriptorsCalculator`
- The `datasplit` interface was changed to mimic the `sklearn.model_selection.BaseCrossValidator` interface so all `sklearn` cross-validation methods can be used with QSPRpred out of the box to generate either a train/test split or cross-validation splits (see the new features below)
- Default `chunk_size` for `MoleculeTable` was set to 50 so that smaller data sets can take advantage of more CPUs as well.
- The number of CPUs to use for parallel operations by `MoleculeTable` is now set in the `__init__` of the class and is 1 by default so that the default behaviour is to not use parallelism.
- `DescriptorSets` are now initialized with the specific arguments instead of args and kwargs.
- `MorganFP` was replaced by a more general class `FingerprintSet` which uses an object from the `Fingerprint` class as its fingerprint type
- The `Predictor` class was replaced; its features are now accessible with the models directly:

      from qsprpred import QSPRsklearn  # QSPRDNN can be used the same way
      from qsprpred import QSPRDataset

      # creation and loading
      model = QSPRsklearn(  # or QSPRDNN
          name="any_name",
          base_dir="/some/path"
      )

      # loading directly from meta file also possible
      model = QSPRsklearn.fromFile("/path/to/any_name_meta.json")

      # predictions can be done directly on a list of SMILES
      model.predictMols([
          'CC(=C)C1CC2=C(O1)C=CC3=C2OC4COC5=CC(=C(C=C5C4C3=O)OC)OC',
          'CCOC(=O)C1=C2CN(C(=O)C3=C(N2C=N1)C=CC(=C3)F)C'
      ])

      # classifiers can also use use_probas=True to get probabilities
      model.predictMols([
          'CC(=C)C1CC2=C(O1)C=CC3=C2OC4COC5=CC(=C(C=C5C4C3=O)OC)OC',
          'CCOC(=O)C1=C2CN(C(=O)C3=C(N2C=N1)C=CC(=C3)F)C'
      ], use_probas=True)

      # it is also possible to give a QSPRDataset directly:
      dataset = QSPRDataset(name="data")
      model.predict(dataset)

  - Calls to `predict`, `predictProba` or `predictMols` with `use_probas=True` will now return a score of `None` for invalid molecules.
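Two of the changes above fit together: splitters yield pairs of train and test indices, and feature standardization is fitted separately within each fold. The sketch below illustrates that pattern with the standard library only. `MeanStdScaler` and `k_fold_indices` are hypothetical stand-ins for a scikit-learn scaler and cross-validator, not QSPRpred or scikit-learn classes; the point is only that the scaler is fitted on the training rows of a fold and then applied to that fold's test rows.

```python
from statistics import mean, pstdev


class MeanStdScaler:
    """Tiny single-column stand-in for a standard scaler."""

    def fit(self, values):
        self.mean_ = mean(values)
        # Population standard deviation; fall back to 1.0 for constant columns.
        self.scale_ = pstdev(values) or 1.0
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.scale_ for v in values]


def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs, one pair per fold."""
    for fold in range(n_splits):
        test_idx = [i for i in range(n_samples) if i % n_splits == fold]
        train_idx = [i for i in range(n_samples) if i % n_splits != fold]
        yield train_idx, test_idx


feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
folds = []
for train_idx, test_idx in k_fold_indices(len(feature), n_splits=3):
    # Fit on the training rows only, so no test information leaks in.
    scaler = MeanStdScaler().fit([feature[i] for i in train_idx])
    train_scaled = scaler.transform([feature[i] for i in train_idx])
    test_scaled = scaler.transform([feature[i] for i in test_idx])
    folds.append((train_scaled, test_scaled))

print(len(folds))
```

Fitting the scaler on the full data set before splitting would let test-set statistics influence training, which is what per-fold fitting avoids.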
New Features
- Tutorials for training and using the QSPR models
- Depiction of results for classification models (see `qsprpred.plotting.classification`)
- The `precomputed` flag was added to `QSPRDataset`
- Added an option to directly fetch `QSPRDataset` from Papyrus with accession IDs (see `qsprpred.data.sources.papyrus`)
- The `datasplit` interface is now used to generate both the train/test split and the cross-validation splits
- Train/test split of the data set is now saved in the matrix itself and is reloaded upon deserialization
- `MoleculeTable` was updated with new features to generate scaffolds of molecules
- `TanimotoDistances` was added as a descriptor type.
- Balanced class weighting was added as an option to the CLI
- `PredictorDesc` was added as a new `DescriptorSet` type. It uses a QSPRpred model as descriptor.
- New submodule for custom evaluation metrics (`qsprpred.metrics`) with a `calibration_error` function to estimate the calibration of a classifier
- Added the Mold2 and PaDEL molecular descriptors
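For readers unfamiliar with calibration metrics, one common definition is the expected calibration error: predictions are grouped into probability bins, and the gap between the mean predicted probability and the observed positive rate is averaged over the bins, weighted by bin size. The sketch below implements that definition for a binary classifier. Whether `calibration_error` in `qsprpred.metrics` uses this exact binning scheme is an assumption; the function name and signature here are illustrative only.

```python
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Size-weighted mean gap between predicted and observed positive rates."""
    bins = [[] for _ in range(n_bins)]
    for label, prob in zip(y_true, y_prob):
        # A probability of exactly 1.0 belongs to the last bin.
        index = min(int(prob * n_bins), n_bins - 1)
        bins[index].append((label, prob))

    total = len(y_true)
    error = 0.0
    for members in bins:
        if not members:
            continue
        observed = sum(label for label, _ in members) / len(members)
        predicted = sum(prob for _, prob in members) / len(members)
        error += len(members) / total * abs(observed - predicted)
    return error


# Confident and correct predictions are perfectly calibrated.
perfect = expected_calibration_error([0, 0, 1, 1], [0.0, 0.0, 1.0, 1.0])

# Predicting 0.75 when only half of the cases are positive is over-confident.
overconfident = expected_calibration_error([1, 0], [0.75, 0.75])
print(perfect, overconfident)
```

A value of 0 means the predicted probabilities match the observed frequencies in every bin; larger values indicate worse calibration.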