classifierpromax is a scikit-learn wrapper library that helps train and optimize multiple classifier models in parallel.
- ClassifierTrainer(): Trains multiple machine learning classifiers using cross-validation and returns the trained models and their evaluation metrics.
- FeatureSelector(): Selects features for multiple classification models using RFE or Pearson methods.
- ClassifierOptimizer(): Optimizes a dictionary of scikit-learn Pipeline classifiers using RandomizedSearchCV and evaluates their performance.
- ResultHandler(): Processes and combines scoring results from model training and optimization.
In a machine learning pipeline, the same training, selection, and tuning code is often repeated for each model, violating the DRY (Don't Repeat Yourself) principle. This Python library wraps those repeated steps in a handful of functions, so code that works with multiple models stays short and clean.
Before installation, please make sure Python 3.12 or newer is installed.
$ pip install classifierpromax
- Training baseline models
import pandas as pd
import numpy as np
from classifierpromax.ClassifierTrainer import ClassifierTrainer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
# Dummy data
X = pd.DataFrame(np.random.rand(100, 5), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(np.random.randint(0, 2, size=100))
preprocessor = StandardScaler()
baseline_models, baseline_score = ClassifierTrainer(preprocessor, X, y, seed=123)
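For orientation, the kind of loop this step replaces can be sketched with plain scikit-learn. This is a minimal sketch under assumptions, not the library's implementation: the model dictionary, the metric list, and the names `trained_models` and `cv_scores` are hypothetical, and the models and output format of ClassifierTrainer may differ.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Dummy data with the same shape as in the usage example
rng = np.random.default_rng(123)
X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=100))

# Hypothetical model dictionary (assumption: the library's model set may differ)
models = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(random_state=123)),
    "svc": make_pipeline(StandardScaler(), SVC(random_state=123)),
}

trained_models = {}
cv_scores = {}
for name, pipe in models.items():
    # 5-fold cross-validation, keeping both train and validation scores
    result = cross_validate(
        pipe, X, y, cv=5, scoring=["accuracy", "f1"], return_train_score=True
    )
    # Mean and standard deviation of every metric across the folds
    cv_scores[name] = pd.DataFrame(result).agg(["mean", "std"]).T
    # Refit on the full data so the fitted pipeline can be reused later
    trained_models[name] = pipe.fit(X, y)

print(cv_scores["logreg"])
```

Writing this loop once per project is the repetition the library is meant to remove.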
- Feature selection
from classifierpromax.FeatureSelector import FeatureSelector
fs_models = FeatureSelector(preprocessor, baseline_models, X, y, n_features_to_select=3)
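To illustrate what RFE-based selection looks like for a single model, here is a minimal sketch using scikit-learn's `RFE` directly. It assumes a logistic regression as both the ranking estimator and the final classifier. How FeatureSelector wires RFE into each pipeline internally is not shown in this README, so treat the structure below as an assumption. The Pearson method is not covered.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Dummy data with the same shape as in the usage example
rng = np.random.default_rng(123)
X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=100))

# RFE needs an estimator that exposes coef_ or feature_importances_
selector = RFE(LogisticRegression(random_state=123), n_features_to_select=3)

# Scale, select 3 features, then classify on the reduced feature set
pipe = make_pipeline(StandardScaler(), selector, LogisticRegression(random_state=123))
pipe.fit(X, y)

# Boolean mask of the columns RFE kept (make_pipeline names the step "rfe")
mask = pipe.named_steps["rfe"].support_
selected = X.columns[mask].tolist()
print(selected)
```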
- Hyperparameter optimization
from classifierpromax.ClassifierOptimizer import ClassifierOptimizer
opt_models, opt_score = ClassifierOptimizer(fs_models, X, y, scoring="f1")
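The optimization step can be pictured as one `RandomizedSearchCV` run per pipeline. The sketch below assumes two hypothetical pipelines and hand-picked search distributions. The parameter grids, iteration count, and return format used by ClassifierOptimizer are not documented here and may differ.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Dummy data with the same shape as in the usage example
rng = np.random.default_rng(123)
X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=100))

# Hypothetical pipelines and search spaces (assumptions for illustration)
pipelines = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svc": make_pipeline(StandardScaler(), SVC()),
}
param_dists = {
    "logreg": {"logisticregression__C": loguniform(1e-3, 1e3)},
    "svc": {"svc__C": loguniform(1e-3, 1e3), "svc__gamma": loguniform(1e-4, 1e1)},
}

best_models = {}
best_scores = {}
for name, pipe in pipelines.items():
    # Sample 10 parameter settings and score each with 5-fold CV on F1
    search = RandomizedSearchCV(
        pipe, param_dists[name], n_iter=10, scoring="f1", cv=5, random_state=123
    )
    search.fit(X, y)
    best_models[name] = search.best_estimator_  # refit on all data by default
    best_scores[name] = search.best_score_      # mean CV F1 of the best setting

print(best_scores)
```

Keeping the pipelines and their search spaces in dictionaries keyed by model name lets a single loop cover every model.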
- Results summary
from classifierpromax.ResultHandler import ResultHandler
summary = ResultHandler(baseline_score, opt_score)
print(summary)
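One plausible way to combine baseline and optimized scores into a single table is `pandas.concat` with dictionary keys. This is only a sketch: the `combine` helper is hypothetical, the numbers are made-up placeholders rather than real results, and the actual layout returned by ResultHandler may differ.

```python
import pandas as pd


def combine(baseline, optimized):
    """Stack per-model score tables side by side under a stage/model/stat header."""
    base = pd.concat(baseline, axis=1)   # columns: (model, stat)
    opt = pd.concat(optimized, axis=1)   # columns: (model, stat)
    # The outermost column level records which stage the scores come from
    return pd.concat({"baseline": base, "optimized": opt}, axis=1)


metrics = ["test_accuracy", "test_f1"]

# Placeholder score tables (made-up values, for illustrating the layout only)
baseline = {
    "logreg": pd.DataFrame({"mean": [0.50, 0.50], "std": [0.05, 0.05]}, index=metrics),
    "svc": pd.DataFrame({"mean": [0.50, 0.50], "std": [0.05, 0.05]}, index=metrics),
}
optimized = {
    "logreg": pd.DataFrame({"mean": [0.55, 0.55], "std": [0.05, 0.05]}, index=metrics),
    "svc": pd.DataFrame({"mean": [0.55, 0.55], "std": [0.05, 0.05]}, index=metrics),
}

summary = combine(baseline, optimized)
print(summary)
```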
For a more in-depth tutorial on using the library, please refer to the example.ipynb Jupyter Notebook in the docs folder.
Create a new environment with Python 3.12.
conda create -n classifierpromax python=3.12
conda activate classifierpromax
Clone the repo and cd into the directory.
git clone git@github.com:UBC-MDS/ClassifierProMax.git
cd ClassifierProMax
Install poetry following these instructions, then run the following bash command to install the dependencies needed to run the library.
$ poetry install
Execute pytest from the root project directory.
$ pytest
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
Long Nguyen, Jenson Chang, Gunisha Kaur, Han Wang
classifierpromax was created by Long Nguyen, Jenson Chang, Gunisha Kaur, and Han Wang. It is licensed under the terms of the MIT license.
classifierpromax was created with cookiecutter and the py-pkgs-cookiecutter template.