classifierpromax

classifierpromax is a scikit-learn wrapper library that helps to train and optimize multiple classifier models in parallel.

ClassifierTrainer(): Trains multiple machine learning classifiers using cross-validation and returns the trained models and evaluation metrics.

FeatureSelector(): Selects features for multiple classification models using RFE or Pearson methods.

ClassifierOptimizer(): Optimizes a dictionary of scikit-learn Pipeline classifiers using RandomizedSearchCV and evaluates their performance.

ResultHandler(): Processes and combines scoring results from model training and optimization.

In a machine learning pipeline, code is often repeated when working with multiple models, violating the DRY (Don't Repeat Yourself) principle. This Python library promotes DRY principles in machine learning code so that the resulting code is cleaner.
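To make the repetition concrete, here is a minimal sketch of the kind of loop such a wrapper can replace. It uses plain scikit-learn only and is not the library's implementation; the helper name `train_all`, the choice of the two classifiers, and the `accuracy` metric are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Dummy data, seeded so the sketch is reproducible
rng = np.random.default_rng(123)
X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(rng.integers(0, 2, size=100))

# Illustrative model choices (assumption, not the library's defaults)
models = {
    "dummy": DummyClassifier(strategy="most_frequent"),
    "logreg": LogisticRegression(max_iter=1000, random_state=123),
}


def train_all(models, X, y, cv=5):
    """Hypothetical helper: cross-validate and fit every model in one loop."""
    fitted, scores = {}, {}
    for name, clf in models.items():
        # Build a fresh preprocessing + classifier pipeline per model
        pipe = make_pipeline(StandardScaler(), clf)
        cv_results = cross_validate(
            pipe, X, y, cv=cv, scoring="accuracy", return_train_score=True
        )
        # Average each cross-validation column (fit_time, test_score, ...)
        scores[name] = pd.DataFrame(cv_results).mean()
        # Refit on the full data so the caller gets a usable model
        fitted[name] = pipe.fit(X, y)
    return fitted, pd.DataFrame(scores)


fitted, score_table = train_all(models, X, y)
print(score_table)
```

Writing the loop once and passing in a dictionary of models is the DRY idea in miniature: adding another classifier is a one-line change to `models` rather than another copied block.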

Installation

Before installation, please make sure Python 3.12 or newer is installed.
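As a quick check of the interpreter version, a short sketch like the following can be run first. The helper name `meets_requirement` is hypothetical; only the 3.12 minimum comes from this README.

```python
import sys

# Minimum version stated in this README
REQUIRED = (3, 12)


def meets_requirement(version_info=sys.version_info, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    return tuple(version_info[:2]) >= required


if meets_requirement():
    print("Python version is new enough.")
else:
    print(f"Python {REQUIRED[0]}.{REQUIRED[1]} or newer is required.")
```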

$ pip install classifierpromax

Usage

  1. Training baseline models

import pandas as pd
import numpy as np
from classifierpromax.ClassifierTrainer import ClassifierTrainer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Dummy data
X = pd.DataFrame(np.random.rand(100, 5), columns=[f"feature_{i}" for i in range(5)])
y = pd.Series(np.random.randint(0, 2, size=100))

preprocessor = StandardScaler()
baseline_models, baseline_score = ClassifierTrainer(preprocessor, X, y, seed=123)

  2. Feature selection

from classifierpromax.FeatureSelector import FeatureSelector

fs_models = FeatureSelector(preprocessor, baseline_models, X, y, n_features_to_select=3)

  3. Hyperparameter optimization

from classifierpromax.ClassifierOptimizer import ClassifierOptimizer

opt_models, opt_score = ClassifierOptimizer(fs_models, X, y, scoring="f1")

  4. Results summary

from classifierpromax.ResultHandler import ResultHandler

summary = ResultHandler(baseline_score, opt_score)
print(summary)

For a more in-depth tutorial on using the library, please refer to the example.ipynb Jupyter Notebook in the docs folder.
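For readers unfamiliar with the scikit-learn pieces named in the function descriptions, the sketch below shows RFE feature selection and RandomizedSearchCV on a single pipeline using plain scikit-learn. It is not the library's code: the step names (`scale`, `select`, `clf`), the logistic regression estimator, the search range for `C`, and the synthetic target are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: the label depends mostly on feature_0 (assumption for the demo)
rng = np.random.default_rng(123)
X = pd.DataFrame(rng.random((100, 5)), columns=[f"feature_{i}" for i in range(5)])
noise = rng.normal(scale=0.1, size=100)
y = pd.Series((X["feature_0"] + noise > 0.5).astype(int))

pipe = Pipeline(
    [
        ("scale", StandardScaler()),
        # Recursive feature elimination keeps the 3 highest-ranked features
        ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

# Randomly sample 10 values of the regularization strength C
search = RandomizedSearchCV(
    pipe,
    param_distributions={"clf__C": loguniform(1e-2, 1e2)},
    n_iter=10,
    scoring="f1",
    cv=5,
    random_state=123,
)
search.fit(X, y)

# Boolean mask from RFE maps back onto the original column names
selected = X.columns[search.best_estimator_.named_steps["select"].support_]
print("Selected features:", list(selected))
print("Best parameters:", search.best_params_)
print("Best cross-validated f1:", round(search.best_score_, 3))
```

Placing the selector inside the pipeline means feature selection is refit on each training fold during the search, which avoids leaking information from the validation folds.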

Testing

Create a new environment with Python 3.12.

conda create -n classifierpromax python=3.12
conda activate classifierpromax

Clone the repo and cd into the directory.

git clone git@github.com:UBC-MDS/ClassifierProMax.git
cd ClassifierProMax

Install poetry following these instructions, then run the following bash command to install the dependencies needed to run the library.

$ poetry install

Execute pytest from the root project directory.

$ pytest

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

Contributors

Long Nguyen, Jenson Chang, Gunisha Kaur, Han Wang

License

classifierpromax was created by Long Nguyen, Jenson Chang, Gunisha Kaur, Han Wang. It is licensed under the terms of the MIT license.

Credits

classifierpromax was created with cookiecutter and the py-pkgs-cookiecutter template.