Release 0.6.0 - Regression! #113
nnansters announced in Announcements
@nnansters, Releases isn't updated to v0.6.0? It is still on v0.5.3
Hey all,
Niels from NannyML engineering here to deliver you our 0.6.0 release 📦 🚴♂️
Installing / upgrading
You can get this latest version by using pip:
pip install -U nannyml
Or if you're using Conda:
conda install -c conda-forge nannyml
What's new?
Oh boy, am I excited about this release: we have some big news. So without any digression, we're introducing support for regression! 🎉
Yes, you can now use regression models with all of our existing functionality: detecting data drift on model inputs, outputs and targets, calculating realized performance metrics and even... estimating performance!
Covariate shift: both univariate and multivariate covariate shift detection work just like before. Since they only use your model's feature values, they actually already worked with regression models! 🙈
Model output drift: check the KS stat and the evolution over time of your regression predictions.
Target drift: check the KS stat and the evolution over time of your regression target values.
Calculating realized performance: easily calculate and plot the performance of your regression model using the following metrics: mae, mape, mse, msle, rmse and rmsle.
Estimating performance: estimate those same metrics in the absence of target values using our Direct Loss Estimator: mae, mape, mse, msle, rmse and rmsle.
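As a quick refresher on what those metric names mean, here is a minimal sketch computing them directly from targets and predictions in plain Python (illustrative values; this is not the NannyML implementation):

```python
import math

# Hypothetical targets and regression predictions.
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]

n = len(y_true)
errors = [p - t for t, p in zip(y_true, y_pred)]

mae = sum(abs(e) for e in errors) / n                      # mean absolute error
mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n  # mean absolute percentage error
mse = sum(e * e for e in errors) / n                       # mean squared error
rmse = math.sqrt(mse)                                      # root mean squared error

# msle / rmsle operate on log(1 + x), so they require non-negative values.
msle = sum((math.log1p(t) - math.log1p(p)) ** 2
           for t, p in zip(y_true, y_pred)) / n
rmsle = math.sqrt(msle)
```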
The latest product of our research labs is the Direct Loss Estimator (DLE), which allows us to estimate performance metrics for your regression model in the absence of target values.
To learn more about how it works, check out the in-depth documentation.
This quick snippet demonstrates how to use it:
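To give a feel for the idea behind DLE, here is a minimal, self-contained sketch in plain Python (all names and data below are hypothetical, and this is not the NannyML API): a second "nanny" model is fitted on reference data to predict the monitored model's loss from its inputs, and on new data without targets the predicted losses are averaged into a performance estimate.

```python
# Conceptual sketch of Direct Loss Estimation -- NOT the NannyML API.

def monitored_model(x):
    # The regression model being monitored (hypothetical: predicts 2*x).
    return 2.0 * x

def fit_loss_model(xs, ys):
    # On reference data (targets available), learn a trivial loss model:
    # the mean absolute error per input bucket (x < 5 vs x >= 5).
    buckets = {False: [], True: []}
    for x, y in zip(xs, ys):
        buckets[x >= 5].append(abs(monitored_model(x) - y))
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}

def estimate_mae(loss_model, xs):
    # On analysis data (no targets), estimate MAE by averaging the
    # loss predicted for each input.
    losses = [loss_model[x >= 5] for x in xs]
    return sum(losses) / len(losses)

# Reference period: inputs and targets.
ref_x = [1, 2, 3, 6, 7, 8]
ref_y = [2.5, 4.5, 6.5, 13.0, 15.0, 17.0]
loss_model = fit_loss_model(ref_x, ref_y)

# Analysis period: inputs only; performance is estimated without targets.
est = estimate_mae(loss_model, [2, 3, 7])
```

The real DLE uses a proper machine learning model as the loss estimator rather than per-bucket averages; the in-depth documentation covers the details.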
The introduction of regression made it necessary for us to break some existing interfaces 💔. Model output drift calculation, target drift calculation and realized performance calculation now require an additional problem_type parameter.
We could have tried to infer this information from the data that is provided, but making it explicit is more transparent and future-proof, even if it comes at a small cost.
As an example, this is how to create a target drift calculator for multiclass classification:
Note the new problem_type parameter, which can be set to a fixed string value or a ProblemType enum value.
What's changed?
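The string-or-enum pattern can be sketched like this in plain Python (hypothetical names; not the NannyML implementation): a plain string is looked up by its enum value, while an enum member passes through unchanged.

```python
from enum import Enum

class ProblemType(Enum):
    # Hypothetical problem-type values for illustration.
    REGRESSION = 'regression'
    CLASSIFICATION_BINARY = 'classification_binary'
    CLASSIFICATION_MULTICLASS = 'classification_multiclass'

    @classmethod
    def parse(cls, value):
        # Accept an enum member as-is, or map a string onto its member.
        if isinstance(value, cls):
            return value
        return cls(value)

# Both spellings resolve to the same enum member:
a = ProblemType.parse('classification_multiclass')
b = ProblemType.parse(ProblemType.CLASSIFICATION_MULTICLASS)
```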
We've had some new people reaching out and helping us improve, for which we are grateful!
More consistent error handling for our very minimal IO classes
Speeding up the tox build steps
Fixing a leak of helper visualization columns into reference results
Refactored a lot of the documentation to use new internal tooling. Speeding up testing and updating the documentation is paramount; it is often the thing we need to spend the most time on before being able to release.
What's up next?
We'll be tackling time itself as we prepare to make the timestamp_column_name data requirement optional!
We hope you'll love the new release as much as we do! Your feedback is most welcome and appreciated!
I would also apologize for the silly puns, but meh, I have no regress. 🥁
Niels