
Commit

doc update
nikml committed Jan 22, 2024
1 parent b545ac1 commit 57c67a4
Showing 3 changed files with 22 additions and 24 deletions.
30 changes: 14 additions & 16 deletions docs/how_it_works/multivariate_drift.rst
@@ -154,28 +154,26 @@ tutorial.
Classifier for Drift Detection
------------------------------

Classifier for drift detection is an implementation of domain classifiers, as it is called
in `relevant literature`_. NannyML uses a LightGBM classifier to distinguish between
the reference data and the examined chunk data. Similar to data reconstruction with PCA,
this method is also able to capture complex changes in our data. The algorithm implementing
Classifier for Drift Detection follows the steps described below.

Please note that the process described below is repeated for each :term:`Data Chunk`.
First, we prepare the data by assigning label 0 to reference data and label 1 to chunk data.
We use the model inputs as features and concatenate the reference and chunk data.
Duplicate rows are removed once, keeping the one coming from the chunk data.
This ensures that when we estimate on reference data, we get meaningful results.
Finally, categorical data are encoded as integers, since this works well with LightGBM.
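
For intuition, the preparation step might look roughly like the following pandas sketch.
This is only an illustration of the idea, not NannyML's internal code; the toy data and
column names are made up for the example.

.. code-block:: python

    import pandas as pd

    # Toy stand-ins for the reference and chunk model inputs (illustrative only).
    reference_df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "color": ["red", "blue", "red"]})
    chunk_df = pd.DataFrame({"f1": [3.0, 4.0], "color": ["red", "green"]})
    feature_column_names = ["f1", "color"]

    # Label reference data 0 and chunk data 1, then concatenate.
    data = pd.concat(
        [reference_df.assign(y=0), chunk_df.assign(y=1)],
        ignore_index=True,
    )

    # Drop duplicate feature rows, keeping the copy that came from the chunk
    # (it was concatenated last), so estimating on reference data stays meaningful.
    data = data.drop_duplicates(subset=feature_column_names, keep="last")

    # Encode categorical columns as integer codes, which LightGBM handles well.
    data["color"] = data["color"].astype("category").cat.codes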

To evaluate the domain classifier's discrimination performance, we use its cross-validated AUROC score.
We follow these steps to do so: First, we optionally perform hyperparameter tuning.
We perform hyperparameter optimization once on the combined data and store the resulting optimal hyperparameters.
Users can also provide hyperparameters. If nothing is specified, LightGBM defaults are used.
Next, we use sklearn's `StratifiedKFold` to split the data. For each fold split,
we train an `LGBMClassifier` and save its predicted score in the validation fold.
Finally, we use the predictions across all folds to calculate the resulting AUROC score.
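
The cross-validation loop can be sketched as below. This is a simplified illustration
using sklearn and LightGBM directly rather than NannyML's implementation; the synthetic
data and default hyperparameters are assumptions made for the example.

.. code-block:: python

    import numpy as np
    from lightgbm import LGBMClassifier
    from sklearn.datasets import make_classification
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import StratifiedKFold

    # Stand-in for the prepared features and the 0/1 reference-vs-chunk labels.
    X, y = make_classification(n_samples=500, n_features=5, random_state=42)

    # Collect out-of-fold predicted scores from a stratified split.
    scores = np.zeros(len(y))
    for train_idx, valid_idx in StratifiedKFold(n_splits=5).split(X, y):
        # Tuned or user-provided hyperparameters would be passed here;
        # otherwise LightGBM defaults are used.
        model = LGBMClassifier()
        model.fit(X[train_idx], y[train_idx])
        scores[valid_idx] = model.predict_proba(X[valid_idx])[:, 1]

    # A single AUROC over all out-of-fold predictions is the drift measure.
    auroc = roc_auc_score(y, scores)
    print(round(auroc, 3))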

The higher the AUROC score, the easier it is to distinguish the datasets, and hence the
more different they are.
@@ -5,7 +5,7 @@ Classifier for Drift Detection
==============================

The second multivariate drift detection method of NannyML is Classifier for Drift Detection.
This method trains a classification model to differentiate between data from the reference
dataset and the chunk dataset. Cross-validation is used for training.
The classifier's performance on the cross-validated folds, measured by AUROC, is
the multivariate drift measure. When there is no data drift, the datasets
@@ -34,16 +34,15 @@ The method returns a single number, measuring the discrimination capability of t
Any increase in the discrimination value above 0.5 reflects a change in the structure of the model inputs.

NannyML calculates the discrimination value for the monitored model's inputs, and raises an alert if the
values get outside the pre-defined range of ``[0.45, 0.65]``. If needed, this range can be adjusted by specifying
a threshold strategy more appropriate for the user's data.
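
As a rough illustration of the alerting logic (a hand-written sketch, not the library's
actual thresholding code), a value is flagged once it leaves the configured band:

.. code-block:: python

    # Default band used above; a different threshold strategy can replace it.
    LOWER, UPPER = 0.45, 0.65

    def is_alert(drift_value: float, lower: float = LOWER, upper: float = UPPER) -> bool:
        """Return True when the cross-validated AUROC falls outside the allowed band."""
        return drift_value < lower or drift_value > upper

    print(is_alert(0.50))  # False: reference and chunk are hard to tell apart
    print(is_alert(0.80))  # True: the classifier separates them easily, likely drift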

In order to monitor a model, NannyML needs to learn about it from a reference dataset.
Then it can monitor the data subject to actual analysis, provided as the analysis dataset.
You can read more about this in our section on :ref:`data periods<data-drift-periods>`.

Let's start by loading some synthetic data provided by the NannyML package and setting it up as our reference and analysis dataframes.
This synthetic data is for a binary classification model, but multi-class classification can be handled in the same way.

.. nbimport::
:path: ./example_notebooks/Tutorial - Drift - Multivariate - Classifier for Drift.ipynb
@@ -53,7 +52,7 @@ classification or regression can be handled in the same way.
:path: ./example_notebooks/Tutorial - Drift - Multivariate - Classifier for Drift.ipynb
:cell: 2

The :class:`~nannyml.drift.multivariate.classifier_for_drift_detection.calculator.DriftDetectionClassifierCalculator`
module implements this functionality. We need to instantiate it with appropriate parameters:

- **feature_column_names:** A list with the column names of the features we want to run drift detection on.
@@ -33,7 +33,8 @@ values get outside a range defined by the variance in the reference :ref:`data p
In order to monitor a model, NannyML needs to learn about it from a reference dataset. Then it can monitor the data subject to actual analysis, provided as the analysis dataset.
You can read more about this in our section on :ref:`data periods<data-drift-periods>`.

Let's start by loading some synthetic data provided by the NannyML package and setting it up as our reference and analysis dataframes.
This synthetic data is for a binary classification model, but multi-class classification can be handled in the same way.

.. nbimport::
:path: ./example_notebooks/Tutorial - Drift - Multivariate.ipynb
