
Predict Customer Churn with Clean Code, the first project of Udacity's Machine Learning DevOps Engineer Nanodegree.





Predict Customer Churn

  • Project: Predict Customer Churn, ML DevOps Engineer Nanodegree (Udacity)

Project Description

This is the first project of the Machine Learning DevOps Engineer Nanodegree at Udacity. Its main objective is to create a Python package for a machine learning project that follows coding (PEP 8) and engineering best practices for implementing software (modular, documented, and tested). The package can be run either interactively or from the command-line interface (CLI).
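A minimal sketch of one common way to make a module usable both interactively and from the CLI: the logic lives in plain functions, and an `if __name__ == "__main__":` guard runs it only when the file is executed as a script. The `--data-path` option, `run_pipeline`, and `main` are hypothetical names chosen for illustration, not taken from the project.

```python
"""Hypothetical sketch: a module that can be imported or run from the CLI."""
import argparse


def run_pipeline(data_path: str) -> str:
    """Placeholder for the real pipeline; it only reports what it would process."""
    # A real implementation would load the data, run the EDA and train the models.
    return f"processing {data_path}"


def parse_args(argv=None) -> argparse.Namespace:
    """Parse command-line options; argv=None means sys.argv[1:] is used."""
    parser = argparse.ArgumentParser(description="Run the churn pipeline.")
    parser.add_argument(
        "--data-path",  # hypothetical option name
        default="./data/bank_data.csv",
        help="CSV file with the customer data",
    )
    return parser.parse_args(argv)


def main(argv=None) -> str:
    """Entry point shared by the CLI and interactive sessions."""
    args = parse_args(argv)
    return run_pipeline(args.data_path)


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    print(main())
```

In an interactive session the same code path can be reached with `main(["--data-path", "other.csv"])`, so the CLI and notebook workflows do not diverge.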

This project is based on a Kaggle dataset about identifying the credit card customers who are most likely to churn.

Files and data description

The project structure tree is shown below:

├── data/                                       # Store the datasets
│   └── bank_data.csv                           # Dataset for this problem with 22 columns and 10,127 rows
├── images/                                     # Store the images
│   ├── eda/                                    # Store the images of the Exploratory Data Analysis
│   └── results/                                # Store the images of the modeling process
├── logs/                                       # Store the logs
├── models/                                     # Store the models generated
├── .gitignore                                  # Specifies untracked files that Git should ignore
├──                            # Python module with the code refactored in functions
├── churn_notebook.ipynb                        # Jupyter notebook containing the original code that will be refactored
├──           # Python module that runs the tests and generates the logs
├──                                # Python module with constant values used in the module
├── LICENSE                                     # MIT License
├──                                   # Readme file of the project
└── requirements.txt                            # Store information about all the libraries used to develop the project
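The tree lists a module of constant values but does not show its contents. A minimal sketch of what such a module could hold, assuming it centralizes the paths shown in the tree above; every constant name and the `ensure_output_dirs` helper are hypothetical.

```python
"""Hypothetical constants module mirroring the directory tree above."""
from pathlib import Path

# Paths relative to the repository root (assumed to be the working directory).
DATA_PATH = Path("data") / "bank_data.csv"
EDA_DIR = Path("images") / "eda"
RESULTS_DIR = Path("images") / "results"
LOGS_DIR = Path("logs")
MODELS_DIR = Path("models")


def ensure_output_dirs(root: Path = Path(".")) -> list:
    """Create the output folders under root if they are missing; return them."""
    dirs = [root / d for d in (EDA_DIR, RESULTS_DIR, LOGS_DIR, MODELS_DIR)]
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)
    return dirs
```

Keeping paths in one module means the library and the tests agree on where data, plots, logs, and models live.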


Python libraries used for the modeling process:

  • scikit-learn (0.24.1)
  • shap (0.40.0)
  • joblib (1.0.1)
  • pandas (1.2.4)
  • numpy (1.20.1)
  • matplotlib (3.3.4)
  • seaborn (0.11.2)

Python libraries used for code quality and testing:

  • pylint (2.7.4)
  • autopep8 (1.5.6)
  • pytest (7.1.2)
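A small sketch for checking an environment against the versions listed above, using the standard-library `importlib.metadata` (Python 3.8+). The `version_mismatches` helper is hypothetical and not part of the project; it simply reports packages whose installed version differs from the listed one, or `None` when a package is missing.

```python
"""Sketch: compare installed package versions with the versions listed above."""
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the library lists in this README.
EXPECTED = {
    "scikit-learn": "0.24.1",
    "shap": "0.40.0",
    "joblib": "1.0.1",
    "pandas": "1.2.4",
    "numpy": "1.20.1",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.2",
    "pylint": "2.7.4",
    "autopep8": "1.5.6",
    "pytest": "7.1.2",
}


def version_mismatches(expected: dict) -> dict:
    """Return {package: installed_version_or_None} for each package that differs."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```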

To install all of the requirements:

$ pip install -r requirements.txt

Running Files

To run all of the steps of the process:

$ ipython

The generated plots are stored as follows:

├── images/
│   ├── eda/
│   │   ├── churn_distribution.png              # Churn Distribution
│   │   ├── customer_age_distribution.png       # Customer Age Distribution
│   │   ├── heatmap.png                         # Heatmap - Correlations
│   │   ├── marital_status_distribution.png     # Marital Status Distribution
│   │   └── total_transaction_distribution.png  # Total Transaction Distribution
│   └── results/
│       ├── feature_importances.png             # Random Forest Classifier Feature Importances Plot
│       ├── logistic_results.png                # Logistic Regression Model Report
│       ├── rfc_results.png                     # Random Forest Classifier Model Report
│       └── roc_curve_result.png                # ROC Curves of Logistic Regression and Random Forest Classifier
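After a run, it can be useful to confirm that every plot listed above was actually written. A minimal sketch using `pathlib`; the `missing_plots` helper is hypothetical and assumes it is called from the repository root (or given that root explicitly).

```python
"""Sketch: report which of the plots listed above are missing after a run."""
from pathlib import Path

# File names copied from the tree above, relative to the repository root.
EXPECTED_PLOTS = [
    "images/eda/churn_distribution.png",
    "images/eda/customer_age_distribution.png",
    "images/eda/heatmap.png",
    "images/eda/marital_status_distribution.png",
    "images/eda/total_transaction_distribution.png",
    "images/results/feature_importances.png",
    "images/results/logistic_results.png",
    "images/results/rfc_results.png",
    "images/results/roc_curve_result.png",
]


def missing_plots(root: Path = Path(".")) -> list:
    """Return the expected plot paths that do not exist as files under root."""
    return [p for p in EXPECTED_PLOTS if not (root / p).is_file()]
```

An empty list means all nine plots are present.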

The trained models are stored as follows:

├── models/
│   ├── logistic_model.pkl                      # Logistic Regression model
│   └── rfc_model.pkl                           # Random Forest Classifier model
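A minimal sketch of saving an object to a `.pkl` file under `models/` and loading it back. It uses the standard-library `pickle` module so it runs without extra dependencies; the project lists joblib, whose `joblib.dump(obj, filename)` and `joblib.load(filename)` fill the same role. The dict of parameters is a made-up stand-in for a fitted model, and `save_model`, `load_model`, and `roundtrip_demo` are hypothetical helpers.

```python
"""Sketch: persist a model object to a .pkl file and load it back."""
import pickle
import tempfile
from pathlib import Path


def save_model(model, path: Path) -> None:
    """Write model to path, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path: Path):
    """Read a pickled object back. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def roundtrip_demo() -> bool:
    """Save and reload a made-up stand-in model in a temporary folder."""
    params = {"model": "logistic_regression", "coef": [0.42, -1.3], "intercept": 0.1}
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "models" / "logistic_model.pkl"
        save_model(params, path)
        return load_model(path) == params
```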

To test churn_library, run the following command:

$ ipython

The output should be:

=================================================================== test session starts ==================================================================
platform win32 -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- C:\Users\rudim\AppData\Local\Programs\Python\Python38\python.exe
cachedir: .pytest_cache
rootdir: D:\my_stuff\current\sandbox\cursos\udacity\machine_learning_devops_engineer\repos\MLDOE_Predict_Customer_Churn
collected 5 items

PASSED [ 20%]
PASSED [ 40%]
PASSED [ 60%]
PASSED [ 80%]
PASSED [100%]

============================================================== 5 passed in 242.75s (0:04:02) =============================================================
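The shape of one such test can be sketched as below. This is an illustration, not the project's test module: `import_data` here is a hypothetical stand-in that reads a tiny made-up CSV with the standard `csv` module, and the test logs the same "Testing import_data: SUCCESS" message that appears in the log file, re-raising on failure so pytest still reports it.

```python
"""Sketch of a pytest-style test; import_data is a hypothetical stand-in."""
import csv
import logging
import tempfile
from pathlib import Path


def import_data(pth):
    """Hypothetical loader: return the rows of a CSV file as a list of dicts."""
    with open(pth, newline="") as fh:
        return list(csv.DictReader(fh))


def test_import():
    """Check that the data loads and has rows and columns; log the outcome."""
    with tempfile.TemporaryDirectory() as tmp:
        pth = Path(tmp) / "bank_data.csv"
        pth.write_text("CLIENTNUM,Customer_Age\n1,45\n2,49\n")  # made-up sample rows
        try:
            rows = import_data(pth)
        except FileNotFoundError:
            logging.error("Testing import_data: the file wasn't found")
            raise
        try:
            assert len(rows) > 0
            assert len(rows[0]) > 0
        except AssertionError:
            logging.error("Testing import_data: the file has no rows or columns")
            raise
    logging.info("Testing import_data: SUCCESS")
```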

The content of the log file ./logs/churn_library.log should look like this:

[2022-08-20 14:57:21] root - INFO - Testing import_data: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing perform_eda: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing test_encoder_helper: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing test_perform_feature_engineering: SUCCESS
[2022-08-20 15:01:23] root - INFO - Testing test_train_models: SUCCESS
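The log lines above match a standard `logging` format string. A minimal sketch of a configuration that produces them, assuming `logging.basicConfig` on the root logger (which is why `root` appears in each line); `filemode="w"` (overwrite on each run) is an assumption. `format_example` pins the timestamp and uses UTC so the output is reproducible.

```python
"""Sketch: logging configuration that yields lines like the ones above."""
import logging
import time
from pathlib import Path

LOG_FORMAT = "[%(asctime)s] %(name)s - %(levelname)s - %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def configure_logging(path="./logs/churn_library.log", level=logging.INFO):
    """Send root-logger records to path (no effect if handlers already exist)."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(filename=path, filemode="w", level=level,
                        format=LOG_FORMAT, datefmt=DATE_FORMAT)


def format_example(message: str, created: float) -> str:
    """Format one INFO record from the root logger at a fixed timestamp."""
    formatter = logging.Formatter(fmt=LOG_FORMAT, datefmt=DATE_FORMAT)
    formatter.converter = time.gmtime  # UTC instead of local time
    record = logging.LogRecord("root", logging.INFO, __file__, 0, message, None, None)
    record.created = created
    return formatter.format(record)
```

For example, `format_example("Testing import_data: SUCCESS", 0)` returns `"[1970-01-01 00:00:00] root - INFO - Testing import_data: SUCCESS"`.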


The contents of this repository are covered under the MIT License.








