- Predict Customer Churn: ML DevOps Engineer Nanodegree Project (Udacity)
This is the first project of the Machine Learning DevOps Engineer Nanodegree at Udacity. The main objective is to create a Python package for a machine learning project that follows coding standards (PEP8) and software engineering best practices (modular, documented, and tested code). The package can be run either interactively or from the command-line interface (CLI).
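A module can support both interactive use and the CLI by keeping its functions importable and putting the command-line entry point behind an `if __name__ == "__main__":` guard. The sketch below uses the standard-library `argparse`; the `--data-path` and `--skip-eda` options and the `main` function are hypothetical and are not taken from churn_library.py.

```python
import argparse

# Hypothetical default, taken from the project tree (data/bank_data.csv).
DEFAULT_DATA_PATH = "./data/bank_data.csv"


def parse_args(argv=None):
    """Parse command-line options; argv=None means sys.argv[1:] is used."""
    parser = argparse.ArgumentParser(description="Predict customer churn")
    parser.add_argument("--data-path", default=DEFAULT_DATA_PATH,
                        help="CSV file with the customer data")
    parser.add_argument("--skip-eda", action="store_true",
                        help="do not regenerate the EDA plots")
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point; the real pipeline steps would be called from here."""
    args = parse_args(argv)
    print(f"Running pipeline on {args.data_path} (skip EDA: {args.skip_eda})")
    return args


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported.
    main()
```

With this layout, importing the module (for example from a notebook) runs nothing by itself, so each function can be called interactively, while running the file as a script goes through `main()`.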
This project is based on a Kaggle dataset for identifying the credit card customers who are most likely to churn.
The project structure tree is shown below:
./MLDOE_Predict_Customer_Churn/
|
├── data/ # Store the datasets
│ └── bank_data.csv # Dataset for this problem with 22 columns and 10,127 rows
|
├── images/ # Store the images
│ ├── eda/ # Store the images of the Exploratory Data Analysis
│ └── results/ # Store the images of the modeling process
|
├── logs/ # Store the logs
|
├── models/ # Store the generated models
|
├── .gitignore # Specifies untracked files that Git should ignore
|
├── churn_library.py # Python module with the notebook code refactored into functions
|
├── churn_notebook.ipynb # Jupyter notebook containing the original code that will be refactored
|
├── churn_script_logging_and_tests.py # Python module that runs the tests and generates the logs
|
├── constants.py # Python module with constant values used in the churn_library.py module
|
├── LICENSE # MIT License
|
├── README.md # Readme file of the project
|
└── requirements.txt # Store information about all the libraries used to develop the project
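The first pipeline step, `import_data` (the name appears in the sample log output further down), loads the CSV into a DataFrame. This minimal sketch assumes the dataset has an `Attrition_Flag` column with the values "Existing Customer" and "Attrited Customer" and derives a binary `Churn` target from it; both the column name and the derivation are assumptions about bank_data.csv, not confirmed by this README.

```python
import io

import pandas as pd


def import_data(pth):
    """Read the CSV at pth (a path or file-like object) and add a Churn column.

    Assumption: Attrition_Flag holds "Existing Customer" or "Attrited Customer";
    every value other than "Existing Customer" is treated as churn (1).
    """
    df = pd.read_csv(pth)
    df["Churn"] = (df["Attrition_Flag"] != "Existing Customer").astype(int)
    return df


# Tiny in-memory stand-in for ./data/bank_data.csv.
sample = io.StringIO(
    "CLIENTNUM,Attrition_Flag,Customer_Age\n"
    "1,Existing Customer,45\n"
    "2,Attrited Customer,51\n"
)
df = import_data(sample)
print(df["Churn"].tolist())  # [0, 1]
```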
Python libraries used for the modeling process:
- scikit-learn (0.24.1)
- shap (0.40.0)
- joblib (1.0.1)
- pandas (1.2.4)
- numpy (1.20.1)
- matplotlib (3.3.4)
- seaborn (0.11.2)
Python libraries used for code quality and testing:
- pylint (2.7.4)
- autopep8 (1.5.6)
- pytest (7.1.2)
To install all of the requirements:
$ pip install -r requirements.txt
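To confirm that the installed versions match the pins listed above, a small check based on the standard-library `importlib.metadata` (Python 3.8+) can be run after installation. The `PINNED` dictionary is copied from the library lists above and is assumed to mirror requirements.txt; `check_versions` is a hypothetical helper, not part of the project.

```python
from importlib import metadata

# Pins copied from the library lists above (assumed to match requirements.txt).
PINNED = {
    "scikit-learn": "0.24.1",
    "shap": "0.40.0",
    "joblib": "1.0.1",
    "pandas": "1.2.4",
    "numpy": "1.20.1",
    "matplotlib": "3.3.4",
    "seaborn": "0.11.2",
    "pylint": "2.7.4",
    "autopep8": "1.5.6",
    "pytest": "7.1.2",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, installed)} for every mismatch.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (expected, installed) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {installed}")
```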
To run all the steps of the pipeline:
$ ipython churn_library.py
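The test names show that the pipeline includes `encoder_helper` and `perform_feature_engineering` steps. The sketch below assumes, without confirmation from this README, that `encoder_helper` adds, for each categorical column, a new column holding the mean churn rate of each category (target encoding), and that feature engineering ends with a 70/30 train/test split. The signatures, the `response` parameter, the column list, and the toy data are all hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def encoder_helper(df, category_lst, response="Churn"):
    """Add a <column>_<response> column with the mean response per category."""
    df = df.copy()  # leave the caller's DataFrame untouched
    for col in category_lst:
        df[f"{col}_{response}"] = df.groupby(col)[response].transform("mean")
    return df


def perform_feature_engineering(df, feature_cols, response="Churn"):
    """Select features and target, then split them 70/30 into train/test sets."""
    X = df[feature_cols]
    y = df[response]
    return train_test_split(X, y, test_size=0.3, random_state=42)


# Toy data: F churns 3 of 5 times, M churns 1 of 5 times.
df = pd.DataFrame({
    "Gender": ["F", "M"] * 5,
    "Customer_Age": [45, 51, 38, 60, 29, 47, 55, 33, 41, 50],
    "Churn": [1, 0, 0, 0, 1, 1, 0, 0, 1, 0],
})
encoded = encoder_helper(df, ["Gender"])
X_train, X_test, y_train, y_test = perform_feature_engineering(
    encoded, ["Customer_Age", "Gender_Churn"])
print(len(X_train), len(X_test))  # 7 3
```

Target encoding keeps one numeric column per categorical feature instead of one column per category, at the cost of leaking target information if it is computed before the split; a stricter variant would fit the category means on the training set only.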
The generated plots are stored in:
./MLDOE_Predict_Customer_Churn/
|
├── images/
│ ├── eda/
│ | ├── churn_distribution.png # Churn Distribution
│ | ├── customer_age_distribution.png # Customer Age Distribution
│ | ├── heatmap.png # Heatmap - Correlations
│ | ├── marital_status_distribution.png # Marital Status Distribution
│ | └── total_transaction_distribution.png # Total Transaction Distribution
│ └── results/
│ | ├── feature_importances.png # Random Forest Classifier Feature Importances Plot
│ | ├── logistic_results.png # Logistic Regression Model Report
│ | ├── rfc_results.png # Random Forest Classifier Model Report
│ | └── roc_curve_result.png # ROC Curves of Logistic Regression and Random Forest Classifier
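The `roc_curve_result.png` plot compares both classifiers on one set of axes. Below is a sketch of how such a figure can be produced with scikit-learn's `roc_curve` and `auc`, using synthetic data from `make_classification` in place of the churn features. The `plot_roc_curves` helper, the hyperparameters, and the temporary output directory (standing in for ./images/results/) are illustrative assumptions.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split


def plot_roc_curves(models, X_test, y_test, out_path):
    """Draw one ROC curve per fitted model on shared axes, save, return AUCs."""
    fig, ax = plt.subplots(figsize=(8, 6))
    aucs = {}
    for name, model in models.items():
        scores = model.predict_proba(X_test)[:, 1]  # positive-class probability
        fpr, tpr, _ = roc_curve(y_test, scores)
        aucs[name] = auc(fpr, tpr)
        ax.plot(fpr, tpr, label=f"{name} (AUC = {aucs[name]:.2f})")
    ax.plot([0, 1], [0, 1], "k--", label="Chance")
    ax.set_xlabel("False positive rate")
    ax.set_ylabel("True positive rate")
    ax.legend(loc="lower right")
    fig.savefig(out_path)
    plt.close(fig)
    return aucs


# Synthetic stand-in for the churn features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000).fit(X_train, y_train),
    "Random Forest": RandomForestClassifier(
        n_estimators=100, random_state=42).fit(X_train, y_train),
}
out_path = os.path.join(tempfile.mkdtemp(), "roc_curve_result.png")
aucs = plot_roc_curves(models, X_test, y_test, out_path)
```

Computing the curves with `roc_curve` directly, rather than a plotting helper, keeps the code independent of how the scikit-learn plotting API changed between versions.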
The trained models are stored in:
./MLDOE_Predict_Customer_Churn/
|
├── models/
│ ├── logistic_model.pkl # Logistic Regression model
│ └── rfc_model.pkl # Random Forest Classifier model
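The `.pkl` files can be written and read with `joblib`, which is in the dependency list. A minimal sketch, assuming the stored models are plain fitted scikit-learn estimators; the synthetic data and the temporary directory (standing in for ./models/) are illustrative.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
lrc = LogisticRegression(max_iter=1000).fit(X, y)

# In the project the target would be ./models/logistic_model.pkl.
models_dir = tempfile.mkdtemp()
path = os.path.join(models_dir, "logistic_model.pkl")
joblib.dump(lrc, path)

# Reload and check that the restored model makes the same predictions.
loaded = joblib.load(path)
same = (loaded.predict(X) == lrc.predict(X)).all()
print(same)  # True
```

Unpickling is only reliable with the same scikit-learn version that saved the model, which is one reason to keep the version pins in requirements.txt.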
To test churn_library.py, run the following command:
$ ipython churn_script_logging_and_tests.py
The output should look similar to this (paths and timings will differ):
=================================================================== test session starts ==================================================================
platform win32 -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0 -- C:\Users\rudim\AppData\Local\Programs\Python\Python38\python.exe
cachedir: .pytest_cache
rootdir: D:\my_stuff\current\sandbox\cursos\udacity\machine_learning_devops_engineer\repos\MLDOE_Predict_Customer_Churn
collected 5 items
churn_script_logging_and_tests.py::TestClassChurnLibrary::test_import PASSED [ 20%]
churn_script_logging_and_tests.py::TestClassChurnLibrary::test_eda PASSED [ 40%]
churn_script_logging_and_tests.py::TestClassChurnLibrary::test_encoder_helper PASSED [ 60%]
churn_script_logging_and_tests.py::TestClassChurnLibrary::test_perform_feature_engineering PASSED [ 80%]
churn_script_logging_and_tests.py::TestClassChurnLibrary::test_train_models PASSED [100%]
============================================================== 5 passed in 242.75s (0:04:02) =============================================================
The content of the log file ./logs/churn_library.log should look like this:
[2022-08-20 14:57:21] root - INFO - Testing import_data: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing perform_eda: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing test_encoder_helper: SUCCESS
[2022-08-20 14:57:23] root - INFO - Testing test_perform_feature_engineering: SUCCESS
[2022-08-20 15:01:23] root - INFO - Testing test_train_models: SUCCESS
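The log lines follow a `[timestamp] logger - LEVEL - message` pattern. A `logging.basicConfig` call that produces the same layout is sketched below; the format string is inferred from the sample lines rather than copied from churn_script_logging_and_tests.py, and a temporary file stands in for ./logs/churn_library.log.

```python
import logging
import os
import tempfile

# In the project this would be ./logs/churn_library.log.
log_path = os.path.join(tempfile.mkdtemp(), "churn_library.log")

logging.basicConfig(
    filename=log_path,
    filemode="w",             # start a fresh log on every run
    level=logging.INFO,
    format="[%(asctime)s] %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    force=True,               # replace any handlers configured earlier (3.8+)
)

# The root logger's name is "root", matching the sample lines.
logging.info("Testing %s: SUCCESS", "import_data")
```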
The contents of this repository are covered under the MIT License.