🧑🏫 Author: Nhi Yen
💡 I write about Machine Learning on Medium, GitHub, Kaggle, and LinkedIn. If you found this article interesting, giving it a ⭐ will help me share the knowledge with others.
This project builds a model to detect fraudulent credit card transactions.
- Python 3.x
- Libraries: pandas, numpy, matplotlib, seaborn, scikit-learn (imported as sklearn), scipy, imbalanced-learn (imported as imblearn)
- Clone the repository
git clone https://github.com/username/project.git
- Install required libraries
pip install pandas numpy matplotlib seaborn scikit-learn scipy imbalanced-learn
Run the following command in the terminal to execute the project:
python main.py
- Loading data
- Printing random sample of 10 rows to check data loading
- Printing data overview
- Printing numerical summary for Time and Amount columns
- Plotting distribution of Time feature
- Plotting distribution of Amount feature
- Counting number of fraud vs non-fraud transactions and displaying them with their ratio
- Plotting count of fraud vs non-fraud transactions in a bar chart
- Plotting heatmap to find any high correlations between variables
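The exploration steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it builds a small synthetic DataFrame instead of loading the Kaggle file, and the values are made up. Only the column names `Time`, `Amount`, and `Class` come from the steps listed here; the plotting calls are left out so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataset (assumption: values are invented,
# only the column names Time, Amount and Class follow the README).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Time": rng.uniform(0, 172_800, n),
    "Amount": rng.exponential(80.0, n),
    "Class": np.r_[np.zeros(n - 10, dtype=int), np.ones(10, dtype=int)],
})

# Random sample of 10 rows to check that the data loaded as expected
print(df.sample(10, random_state=0))

# Numerical summary for the Time and Amount columns
print(df[["Time", "Amount"]].describe())

# Count fraud vs non-fraud transactions and compute the fraud ratio
counts = df["Class"].value_counts()
n_normal = counts.loc[0]
n_fraud = counts.loc[1]
fraud_ratio = n_fraud / len(df)
print(f"non-fraud: {n_normal}, fraud: {n_fraud}, fraud ratio: {fraud_ratio:.2%}")

# Correlation matrix; this is the table a heatmap would visualize
corr = df.corr()
print(corr.round(3))
```

With seaborn installed, passing `corr` to `seaborn.heatmap` would draw the correlation heatmap mentioned in the last step.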
- Drop the 'Class' column to prepare data for splitting
- Get the target variable
- Split data into training, validation and test sets, ensuring the class distribution is maintained
- Initialize the StandardScaler object and fit it to the training data
- Scale the training, validation, and test sets using the scaler
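As a rough sketch of the splitting and scaling steps above, the snippet below uses scikit-learn's `train_test_split` with `stratify` to keep the class ratio in every subset, then fits `StandardScaler` on the training data only. The synthetic arrays and the 60/20/20 split proportions are assumptions made for illustration; the README does not state the actual proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features and labels (assumption: 5% positives, 5 features)
rng = np.random.default_rng(42)
X = rng.normal(loc=3.0, scale=2.0, size=(1000, 5))
y = np.r_[np.zeros(950, dtype=int), np.ones(50, dtype=int)]

# First split: 60% train, 40% held out, preserving the class ratio
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42
)
# Second split: divide the held-out part evenly into validation and test
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=42
)

# Fit the scaler on the training set only, so no statistics leak from
# the validation or test data, then apply the same transform to all three
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(y_train.mean(), y_val.mean(), y_test.mean())
```

Fitting the scaler on the training set alone means the validation and test sets are scaled with statistics they did not contribute to, which keeps the evaluation honest.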
Undersampling is used to address the class imbalance.
- Run 5-fold cross-validation (CV) with logistic regression (logit)
- Instantiate RandomUnderSampler
- Fit a Naive Bayes Model
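The modelling steps above might look something like the sketch below. To keep it dependency-light, it uses a small hand-written `undersample` helper (hypothetical, not part of any library) in place of imblearn's `RandomUnderSampler`; the idea is the same: randomly drop majority-class rows until both classes are the same size. The synthetic data, recall as the metric, and `GaussianNB` as the Naive Bayes variant are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB


def undersample(X, y, rng):
    """Hypothetical helper: keep all fraud rows (label 1, assumed to be the
    minority) and an equally sized random subset of non-fraud rows."""
    fraud_idx = np.flatnonzero(y == 1)
    normal_idx = np.flatnonzero(y == 0)
    keep_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    idx = rng.permutation(np.concatenate([fraud_idx, keep_normal]))
    return X[idx], y[idx]


# Synthetic imbalanced data (assumption: 5% fraud, fraud rows shifted)
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 4))
y = np.r_[np.zeros(1900, dtype=int), np.ones(100, dtype=int)]
X[y == 1] += 2.0

# 5-fold CV with logistic regression; undersampling is applied to the
# training part of each fold only, never to the held-out part
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
recalls = []
for train_idx, val_idx in cv.split(X, y):
    X_bal, y_bal = undersample(X[train_idx], y[train_idx], rng)
    logit = LogisticRegression(max_iter=1000).fit(X_bal, y_bal)
    recalls.append(recall_score(y[val_idx], logit.predict(X[val_idx])))
print("fold recalls:", np.round(recalls, 3))

# Fit a Naive Bayes model on an undersampled copy of the data
X_bal, y_bal = undersample(X, y, rng)
nb = GaussianNB().fit(X_bal, y_bal)
nb_pred = nb.predict(X)
```

Resampling inside each fold, rather than once before the split, keeps the held-out rows at their original class ratio, so the scores reflect how the model would behave on imbalanced data.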
- Kaggle Dataset: Credit Card Fraud Detection
- Github Repo - HERE
- Kaggle Project - HERE
- Detailed explanation of the code on Medium
This project is licensed under the MIT License.