Fraud Detection

Credit card fraud detection based on a Kaggle dataset. The models applied and tested are K-means clustering, Logistic Regression, Random Forest, and XGBoost, along with some sampling techniques for balancing the data.

Some Tips

  • Features V1 to V28 are the principal components obtained with PCA, so they are already scaled. Only the Time and Amount features still need to be scaled.
  • The F1-score is a good scoring metric for imbalanced data when the positive (Fraud) class matters most: it combines precision and recall and ignores true negatives, so it is not inflated by the large Non-Fraud class the way accuracy is.
  • The dataset is highly imbalanced, so a model can easily overfit to the Non-Fraud (majority) class. The main techniques used to counter this were random under-sampling of the majority class and SMOTE for oversampling the minority class.
  • Be aware that Fraud transactions can be natural outliers compared to Non-Fraud transactions. Be careful with anomaly detection, especially outlier removal, because it can discard the very transactions the model needs to learn from.
  • Split the data into train and test sets before applying any sampling technique, and apply sampling only to the train data. The test data should keep the original class distribution.
  • Finally, be cautious when combining sampling with cross-validation: resampling must be done inside each training fold, not before the folds are created, otherwise it causes data leakage.
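
The tips on splitting, sampling, and scoring can be sketched in plain Python. This is a minimal illustration on toy labels, not the code used in this repository: the helper names (`f1_score`, `stratified_split`, `random_undersample`) and the 990/10 class counts are assumptions made up for the example.

```python
import random

def f1_score(y_true, y_pred):
    """F1 for the positive (fraud = 1) class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same class ratio in both."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

def random_undersample(indices, labels, seed=0):
    """Keep every minority row and an equally sized random subset of majority rows."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = rng.sample(majority, len(minority))
    return sorted(minority + kept)

# Toy labels (hypothetical): 990 non-fraud (0) and 10 fraud (1).
labels = [0] * 990 + [1] * 10

# 1. Split first, so the test set keeps the original imbalance.
train_idx, test_idx = stratified_split(labels)

# 2. Resample the training indices only; the test indices are never touched.
balanced_train_idx = random_undersample(train_idx, labels)

# 3. A model that always predicts "non-fraud" is 99% accurate but has F1 = 0.
always_negative = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_negative)) / len(labels)
baseline_f1 = f1_score(labels, always_negative)
```

For cross-validation, the same idea applies per fold: call the resampling step on each fold's training indices after the fold has been formed, and score on the untouched validation fold.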

Clustering with K-means

image

image

image

image

Logistic Regression with Random Under Sampling

image

image

Logistic Regression with SMOTE for Oversampling

image

image
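
SMOTE oversamples the minority class by creating synthetic points on the line segment between a minority sample and one of its nearest minority neighbours. The sketch below shows that interpolation idea in plain Python. It is a simplified illustration, not the library implementation used for the results above; the function name `smote_like_oversample` and the toy points are assumptions for the example.

```python
import random

def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    minority: list of feature vectors (lists of floats), at least two rows.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Pick a random position on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority-class points (hypothetical), taken from the training split only.
fraud_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(fraud_points, n_new=5)
```

Because every synthetic point is built from existing minority rows, the input should come from the training data only; generating points before the train/test split would leak test information into training.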

Random Forest with SMOTE

image

image

XGBoost

image image