Fraud Detection

Credit card fraud detection based on a Kaggle dataset. Clustering, Logistic Regression, Random Forest, and XGBoost were applied and tested, along with sampling techniques for balancing the data.

Some Tips

Features V1 to V28 are the principal components obtained with PCA, so they are already scaled. Only Time and Amount need to be scaled.
The F1-score is a good scoring metric for imbalanced data when the positive (Fraud) class matters most: it is the harmonic mean of precision and recall on that class, whereas accuracy stays high even for a model that labels every transaction as Non-Fraud.
The dataset is highly imbalanced, so a model can easily overfit to the Non-Fraud class. The main techniques used against this were Random Under-sampling of the majority class and SMOTE oversampling of the minority class.
Fraud transactions can be natural outliers compared to Non-Fraud transactions, so be careful with anomaly detection, and especially with outlier removal, which can discard the very cases the model needs to learn.
Split the data into training and test sets before applying any sampling technique, and apply sampling only to the training data.
Finally, combine sampling and cross-validation with care: if the data is resampled before it is split into folds, information from the validation folds leaks into training and the scores become overly optimistic.
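
The last two tips come down to one rule: resample only the rows the model trains on, never the rows it is scored on. A minimal sketch of that rule inside cross-validation, assuming scikit-learn and NumPy arrays `X` and `y` with 1 marking fraud; `random_undersample` and `cv_f1` are hypothetical helpers for illustration, not code from the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold


def random_undersample(X, y, seed=0):
    """Hypothetical helper: keep all fraud rows plus an equal-sized random sample of non-fraud rows."""
    rng = np.random.default_rng(seed)
    fraud = np.flatnonzero(y == 1)
    normal = np.flatnonzero(y == 0)
    keep = np.concatenate([fraud, rng.choice(normal, size=len(fraud), replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]


def cv_f1(X, y, make_model=lambda: LogisticRegression(max_iter=1000),
          resample=random_undersample, n_splits=5, seed=0):
    """Hypothetical helper: stratified K-fold F1 where resampling touches only the training fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        # Resample the training fold only; the validation fold keeps the real class ratio.
        X_tr, y_tr = resample(X[train_idx], y[train_idx])
        model = make_model().fit(X_tr, y_tr)
        scores.append(f1_score(y[val_idx], model.predict(X[val_idx])))
    return np.array(scores)
```

Calling `cv_f1(X_train, y_train)` on the training split returns one F1-score per fold. imbalanced-learn's `imblearn.pipeline.Pipeline` gives the same guarantee when a sampler such as `RandomUnderSampler` or `SMOTE` is placed inside it and the pipeline is passed to scikit-learn's `cross_val_score`, because samplers in that pipeline run only during fitting.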

Clustering with K-means
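
A minimal sketch of comparing K-means clusters with the fraud label, assuming scikit-learn, a scaled feature matrix `X` (V1 to V28 plus scaled Time and Amount) and labels `y` with 1 marking fraud. `cluster_vs_label` is a hypothetical helper and two clusters is an assumed setting; the notebook's own clustering setup may differ.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_vs_label(X, y, n_clusters=2, seed=0):
    """Fit K-means on the features only, then cross-tabulate clusters against the fraud label."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    clusters = km.fit_predict(X)  # the label y is not used for fitting
    table = np.zeros((n_clusters, 2), dtype=int)
    # Column 0 counts Non-Fraud rows, column 1 counts Fraud rows, one row per cluster.
    np.add.at(table, (clusters, np.asarray(y, dtype=int)), 1)
    return table
```

Each row of the returned table shows how Non-Fraud and Fraud transactions split across one cluster, which reveals whether any cluster concentrates fraud. Because clustering ignores the label while fitting, this is a diagnostic rather than a classifier.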

Logistic Regression with Random Under-sampling
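
A minimal end-to-end sketch of this pairing that follows the tips above: split first, scale only Time and Amount with a scaler fit on the training rows, under-sample only the training rows, then score F1 on the untouched test set. It assumes scikit-learn, a NumPy array `X` whose last two columns are Time and Amount after V1 to V28 (the column order is an assumption), and labels `y` with 1 marking fraud. `lr_with_undersampling`, `TIME_AMOUNT`, the 80/20 split, and the choice of `RobustScaler` are illustrative assumptions, not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

TIME_AMOUNT = [28, 29]  # assumed column positions of Time and Amount, after V1..V28


def lr_with_undersampling(X, y, seed=0):
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # 1. Split first; stratify so train and test both keep the original fraud ratio.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)

    # 2. Scale only Time and Amount: fit on training rows, reuse the same scaler on test rows.
    scaler = RobustScaler().fit(X_tr[:, TIME_AMOUNT])
    X_tr[:, TIME_AMOUNT] = scaler.transform(X_tr[:, TIME_AMOUNT])
    X_te[:, TIME_AMOUNT] = scaler.transform(X_te[:, TIME_AMOUNT])

    # 3. Random under-sampling of the training rows only: all fraud + as many random non-fraud rows.
    rng = np.random.default_rng(seed)
    fraud = np.flatnonzero(y_tr == 1)
    normal = rng.choice(np.flatnonzero(y_tr == 0), size=len(fraud), replace=False)
    keep = np.concatenate([fraud, normal])

    # 4. Fit on the balanced subset; score F1 on the untouched, imbalanced test set.
    model = LogisticRegression(max_iter=1000).fit(X_tr[keep], y_tr[keep])
    return model, f1_score(y_te, model.predict(X_te))
```

The test set keeps the original class ratio, so its F1-score reflects the imbalance the model actually has to handle.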

Logistic Regression with SMOTE for Oversampling
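
SMOTE (Synthetic Minority Over-sampling Technique) creates new fraud rows by picking a fraud row, picking one of its k nearest fraud neighbors, and placing a synthetic point at a random position on the segment between them. A minimal from-scratch sketch of that idea, assuming scikit-learn's `NearestNeighbors`; `smote` is a hypothetical helper written for illustration. imbalanced-learn provides a full implementation as `imblearn.over_sampling.SMOTE` (used via `fit_resample`), which the notebook may use instead.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def smote(X, y, k=5, seed=0):
    """Oversample class 1 until it matches class 0 by interpolating between fraud neighbors.

    Assumes at least two fraud rows in y.
    """
    rng = np.random.default_rng(seed)
    X_min = X[y == 1]
    n_new = int((y == 0).sum() - len(X_min))
    if n_new <= 0:
        return X, y
    # Ask for k + 1 neighbors because each fraud row's nearest neighbor is itself.
    nn = NearestNeighbors(n_neighbors=min(k + 1, len(X_min))).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    base = rng.integers(0, len(X_min), size=n_new)    # fraud row to start from
    pick = rng.integers(1, idx.shape[1], size=n_new)  # which neighbor (column 0 is the row itself)
    neighbor = X_min[idx[base, pick]]
    gap = rng.random((n_new, 1))                      # random position on the segment, in [0, 1)
    X_syn = X_min[base] + gap * (neighbor - X_min[base])
    X_out = np.vstack([X, X_syn])
    y_out = np.concatenate([y, np.ones(n_new, dtype=y.dtype)])
    return X_out, y_out
```

Per the tips, it is applied to the training split only, for example `X_res, y_res = smote(X_train, y_train)` followed by `LogisticRegression(max_iter=1000).fit(X_res, y_res)`, while the test set is left untouched.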

Random Forest with SMOTE
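
A minimal sketch of the fit-and-evaluate step, assuming scikit-learn and training rows that were already oversampled with SMOTE (for example with imbalanced-learn's `SMOTE().fit_resample`), while `X_test` and `y_test` keep the original class ratio. `fit_eval_forest` and the 100-tree setting are illustrative assumptions, not the notebook's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, f1_score


def fit_eval_forest(X_res, y_res, X_test, y_test, seed=0):
    """Fit on SMOTE-resampled training rows; score on the original, imbalanced test rows."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X_res, y_res)
    pred = forest.predict(X_test)
    # Rows: true Non-Fraud / Fraud; columns: predicted Non-Fraud / Fraud.
    cm = confusion_matrix(y_test, pred, labels=[0, 1])
    return forest, cm, f1_score(y_test, pred)
```

The confusion matrix complements the F1-score by showing how many frauds were missed (bottom-left cell) and how many normal transactions were flagged (top-right cell).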

XGBoost
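
Besides resampling, XGBoost can weight the minority class directly through the `scale_pos_weight` parameter of `XGBClassifier`; the XGBoost parameter documentation suggests sum(negative instances) / sum(positive instances) as a typical value. A minimal sketch that computes this ratio from the training labels; `pos_weight_ratio` is a hypothetical helper, the `XGBClassifier` lines in the comments assume the `xgboost` package is installed, and whether the notebook uses this setting is not stated in this README.

```python
import numpy as np


def pos_weight_ratio(y_train):
    """Count of Non-Fraud rows divided by count of Fraud rows in the training labels."""
    y_train = np.asarray(y_train)
    n_pos = int((y_train == 1).sum())
    n_neg = int((y_train == 0).sum())
    if n_pos == 0:
        raise ValueError("training labels contain no fraud rows")
    return n_neg / n_pos


# Usage (assumes the xgboost package is installed):
# from xgboost import XGBClassifier
# model = XGBClassifier(scale_pos_weight=pos_weight_ratio(y_train))
# model.fit(X_train, y_train)
```

On a training set that SMOTE has fully balanced, the ratio is 1, so the setting only changes anything when the model is trained on the original, imbalanced rows.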

ceenaa/fraud_detection
