Fraud-Detection

Dataset Link: https://drive.google.com/file/d/1HbNxPW3V0ln7OgRc-t-wKtT7HsgFValZ/view?usp=sharing

Fraud detection models

Isolation Forest Algorithm:

The algorithm is based on the fact that anomalies are data points that are few and different. As a result of these properties, anomalies are susceptible to a mechanism called isolation.

This method is highly useful and is fundamentally different from all existing methods. It introduces the use of isolation as a more effective and efficient means to detect anomalies than the commonly used basic distance and density measures. Moreover, this method is an algorithm with a low linear time complexity and a small memory requirement. It builds a good performing model with a small number of trees using small sub-samples of fixed size, regardless of the size of a data set.

How Isolation Forests Work The Isolation Forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the maximum and minimum values of the selected feature. The logic argument goes: isolating anomaly observations is easier because only a few conditions are needed to separate those cases from the normal observations. On the other hand, isolating normal observations require more conditions. Therefore, an anomaly score can be calculated as the number of conditions required to separate a given observation.

The way that the algorithm constructs the separation is by first creating isolation trees, or random decision trees. Then, the score is calculated as the path length to isolate the observation.

Local Outlier Factor(LOF) Algorithm:

The LOF algorithm is an unsupervised outlier detection method which computes the local density deviation of a given data point with respect to its neighbors. It considers as outlier samples that have a substantially lower density than their neighbors.

The number of neighbors considered, (parameter n_neighbors) is typically chosen 1) greater than the minimum number of objects a cluster has to contain, so that other objects can be local outliers relative to this cluster, and 2) smaller than the maximum number of close by objects that can potentially be local outliers. In practice, such informations are generally not available, and taking n_neighbors=20 appears to work well in general.

Decision Tree Algorithm:

A decision tree builds upon iteratively asking questions to partition data. The aim of the decision tree algorithm is to increase the predictiveness as much as possible at each partitioning so that the model keeps gaining information about the dataset. Randomly splitting the features does not usually give us valuable insight into the dataset. Splits that increase purity of nodes are more informative. The purity of a node is inversely proportional to the distribution of different classes in that node. The questions to ask are chosen in a way that increases purity or decrease impurity.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
Data Dictionary.txt		Data Dictionary.txt
Fraud_detection_.ipynb		Fraud_detection_.ipynb
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Fraud-Detection

Fraud detection models

Isolation Forest Algorithm:

Local Outlier Factor(LOF) Algorithm:

Decision Tree Algorithm:

About

Releases

Packages

Languages

hv129/Fraud-Detection

Folders and files

Latest commit

History

Repository files navigation

Fraud-Detection

Fraud detection models

Isolation Forest Algorithm:

Local Outlier Factor(LOF) Algorithm:

Decision Tree Algorithm:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages