Should we remove outliers or not?
The answer depends on the problem statement. In credit risk analysis, outliers play a major role; likewise, in fraud detection, outliers play a dominant role.
Whether to remove outliers depends directly on the dataset. Take the Titanic dataset: should we keep the outliers or not? The decision comes from the impact the outliers have on the problem. In the Titanic dataset we are predicting whether a passenger survived, so what matters most is whether the outliers influence that outcome. The sinking was an accident, and outliers in age and other features won't affect survival, so we should remove them. In fraud detection, however, we can't remove outliers; instead, we have to select a model that is not affected by them.
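If you decide that outliers should be dropped, as in the Titanic case above, one common rule (not necessarily the one used in this repo) is the interquartile-range (IQR), or Tukey fence, rule: discard values below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR. The sketch below applies it to a plain Python list. The function name `remove_iqr_outliers` and the sample ages are hypothetical; in a real project you would apply the same idea to a DataFrame column.

```python
from statistics import quantiles


def remove_iqr_outliers(values, k=1.5):
    """Drop values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; k=1.5 is the conventional factor.
    """
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only values inside the fences, preserving the original order
    return [v for v in values if lower <= v <= upper]


# Hypothetical ages; 120 is an implausible entry
ages = [22, 20, 24, 26, 28, 30, 32, 34, 36, 120]
cleaned = remove_iqr_outliers(ages)
print(cleaned)  # [22, 20, 24, 26, 28, 30, 32, 34, 36]
```

Here the fences work out to 11.0 and 47.0, so only the value 120 is removed.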
Another example: suppose we are working on sales forecasting or stock/crypto analysis, and the data contains sudden spikes. Those spikes are outliers because they deviate sharply from the typical pattern of the dataset. Should we keep those outliers (spikes) or remove them?
We should keep those outliers (spikes), because they are important to the analysis: we need to find the factors that caused them. So never delete these outliers.
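One way to keep spikes while still surfacing them for investigation is to flag them instead of deleting them. This minimal sketch assumes a simple trailing-window z-score rule (an illustrative choice, not a method from this repo): it reports the indices of points that deviate from the mean of the previous `window` points by more than `threshold` standard deviations. The function `flag_spikes` and the sales figures are hypothetical.

```python
from statistics import mean, stdev


def flag_spikes(series, window=5, threshold=3.0):
    """Return indices of points that deviate from the trailing-window mean
    by more than `threshold` trailing standard deviations.

    The series itself is not modified, so spikes stay in the data.
    """
    spikes = []
    for i in range(window, len(series)):
        past = series[i - window:i]          # the `window` points before i
        mu, sigma = mean(past), stdev(past)  # baseline from recent history
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            spikes.append(i)
    return spikes


# Hypothetical daily sales with one sudden spike
sales = [100, 102, 98, 101, 99, 100, 250, 103, 97, 101]
print(flag_spikes(sales))  # [6]
```

For this series only index 6 (the jump to 250) is reported, and `sales` is left unchanged, so the spike remains available when looking for the factors behind it.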
Similarly, an unusual money transaction is an outlier in a fraud detection dataset, but we can't remove such outliers, because these unusual transactions are exactly what the analysis is looking for.
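For fraud detection the same principle applies: mark unusual transactions rather than dropping them. The sketch below uses the modified z-score based on the median absolute deviation (MAD), with the commonly cited 0.6745 constant and 3.5 cutoff. This is an illustrative choice, not the repo's method, and `flag_unusual` and the amounts are hypothetical. Because the baseline is the median rather than the mean, a single huge transaction does not distort the reference it is compared against.

```python
from statistics import median


def flag_unusual(amounts, threshold=3.5):
    """Flag amounts whose modified z-score, 0.6745 * (x - median) / MAD,
    exceeds `threshold` in absolute value. Returns one bool per amount."""
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])  # median absolute deviation
    if mad == 0:
        # All (or most) values identical: no meaningful spread to compare with
        return [False] * len(amounts)
    return [abs(0.6745 * (a - med) / mad) > threshold for a in amounts]


# Hypothetical transaction amounts; the rows are kept, only flagged
transactions = [25.0, 40.0, 32.5, 18.0, 27.0, 5000.0, 30.0]
flags = flag_unusual(transactions)
for amount, is_unusual in zip(transactions, flags):
    print(amount, "UNUSUAL" if is_unusual else "ok")
```

Only the 5000.0 transaction is flagged, and `transactions` itself is left intact, so the flagged rows can be passed on to a fraud model or a reviewer.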
Contributions are always welcome!
https://scikit-learn.org/stable/modules/outlier_detection.html
If you have any feedback, please reach out to us at [email protected].
| Python Engineer | Machine Learning Engineer | Deep Learning Enthusiast | Analyst | Electrical & Electronics Engineer | On the Way to Full Stack Developer...
https://github.com/Sengarofficial
The Unlicense