How to decide which technique to use to treat outliers?

@aqsa2 wrote:

As I understand it, there are several ways to treat outliers in a dataset:

-> Delete the affected data
-> Transform the values using a log transform or binning
-> Replace them using the mean or median
-> Test them separately
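
To make these options concrete, here is a minimal sketch of each one in Python, using only the standard library. The sample `column`, the helper names `iqr_bounds` and `is_outlier`, and the choice of the 1.5 x IQR rule for flagging outliers are all assumptions made for illustration; they are not taken from my actual dataset.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, bounds):
    """True if x falls outside the (low, high) fences."""
    low, high = bounds
    return x < low or x > high

# Hypothetical column with two extreme values.
column = [12, 15, 14, 13, 16, 15, 14, 250, 13, 400]
bounds = iqr_bounds(column)

# 1. Delete: keep only the values inside the fences.
deleted = [x for x in column if not is_outlier(x, bounds)]

# 2. Transform: log1p compresses large values (requires x > -1).
logged = [math.log1p(x) for x in column]

# 3. Replace: swap flagged values for the median of the remaining values.
med = statistics.median(deleted)
imputed = [med if is_outlier(x, bounds) else x for x in column]

# 4. Test separately: split flagged values into their own group.
inliers = deleted
outliers = [x for x in column if is_outlier(x, bounds)]
```

Only option 1 shrinks the data; options 2 and 3 keep every row but change the values, and option 4 keeps the values but splits the rows into two groups.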

I have a dataset of around 50,000 observations, and the observations contain quite a few outlier values (some variables have only a small number of outliers, others have 100-200). Excluding data is not what I’m looking for, as it causes me to lose a huge chunk of the data.

I read somewhere that replacing with the mean or median is meant for artificial outliers, but in my case I think the outliers are natural.

I was about to use the median to replace the outliers and then the mean to fill in missing values, but that doesn’t seem right now. So I’m really confused about which technique to use, as the data is giving me 97% accuracy right now because of all the outliers. Should I bin the values in each column into bins from 1 to 10? Also, should I normalize or standardize my data before applying any model? Any guidance will be appreciated.
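
For reference, this is roughly what I mean by binning, normalizing, and standardizing. It is a minimal sketch in standard-library Python; the sample `column` and the helper names `decile_bins`, `min_max`, and `z_score` are hypothetical, and equal-frequency (decile) binning is just one assumed way to get bins 1-10.

```python
import statistics

def decile_bins(values):
    """Map each value to a bin 1-10 by decile (equal-frequency binning)."""
    cuts = statistics.quantiles(values, n=10)  # 9 cut points
    return [1 + sum(x > c for c in cuts) for x in values]

def min_max(values):
    """Normalize: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def z_score(values):
    """Standardize: shift to mean 0 and scale to standard deviation 1."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]

# Hypothetical column with two extreme values.
column = [12, 15, 14, 13, 16, 15, 14, 250, 13, 400]

binned = decile_bins(column)
normalized = min_max(column)
standardized = z_score(column)
```

With binning, an extreme value simply lands in the top bin, whereas with min-max normalization the extreme values set the range, so the ordinary values end up squeezed close to 0.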
