Channel: Data Science, Analytics and Big Data discussions - Latest topics

Too many False Positives with Unbalanced Data

@mahawaseem wrote:

Hi!

I am trying to predict customer churn in a telco company, using R. The dataset is very unbalanced: the target class is around 0.6% of the base.

  • 8,746 Customers will Churn
  • 1,396,664 Customers do not churn
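As a quick check on the imbalance, the class share and ratio can be recomputed from the two counts above. This is only arithmetic in base R; the variable names are made up for illustration.

```r
# Class counts as stated in the post
n_churn    <- 8746
n_no_churn <- 1396664

# Share of churners in the full base, in percent
churn_rate <- n_churn / (n_churn + n_no_churn)
round(100 * churn_rate, 2)

# Roughly how many non-churners there are per churner
imbalance_ratio <- n_no_churn / n_churn
round(imbalance_ratio)
```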

I have trained a Random Forest in R. Prior to training, I apply SMOTE to the training data:

train.smote <- SMOTE(Churn ~ ., train, perc.over = 100, perc.under = 200)

This gives me a 1:1 Balance. I then train the forest using:
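The 1:1 figure can be sanity-checked with a small sketch. It assumes the SMOTE call is the DMwR implementation, where perc.over/100 synthetic cases are generated per minority row and perc.under/100 majority rows are sampled per synthetic case. The helper `smote_sizes` and the minority count of 1000 are hypothetical; the post does not give the size of the training split.

```r
# Hypothetical helper: expected class sizes after DMwR-style SMOTE,
# assuming perc.over/100 synthetic cases per minority row and
# perc.under/100 majority rows sampled per synthetic case.
smote_sizes <- function(n_minority, perc.over, perc.under) {
  n_synthetic <- n_minority * perc.over / 100
  c(minority = n_minority + n_synthetic,
    majority = n_synthetic * perc.under / 100)
}

# Illustrative minority count only; not taken from the post.
sizes <- smote_sizes(n_minority = 1000, perc.over = 100, perc.under = 200)
sizes

# Ratio of minority to majority rows after resampling
sizes[["minority"]] / sizes[["majority"]]
```

Under these assumptions the minority class doubles and the majority class is cut down to the same size, so the model is trained on a class mix very different from the roughly 0.6% seen at prediction time.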

fit <- randomForest(as.factor(Churn) ~ ., data = train.smote, importance = TRUE,
                    ntree = 500)

When I run
pred <- predict(fit, newdata = test, type = "class")
on my validation data, I get the following confusion matrix:

              Positive    Negative
Positive     1,136,610     234,625
Negative         3,762       5,911
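The rates implied by this matrix can be recomputed in base R. The sketch below assumes that rows are actual classes and columns are predicted classes (the post does not label the axes), and that the rare "Negative" row is the churn class. Under those assumptions the two row-wise rates round to 0.83 and 0.61, the same values as the two figures quoted in the post, while only about 2.5% of the customers flagged as churners actually churn.

```r
# Counts copied from the matrix above; rows are assumed to be actual classes
# and columns predicted classes (the post does not label the axes).
cm <- matrix(c(1136610, 234625,
                  3762,   5911),
             nrow = 2, byrow = TRUE,
             dimnames = list(actual    = c("Positive", "Negative"),
                             predicted = c("Positive", "Negative")))

# Share of each actual class that was predicted correctly
row_rates <- diag(cm) / rowSums(cm)
round(row_rates, 2)

# Treating the rare "Negative" row as the churn class (an assumption):
# of all customers flagged as churners, what fraction actually churn?
churn_precision <- cm["Negative", "Negative"] / sum(cm[, "Negative"])
round(churn_precision, 3)
```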

The F-score is 0.83 and the specificity is 0.61.
However, the number of false positives (234,625) is too high.
Please suggest a method to reduce these false positives without sacrificing true positives.

Thanks!
