Channel: Data Science, Analytics and Big Data discussions - Latest topics

Confusion Matrix in Random Forest (R) with NA in the Output Variable


@Meraki09 wrote:

I have a dataset with 4669 observations and 15 variables.

I am trying to predict whether a customer will buy the product or not.

In my recent data, the output variable takes the values "Yes", "NO" and "" (an empty string).

I tried the random forest code below, and it ran on my data without any errors.

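```r
library(randomForest)

# Toy data; "" marks a missing output value
outputvar <- c("Yes", "NO", "Yes", "NO", "", "")
inputvar1 <- c("M", "M", "F", "F", "M", "F")
inputvar2 <- c("34", "35", "45", "60", "34", "23")

# Build the data frame directly: data.frame(cbind(...)) would turn every
# column into character, so the age column is converted to numeric here
data <- data.frame(outputvar = outputvar,
                   inputvar1 = factor(inputvar1),
                   inputvar2 = as.numeric(inputvar2))

# exclude = "" turns the empty strings into NA
data$outputvar <- factor(data$outputvar, exclude = "")

# Random 70/30 train/test split
ind0 <- sample(2, nrow(data), replace = TRUE, prob = c(0.7, 0.3))
train0 <- data[ind0 == 1, ]
test0 <- data[ind0 == 2, ]

# na.action = na.exclude drops training rows whose outputvar is NA
fit1 <- randomForest(outputvar ~ ., data = train0, na.action = na.exclude)
print(fit1)
plot(fit1)
p1 <- predict(fit1, train0)   # predictions on the training rows
fit1$confusion                # out-of-bag (OOB) confusion matrix

p2 <- predict(fit1, test0)    # predictions on the test rows

# Confusion matrix of predicted vs. actual classes on the test set
cm <- table(prediction = p2, actual = test0$outputvar)
cm
```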

As you can see, I split the data into 70% training (3228 rows) and 30% test (1441 rows).
When I built the confusion matrix on my test data, I could see that it only covers 1417 rows, so 48 rows of the data set are missing.
I stored the predicted values (p2) in a data frame and compared them with the test0 data frame. There I could see that all the rows with NA in the output had been filled with "Yes" or "NO".
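A minimal base-R sketch of how `table()` treats missing values, using small made-up vectors in place of `p2` and `test0$outputvar` (the names `pred`, `actual`, `cm` and `cm_na` are hypothetical). By default `table()` drops every pair in which one of the values is `NA`, so rows with an empty output are not counted; `useNA = "ifany"` keeps them as an extra `<NA>` column.

```r
# Hypothetical predictions and actual labels; NA marks an empty output value
pred   <- factor(c("Yes", "NO", "Yes", "NO", "Yes"), levels = c("NO", "Yes"))
actual <- factor(c("Yes", "NO", "NO", NA, NA), levels = c("NO", "Yes"))

# Default: pairs with NA in either vector are silently left out
cm <- table(prediction = pred, actual = actual)
sum(cm)      # 3 of the 5 rows are counted

# useNA = "ifany" adds an <NA> column for rows with no recorded outcome
cm_na <- table(prediction = pred, actual = actual, useNA = "ifany")
sum(cm_na)   # all 5 rows are counted
```

Comparing `sum()` of the two tables against the number of test rows is a quick way to see how many rows fall out of the default confusion matrix.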

Now I am confused: why does my confusion matrix have the wrong number of rows?

And are those filled-in values in my predicted data (p2) correct?

Or am I going wrong somewhere?

I am new to this field, so any information would be helpful.
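A sketch, assuming `p2` and `test0$outputvar` line up row by row (here replaced by small hypothetical vectors): `predict()` builds its output from the input columns, so it returns a class for every test row, including rows whose output was empty. Those values are the model's guesses for rows with no recorded outcome, so there is nothing to score them against. One way to handle this is to evaluate only the rows with a known outcome and keep the other predictions separately (the names `known`, `cm_known` and `unscored` are hypothetical).

```r
# Hypothetical stand-ins for p2 and test0$outputvar (NA = empty output)
p2     <- factor(c("Yes", "NO", "Yes", "NO", "Yes"), levels = c("NO", "Yes"))
actual <- factor(c("Yes", "NO", "NO", NA, NA), levels = c("NO", "Yes"))

# Rows with a recorded outcome can be compared with the predictions
known <- !is.na(actual)
cm_known <- table(prediction = p2[known], actual = actual[known])
accuracy <- mean(p2[known] == actual[known])   # share of correct predictions

# Rows without a recorded outcome only have predictions; they cannot be scored
unscored <- p2[!known]
```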


