@Meraki09 wrote:
I have a dataset with 4669 observations and 15 variables.
I am trying to predict whether a product will be bought by a customer or not.
In my recent data, the output variable takes the values "Yes", "NO" and "" (empty).
I tried the random forest code below, and it ran on my data without any errors.
library(randomForest)
outputvar <- c("Yes", "NO", "Yes", "NO", "", "")
inputvar1 <- c("M", "M", "F", "F", "M", "F")
inputvar2 <- c("34", "35", "45", "60", "34", "23")
data <- data.frame(cbind(outputvar, inputvar1, inputvar2))
data$outputvar <- factor(data$outputvar, exclude = "")
ind0 <- sample(2, nrow(data), replace = TRUE, prob = c(0.7,0.3))
train0 <- data[ind0==1, ]
test0 <- data[ind0==2, ]
fit1 <- randomForest(outputvar~., data=train0, na.action = na.exclude)
print(fit1)
plot(fit1)
p1 <- predict(fit1, train0)
fit1$confusion
p2 <- predict(fit1, test0)
t <- table(prediction = p2, actual = test0$outputvar)
t
As you can see, I have split my data into training (70%, 3228 rows) and test (30%, 1441 rows) sets.
When I looked at the confusion matrix on my test data, I could see that it covers only 1417 rows, missing the 48 rows in the data set.
I stored the predicted values (p2) in a data frame and compared them with the test0 data frame. There I could see that all those NA rows are filled with Yes and NO. Now I am confused: why does my confusion matrix end up with the wrong number of rows?
Are those filled values in my predicted dataset (p2) correct, or am I going wrong somewhere?
I am new to this field, and any information would be helpful.
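To make the question concrete, here is a minimal sketch in base R of what I suspect is happening, assuming the missing rows are the ones where the actual value is NA. The vectors `actual` and `predicted` are made up for illustration and are not my real `test0$outputvar` and `p2`. As far as I understand, `table()` leaves out NA values by default, and `useNA = "ifany"` keeps them.

```r
# Made-up vectors for illustration only (not the real p2 / test0$outputvar).
# exclude = "" turns the empty strings into NA, as in the code above.
actual    <- factor(c("Yes", "NO", "", "Yes", ""), exclude = "")
predicted <- factor(c("Yes", "NO", "Yes", "NO", "NO"), levels = c("NO", "Yes"))

# By default, table() drops every pair in which either value is NA.
t_default <- table(prediction = predicted, actual = actual)

# useNA = "ifany" adds an NA column, so every row is counted.
t_with_na <- table(prediction = predicted, actual = actual, useNA = "ifany")

# Number of rows with no actual label to compare against.
n_dropped <- sum(is.na(actual))

print(t_default)
print(t_with_na)
cat("rows:", length(actual),
    "| counted by default:", sum(t_default),
    "| NA in actual:", n_dropped, "\n")
```

If this assumption holds, the rows missing from the confusion matrix would simply be the ones with no actual label, so their predictions cannot be scored as right or wrong.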