@supra_minion wrote:
Hi
I am working on a data set in R. The task is to predict a categorical variable with two categories, 1 and 0. In XGBoost, I've set the num_class parameter to 2.
There are 600 rows in Training Set and 350 rows in test set.
**I am facing multiple issues.**
First Problem
After I run the XGBoost model with cross-validation:
xg_model <- xgb.cv(data=data.matrix(dum_train[,-1]), label=x, objective="multi:softprob", nfold = 10, num_class=2, nrounds=200, eta=0.1, subsample=0.5, colsample_bytree=0.5,max_depth=6,min_child_weight=1,eval_metric="merror", prediction=T)
The result shows up like this:
[179] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[180] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[181] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[182] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[183] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[184] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[185] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[186] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[187] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[188] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[189] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[190] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[191] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[192] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[193] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[194] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[195] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[196] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[197] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[198] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
[199] train-merror:0.000000+0.000000 test-merror:0.000000+0.000000
*Question 1: Does this validation result suggest I am over-fitting too much? If yes, what can I do to avoid over-fitting?*
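One thing that may be worth ruling out here, offered as a possibility rather than a diagnosis: a test error of exactly zero in every fold can also appear when one of the feature columns carries the label itself, which is a different problem from over-fitting. The sketch below shows one way such a check could look in base R. The toy data frame `toy`, the column `leaky`, and the helper `find_leaky_columns` are all hypothetical names made up for illustration; they are not taken from the data in this post.

```r
# Hypothetical helper: flags feature columns that reproduce a 0/1 label
# exactly (or its exact inverse), which would make zero CV error trivial.
find_leaky_columns <- function(features, label) {
  is_leaky <- vapply(features, function(col) {
    col_num <- as.numeric(col)
    isTRUE(all(col_num == label)) || isTRUE(all(col_num == 1 - label))
  }, logical(1))
  names(features)[is_leaky]
}

# Toy data, made up for illustration only
set.seed(1)
label <- rep(c(0, 1), times = 5)
toy <- data.frame(
  f1    = rnorm(10),
  leaky = label,      # a copy of the label hiding among the features
  f2    = runif(10)
)

leaky_cols <- find_leaky_columns(toy, label)
print(leaky_cols)
```

On real data the same call would take the feature columns and the label vector that are passed to the model. An empty result does not prove the absence of leakage, since a column can encode the label less directly.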
Second Problem
After running this model, I predicted values on my test set. As mentioned above, my test set has 350 rows, so I expect 350 predicted values from the model. Instead, I get 700 predicted values, double the number of rows in the test set.
*Question 2: Why is this happening? What am I doing wrong here?*
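A likely explanation, sketched under stated assumptions: with the objective `multi:softprob`, XGBoost returns one probability per class for every row, so 350 rows with `num_class = 2` give 350 × 2 = 700 values. The example below assumes the flat vector is ordered row by row (both class probabilities for row 1, then both for row 2, and so on), which matches the classic R interface; other versions of the package may already return a matrix. The vector `flat_pred` is simulated here as a stand-in for the real `predict()` output, and all variable names are illustrative.

```r
n_rows    <- 350   # rows in the test set, as stated in the post
num_class <- 2     # value passed to xgb.cv in the post

# Simulated stand-in for the flat vector returned by predict();
# assumed layout: row 1 class 0, row 1 class 1, row 2 class 0, ...
set.seed(42)
p_class1  <- runif(n_rows)
flat_pred <- as.vector(rbind(1 - p_class1, p_class1))
print(length(flat_pred))

# One row per test observation, one column per class
prob_matrix <- matrix(flat_pred, ncol = num_class, byrow = TRUE)
print(dim(prob_matrix))

# Predicted label = column index of the largest probability, shifted to 0/1
pred_label <- max.col(prob_matrix, ties.method = "first") - 1
print(table(pred_label))
```

For a two-class target, the objective `binary:logistic` (used without `num_class`) is an alternative that returns a single probability per row, so the output length would match the number of test rows.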