@harry wrote:
I am currently trying to remove all the missing values from my data so that I can use it to build a classification model.
My current data:

str(h)
'data.frame': 614 obs. of 13 variables:
 $ Loan_ID          : Factor w/ 614 levels "LP001002","LP001003",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Gender           : Factor w/ 3 levels "","Female","Male": 3 3 3 3 3 3 3 3 3 3 ...
 $ Married          : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 3 3 3 3 3 ...
 $ Dependents       : Factor w/ 5 levels "","0","1","2",..: 2 3 2 2 2 4 2 5 4 3 ...
 $ Education        : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
 $ Self_Employed    : Factor w/ 3 levels "","No","Yes": 2 2 3 2 2 3 2 2 2 2 ...
 $ ApplicantIncome  : int 5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
 $ CoapplicantIncome: num 0 1508 0 2358 0 ...
 $ LoanAmount       : int NA 128 66 120 141 267 95 158 168 349 ...
 $ Loan_Amount_Term : int 360 360 360 360 360 360 360 360 360 360 ...
 $ Credit_History   : int 1 1 1 1 1 1 1 0 1 1 ...
 $ Property_Area    : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
 $ Loan_Status      : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 1 2 1 ...

For example, I replaced the missing values in the Gender variable with 'Female':

h$Gender[which(h$Gender=='')]<-'Female'
table(h$Gender)

       Female   Male
     0    125    489

str(h)
'data.frame': 614 obs. of 13 variables:
 $ Loan_ID          : Factor w/ 614 levels "LP001002","LP001003",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Gender           : Factor w/ 3 levels "","Female","Male": 3 3 3 3 3 3 3 3 3 3 ...
 $ Married          : Factor w/ 3 levels "","No","Yes": 2 3 3 3 2 3 3 3 3 3 ...
 $ Dependents       : Factor w/ 5 levels "","0","1","2",..: 2 3 2 2 2 4 2 5 4 3 ...
 $ Education        : Factor w/ 2 levels "Graduate","Not Graduate": 1 1 1 2 1 1 2 1 1 1 ...
 $ Self_Employed    : Factor w/ 3 levels "","No","Yes": 2 2 3 2 2 3 2 2 2 2 ...
 $ ApplicantIncome  : int 5849 4583 3000 2583 6000 5417 2333 3036 4006 12841 ...
 $ CoapplicantIncome: num 0 1508 0 2358 0 ...
 $ LoanAmount       : int NA 128 66 120 141 267 95 158 168 349 ...
 $ Loan_Amount_Term : int 360 360 360 360 360 360 360 360 360 360 ...
 $ Credit_History   : int 1 1 1 1 1 1 1 0 1 1 ...
 $ Property_Area    : Factor w/ 3 levels "Rural","Semiurban",..: 3 1 3 3 3 3 3 2 3 2 ...
 $ Loan_Status      : Factor w/ 2 levels "N","Y": 2 1 2 2 2 2 2 1 2 1 ...

I am still getting 3 levels in the Gender variable, even though the "" level now has a count of 0. I want to know why this happens and how I can reduce this to 2 levels.
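The behaviour described above can be reproduced on a small example. The sketch below uses a hypothetical toy factor named gender as a stand-in for h$Gender (the values are made up for illustration, not taken from the real data). It suggests the likely cause: assigning new values to a factor changes the data but leaves the levels attribute alone, so the now-empty "" level is still listed. Base R's droplevels() is one way to discard unused levels.

```r
# Toy stand-in for h$Gender (hypothetical values, not the real data)
gender <- factor(c("Male", "", "Female", "Male", ""))
levels(gender)    # three levels, including the empty string ""

# Reassigning values changes the data but not the levels attribute
gender[gender == ""] <- "Female"
table(gender)     # "" is still listed, now with a count of 0

# droplevels() discards levels that no longer occur in the data
gender <- droplevels(gender)
levels(gender)
nlevels(gender)
```

On the real data the equivalent call would presumably be h$Gender <- droplevels(h$Gender), or h <- droplevels(h) to clean every factor column in the data frame at once. Whether 'Female' is a sensible fill value for the missing entries is a separate modelling question.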