Channel: Data Science, Analytics and Big Data discussions - Latest topics

IV Categorical Variable For Logistic Regression


@Blackberry wrote:

Hi,

I have gone through logistic regression and covered most of the evaluation topics: using the ROC curve to choose a threshold, the confusion matrix, AIC, AUC, overall accuracy, sensitivity, specificity, precision, and recall. Now I am trying to apply logistic regression to one of my real-life problems, but almost all the variables in the data set are categorical. I know we can use dummy variables, but the problem is that I would have to encode variables with a large number of categories: some have more than 100, and several have close to 40.

The variables in the data set are:

Complaint_NO
Complaint_Status         Whether the complaint is "Open", "Closed", or "Withdrawn"
Complaint_Sub_Status     "Progress", "Resolved", "Satisfied"
Complaint_Owner          ID of the person handling/assigned to this complaint
Business_Unit            Business unit the Complaint_Owner belongs to
Region                   Region where the complaint was raised
Area                     Area where the complaint was raised
AGM                      Area General Manager for this area
Store                    Name of the store
Product                  Product the user is complaining about (30 different products)
Sub_Product              Sub-product the user is complaining about (120 different sub-products)
Complaint_Created        Date the complaint was lodged
Complaint_Closed         Date the complaint was closed
Source                   Source of the complaint, e.g. "Local State & Fed MP", "Fault Management",
                         "State & Federal Govt", "XXX Shops", "Retail Channel", "Field Staff", "NA", "BillPay"
                         (there are almost 60 sources)
Complaint_Level          Escalation level of the complaint: "Level 0", "Level 1", "Level 2", "Level 3", "Level 4"
SR_Days                  Number of days the complaint has been open
Root_Cause1              Main reason for raising the complaint, e.g. "Product Features"
                         (there are almost 20 values of Root_Cause1)
Root_Cause2              Sub-reason for raising the complaint, e.g. "Data Speed/Connection Issues"
                         (there are almost 200 values of Root_Cause2)
Owned_Entity             Whether this SR belongs to this store: "Owned" if yes, otherwise "Not Owned"
26+ Days                 1 if SR_Days > 26, else 0. This is the DV I want to predict, either when the SR
                         is lodged or after it has been in progress for some days
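
To make the definition of the DV concrete, here is a minimal Python sketch of how the "26+ Days" flag could be derived. It assumes SR_Days is the number of days between Complaint_Created and Complaint_Closed, which the post does not state explicitly; the two example rows and the helper names `sr_days` and `over_26_days` are made up for illustration.

```python
from datetime import date


def sr_days(created: date, closed: date) -> int:
    """Days between lodging and closing a complaint (assumed meaning of SR_Days)."""
    return (closed - created).days


def over_26_days(days: int) -> int:
    """The "26+ Days" flag: 1 if SR_Days > 26, else 0."""
    return 1 if days > 26 else 0


# Two hypothetical complaints, not real data.
complaints = [
    {"Complaint_Created": date(2016, 1, 4), "Complaint_Closed": date(2016, 1, 20)},
    {"Complaint_Created": date(2016, 1, 4), "Complaint_Closed": date(2016, 2, 15)},
]

labels = [
    over_26_days(sr_days(c["Complaint_Created"], c["Complaint_Closed"]))
    for c in complaints
]
print(labels)  # [0, 1]: 16 days and 42 days respectively
```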

My question is: with the data set above, is it appropriate to use logistic regression to predict whether a new complaint will take more than 26 days to resolve (based on SR_Days)?

Or is it better to go with a decision tree? I am familiar with decision trees as well, but how can I evaluate a decision tree model?

Please correct me if I am wrong: does a decision tree use the same model evaluation techniques as logistic regression, e.g. confusion matrix, ROC curve threshold selection, AUC, overall accuracy, sensitivity, and specificity?
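
For reference, the metrics listed here are computed only from the true labels and the predicted scores, so the calculation itself does not depend on which classifier produced the scores. The sketch below uses plain Python so each step is visible; the function names, the 0.5 threshold, and the six labels and scores are made up for illustration and do not come from the complaint data.

```python
def confusion_counts(y_true, scores, threshold=0.5):
    """Return (TP, FP, TN, FN) after thresholding the predicted scores."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summary(y_true, scores, threshold=0.5):
    """Accuracy, sensitivity and specificity at one threshold.

    Assumes both classes are present, otherwise a denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate (recall)
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = took more than 26 days) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]

print(confusion_counts(y_true, scores))  # (2, 1, 2, 1)
print(summary(y_true, scores))
print(auc(y_true, scores))
```

The `scores` list could hold predicted probabilities from a logistic regression or class proportions from the leaves of a decision tree; the functions treat both the same way.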

Can I use logistic regression without encoding the categorical variables as dummy variables?
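
To show the scale of the encoding problem, here is a rough Python sketch of dummy encoding and of the number of columns it would produce. A variable with k categories needs k - 1 dummy columns, with one category kept as the reference level. The function `dummy_encode` is a hypothetical helper, the sample `Source` values are toy rows, and the category counts are the approximate figures quoted in the variable list, so the total is only an estimate.

```python
def dummy_encode(values):
    """One 0/1 column per level except the first (in sorted order), which is the reference."""
    levels = sorted(set(values))
    kept = levels[1:]
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return kept, rows


# Toy sample of the Source variable (made-up rows using values named in the post).
source = ["BillPay", "Field Staff", "BillPay", "XXX Shops", "NA"]
columns, rows = dummy_encode(source)
print(columns)  # ['Field Staff', 'NA', 'XXX Shops']
print(rows[0])  # [0, 0, 0]: "BillPay" is the reference level
print(rows[1])  # [1, 0, 0]

# Approximate number of categories per variable, as quoted in the post.
n_categories = {
    "Product": 30,
    "Sub_Product": 120,
    "Source": 60,
    "Complaint_Level": 5,
    "Root_Cause1": 20,
    "Root_Cause2": 200,
}
n_dummy_columns = sum(k - 1 for k in n_categories.values())
print(n_dummy_columns)  # 429
```

Variables whose category counts are not given in the post (Region, Area, AGM, Store, and so on) would add further columns on top of this estimate.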

Thanks in advance

Sufyan


