@sree1986 wrote:
Hi, I am working on a logistic-regression-based binary classification problem where I need to predict customer churn. Some categorical variables in the dataset have a large number of levels, such as area (75 levels), district (135 levels), and sub-area (180 levels). Creating dummy variables doesn't seem practical, as the number of columns would explode. Is there any way to handle such high-cardinality categorical variables? Also, keeping both 'area' and 'sub-area' seems redundant, since each sub-area belongs to an area. If so, does it make sense to remove the 'area' variable?
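To make the column explosion concrete, here is a minimal sketch on synthetic stand-in data. Only the level counts (75, 135, 180) come from the actual dataset. The column names, row count, and values are assumptions made up for illustration, and the sketch ignores the fact that sub-areas are nested inside areas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 5000  # hypothetical row count, chosen only for the illustration

# Level counts taken from the post; the column names are illustrative.
levels = {"area": 75, "district": 135, "sub_area": 180}

# Build each column so that every level appears at least once,
# then shuffle the rows independently per column.
df = pd.DataFrame({
    col: rng.permutation(np.arange(n_rows) % k).astype(str)
    for col, k in levels.items()
})

# One dummy column per level, minus one reference level per variable.
dummies = pd.get_dummies(df, columns=list(levels), drop_first=True)

print(dummies.shape[1])  # 74 + 134 + 179 = 387 dummy columns
```

So even with one reference level dropped per variable, these three variables alone would contribute 387 columns to the design matrix.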
Thanks in advance