Channel: Data Science, Analytics and Big Data discussions - Latest topics

Integer Categorical feature as both Numeric and Categorical


I’m dealing with tabular datasets where it’s really hard to tell whether an integer column is numeric or categorical. My main concern is the accuracy of the model I am building (no deep learning), so I’m wondering if I can treat the integer column as both numeric (used as is) and categorical (one-hot encoded, or handled by a decision tree with set-based splits). That is, give the model both representations of the column at the same time and let it figure out which features are suitable.

My question is: are there scenarios where this multiple-representation approach makes sense, and scenarios where it does not? If so, how does that relate to the model being trained and to the bias-variance tradeoff, for instance logistic regression (high bias) vs. random forest (high variance)? Are there any established theories or best practices that show the advantages or disadvantages of doing this? I’m asking in the context of classification problems.
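
To make the idea concrete, here is a minimal, dependency-free sketch of the dual representation described above: each row keeps the raw integer and also gets one-hot indicators for the same value. The function name `dual_representation`, the `categories` parameter, and the sample values are hypothetical and chosen only for illustration; a real pipeline would probably do this with a library encoder instead.

```python
def dual_representation(values, categories=None):
    """Return rows holding the raw integer plus one-hot indicators for it.

    `categories` fixes the order of the indicator columns. By default it is
    the sorted set of values seen in `values` (an assumption of this sketch).
    """
    if categories is None:
        categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}

    rows = []
    for v in values:
        one_hot = [0] * len(categories)
        if v in index:  # values outside `categories` get an all-zero block
            one_hot[index[v]] = 1
        # First column: numeric view. Remaining columns: categorical view.
        rows.append([v] + one_hot)
    return rows, categories


# Hypothetical integer column with three distinct levels.
rows, cats = dual_representation([3, 1, 3, 7])
```

Each row in `rows` can then be passed to any classifier that accepts a plain numeric feature matrix, so the model sees the ordered value and the per-level indicators side by side.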

2 posts - 2 participants
