Advantages of One-Hot Encoding for GBM or XGBoost


@Aarshay wrote:

Hi AVians,

Many people (including me) are confused about this topic, and I would like to discuss it further.

The issue is that people say tree-based models can extract the individual categories on their own (from a single label-encoded column), so there should be no need for one-hot encoding.

I personally prefer one-hot encoding because:

  1. If the categories are not separated out, a tree considers all of them together every time the variable is selected for a split, so it may put most of its emphasis on the most important categories and ignore the rest.
  2. If we make separate binary variables, there will be splits where the most important categories are not selected, and the model will derive insights from the less important categories as well.
  3. Since the whole idea behind boosting is the combination of weak learners, this should work better (see the sketch after this list).
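
To make the argument concrete, here is a minimal sketch in Python comparing the two encodings on the same data with XGBoost and scikit-learn. The `city` feature, its category levels, and the synthetic target are hypothetical, invented purely for illustration; treat this as a sanity-check experiment, not a definitive benchmark.

```python
# Minimal sketch: label encoding vs one-hot encoding for XGBoost.
# The "city" feature, its levels, and the synthetic target below are
# hypothetical, chosen only to illustrate the two encodings.
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(42)
n = 1000
city = rng.choice(["delhi", "mumbai", "pune", "chennai", "kolkata"], size=n)
# Target depends on two specific categories, plus some noise.
y = (((city == "mumbai") | (city == "pune")).astype(int)
     ^ (rng.normal(size=n) > 1.5).astype(int))

# Option 1: label encoding -- a single integer column. A tree can still
# split on it, but only via thresholds over an arbitrary category order.
X_label = pd.Series(city).astype("category").cat.codes.to_frame("city_code")

# Option 2: one-hot encoding -- one binary column per category, so any
# single category can be isolated by a single split.
X_onehot = pd.get_dummies(pd.Series(city), prefix="city", dtype=int)

model = XGBClassifier(n_estimators=100, max_depth=3)
for name, X in [("label", X_label), ("one-hot", X_onehot)]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name:8s} AUC: {scores.mean():.3f} +/- {scores.std():.3f}")
```

One caveat worth keeping in mind: for high-cardinality features, one-hot encoding adds many sparse columns, so a comparison like this is worth re-running on your own data rather than assumed either way.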

Also, I have found one-hot encoding to generate better results in my experiments, but I am not 100% sure this holds in all cases.

Please share your thoughts and experience.

@SRK, @kunal, @Nalin, @aayushmnit, @binga, @vikash - please comment.

Thanks,
Aarshay



Viewing all articles
Browse latest Browse all 4448

Trending Articles