Channel: Data Science, Analytics and Big Data discussions - Latest topics

Decision Tree Pruning and other related queries

@ismail18 wrote:

Hi All,

I am currently working on an ML practice problem where I need to predict “item_sales” (a continuous variable). The feature variables are a mix of continuous and categorical variables. I am following these steps:

  1. Taking all the feature variables
  2. Imputing missing data in continuous variables with the mean, and in categorical variables with the mode
  3. One-hot encoding the categorical variables
  4. Fitting a decision tree regressor and obtaining predictions and the R² score
  5. Since decision trees often overfit, pruning the tree by tuning hyperparameters such as max_depth and min_samples_split with GridSearchCV
  6. Arriving at an improved, more robust model
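The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the actual dataset: the column names ("item_mrp", "item_weight", "outlet_type") and the hyperparameter grid are assumptions made up for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
# Toy stand-in for the sales data: column names are hypothetical.
df = pd.DataFrame({
    "item_mrp": rng.uniform(30, 270, n),
    "item_weight": rng.uniform(5, 20, n),
    "outlet_type": rng.choice(["grocery", "super1", "super2"], n),
})
df.loc[rng.choice(n, 20, replace=False), "item_weight"] = np.nan  # inject missing values
y = 3.0 * df["item_mrp"] + rng.normal(0, 40, n)  # target driven mostly by item_mrp

num_cols = ["item_mrp", "item_weight"]
cat_cols = ["outlet_type"]

# Step 2: mean-impute continuous, mode-impute categorical; step 3: one-hot encode.
pre = ColumnTransformer([
    ("num", SimpleImputer(strategy="mean"), num_cols),
    ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                      ("ohe", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
])

# Step 4: decision tree regressor wrapped in a single pipeline.
model = Pipeline([("pre", pre), ("tree", DecisionTreeRegressor(random_state=0))])

# Step 5: prune via a hyperparameter search over max_depth / min_samples_split.
grid = GridSearchCV(model,
                    {"tree__max_depth": [3, 5, 8, None],
                     "tree__min_samples_split": [2, 10, 30]},
                    cv=5, scoring="r2")

X_tr, X_te, y_tr, y_te = train_test_split(df, y, test_size=0.25, random_state=0)
grid.fit(X_tr, y_tr)
print(grid.best_params_)
print(round(grid.score(X_te, y_te), 3))  # held-out R² of the tuned tree
```

Wrapping the imputers and encoder in a Pipeline keeps the preprocessing inside the cross-validation loop, so the grid search scores each candidate on data it was not fitted on.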

Here are my observations and concerns:

Q1. A continuous variable, “item_mrp”, gets a very high relative feature importance compared to the others. Why is that?

Q2. Does one-hot encoding make categorical variables less relevant than continuous variables?

Q3. Should I consider dimensionality reduction to improve robustness and reduce overfitting? (My data does not have many features, even after one-hot encoding.)

Q4. What can we do to build decision trees that give a high R² score but are also robust (i.e., do not overfit and perform well on unseen data)?
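For context on the pruning mentioned in step 5: besides tuning max_depth and min_samples_split, scikit-learn's built-in minimal cost-complexity pruning (the `ccp_alpha` parameter) is another way to trade training fit against robustness. A minimal sketch on synthetic data (the dataset here is made up, not the sales data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the sales problem.
X, y = make_regression(n_samples=400, n_features=6, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unpruned tree fits the training data almost perfectly but tends
# to generalise worse; increasing ccp_alpha removes subtrees whose
# extra complexity is not worth their impurity reduction.
full = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
path = full.cost_complexity_pruning_path(X_tr, y_tr)

best_alpha, best_r2 = 0.0, -np.inf
for alpha in np.unique(path.ccp_alphas)[::5]:  # subsample the alpha path for speed
    tree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_tr, y_tr)
    r2 = tree.score(X_te, y_te)
    if r2 > best_r2:
        best_alpha, best_r2 = alpha, r2

pruned = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X_tr, y_tr)
# The pruned tree is never larger than the unpruned one.
print(full.get_n_leaves(), pruned.get_n_leaves(), round(best_r2, 3))
```

In practice the pruning strength would be chosen with cross-validation (e.g. by putting `ccp_alpha` into the GridSearchCV grid) rather than on the test set as in this sketch.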

This question is about decision trees, so please answer accordingly. Help is very much appreciated.

Posts: 1

Participants: 1

