Channel: Data Science, Analytics and Big Data discussions - Latest topics
Viewing all articles
Browse latest Browse all 4448

Variable selection before or after cross-validation (Python)


@psnh wrote:

I have a high-dimensional classification dataset with 1,500 features and 45,000 data points.
My initial approach for modeling was:

  1. Split the dataset into training and test sets
  2. Perform variable selection on the training set
  3. Create a new dataset containing only the selected features and perform cross-validation on it
  4. Evaluate the final model on the test set
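A minimal sketch of the four steps above, in scikit-learn. Several choices here are assumptions, not taken from the post: `make_classification` stands in for the real 45,000 x 1,500 dataset (with fewer rows so it runs quickly), `SelectKBest` with an ANOVA F-test is the variable-selection method, `k=50` is arbitrary, and logistic regression is the classifier.

```python
# Sketch of the four-step workflow on synthetic data (all model choices are assumptions).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in: 2,000 rows x 1,500 features, only 20 of them informative.
X, y = make_classification(n_samples=2000, n_features=1500,
                           n_informative=20, random_state=0)

# 1. Split into training and test sets (80/20, stratified on the label).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 2. Variable selection fitted on the training set only.
selector = SelectKBest(f_classif, k=50).fit(X_train, y_train)

# 3. Keep only the selected columns and cross-validate on the training set.
X_train_sel = selector.transform(X_train)
model = LogisticRegression(max_iter=1000)
cv_scores = cross_val_score(model, X_train_sel, y_train, cv=5)

# 4. Refit on the whole training set and score once on the untouched test set.
model.fit(X_train_sel, y_train)
test_score = model.score(selector.transform(X_test), y_test)

print(f"mean CV accuracy: {cv_scores.mean():.3f}")
print(f"test accuracy:    {test_score:.3f}")
```

Because step 2 looks at every training row, each validation fold in step 3 has already influenced which features were kept, so the cross-validation scores can be optimistic. The test set never touches the selection step, so the step 4 score is still an honest estimate.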

I read online that variable selection should not be performed before cross-validation, but cross-validating on a dataset with all 1,500 features takes a lot of time.
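The usual reason for that advice can be illustrated with a small experiment. This sketch assumes pure-noise features and random labels (so no model should beat roughly 50% accuracy), with `SelectKBest(f_classif, k=20)` and logistic regression as illustrative choices. It compares selecting features on all rows before cross-validation against wrapping selection and model in a scikit-learn `Pipeline`, which refits the selector on the training part of every fold.

```python
# Sketch: selection before CV vs. selection inside CV (data and model choices are assumptions).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 1500))  # 1,500 pure-noise features
y = rng.integers(0, 2, size=200)      # labels unrelated to X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Leaky: features chosen using all rows, including every future validation fold.
X_leaky = SelectKBest(f_classif, k=20).fit_transform(X, y)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=cv).mean()

# Honest: the pipeline refits the selector on each fold's training rows only.
pipe = make_pipeline(SelectKBest(f_classif, k=20), LogisticRegression(max_iter=1000))
honest = cross_val_score(pipe, X, y, cv=cv).mean()

print(f"selection before CV: {leaky:.3f}")
print(f"selection inside CV: {honest:.3f}")
```

On noise data like this, the first number typically lands well above 0.5 while the second stays near chance. As for running time, a univariate filter such as `f_classif` is cheap compared with fitting the classifier, so repeating it in each fold usually adds little; the cost grows mainly with wrapper methods such as recursive feature elimination. `cross_val_score` also accepts `n_jobs=-1` to run folds in parallel.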

I am not sure if my approach is correct and would really appreciate any input on this!
Thanks!

Posts: 1

Participants: 1


