Channel: Data Science, Analytics and Big Data discussions - Latest topics

Splitting the Data in KFold CV according to label

@ravi_6767 wrote:

Hi all,
I was going through the documentation of KFold CV in sklearn, and the following signature is given for it:
class sklearn.cross_validation.KFold(n, n_folds=3, shuffle=False, random_state=None)

I am aware that the splitting is done sequentially or randomly, depending on the shuffle and random_state parameters. Is there a method that splits the data into train and test sets with the same proportion of 1's and 0's in the label?
Say the label in the data set has a distribution like:

[1,1,1,0,0,0,1,1,1]

so the desired split would be:

train : [1,1,1,0,0,1]
test : [0,1,1]

The ratio of 1's to 0's in the data set (i.e., 2:1) is maintained in both the train and test sets. One more question: is this type of splitting effective when forming k-folds?
Thanks in advance
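
The kind of split described above is usually called stratified k-fold, and scikit-learn ships a StratifiedKFold class for it. To make the idea concrete without depending on a particular sklearn version, here is a minimal standard-library sketch. The function name stratified_kfold_indices is hypothetical (it is not part of sklearn), and the sketch does no shuffling; it only deals each class's samples round-robin across the folds.

```python
from collections import defaultdict


def stratified_kfold_indices(labels, n_folds=3):
    """Yield (train_idx, test_idx) pairs whose label mix mirrors the full data.

    Hypothetical helper for illustration only; not an sklearn function.
    """
    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Deal each class's indices round-robin across the folds, so every
    # fold receives a near-equal share of every class.
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)

    # Each fold serves once as the test set; the rest form the train set.
    for k in range(n_folds):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


labels = [1, 1, 1, 0, 0, 0, 1, 1, 1]
for train_idx, test_idx in stratified_kfold_indices(labels, n_folds=3):
    print("train:", [labels[i] for i in train_idx],
          "test:", [labels[i] for i in test_idx])
```

With the nine labels from the question, every test fold ends up with two 1's and one 0, and every train set with four 1's and two 0's, so the 2:1 ratio is kept on both sides.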

Posts: 2

Participants: 2
