Channel: Data Science, Analytics and Big Data discussions - Latest topics

How to randomly split corpus data?

@SD1 wrote:

Hi,
I have another query on text mining.

I found a piece of code that would help me in randomly splitting a data frame into 70%-30%. The code ran successfully.

dt <- sort(sample(nrow(atac_raw), nrow(atac_raw) * 0.7))
atac_raw_train <- atac_raw[dt, ]
atac_raw_test <- atac_raw[-dt, ]

However, when I use the same code to split the corresponding corpus (corpus_clean), it fails. Perhaps this approach doesn't work on corpus objects?

dt_corpus <- sort(sample(nrow(corpus_clean), nrow(corpus_clean) * 0.7))

Error in sample.int(length(x), size, replace, prob) : invalid 'size' argument

Can anyone help? I couldn't find a solution on the web.

I can think of a workaround (jugaad!): rearrange the data file so that the first n records become the training set and the remaining records become the test set. But I would like to know whether the code itself can be fixed to work on the corpus.

Regards,
SD
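
One possible explanation, offered as a sketch rather than a confirmed diagnosis: if corpus_clean is a list-like corpus object (as corpora from the tm package are) and not a data frame, nrow() returns NULL for it, so the size passed to sample() ends up empty and is rejected. Using length() instead, and indexing with a single bracket and no comma, may work. In the example below a plain list stands in for corpus_clean (an assumption, since the real object is not shown), and the names n, corpus_train and corpus_test are illustrative.

```r
# Stand-in for corpus_clean: a plain list of 10 documents (assumption).
# The real object in the post is not shown; it is presumed to be list-like.
corpus_clean <- as.list(paste("document", 1:10))

# nrow() is NULL for a list, so nrow(corpus_clean) * 0.7 has length zero,
# which is what sample() rejects as an invalid 'size'.
stopifnot(is.null(nrow(corpus_clean)))

set.seed(42)  # arbitrary seed, only so the split is reproducible

# Use length() for the number of documents; floor() makes the size a whole number.
n <- length(corpus_clean)
dt_corpus <- sort(sample(n, floor(n * 0.7)))

# Index with [ ] and no comma, because the object has no rows or columns.
corpus_train <- corpus_clean[dt_corpus]
corpus_test  <- corpus_clean[-dt_corpus]
```

If the data frame and the corpus hold the same documents in the same order, reusing the original dt index for both would keep the two splits aligned; whether that holds here depends on how corpus_clean was built.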

Posts: 1

Participants: 1
