@SD1 wrote:
Hi,
Another query on text mining. I found a piece of code that randomly splits a data frame 70%-30%. The code ran successfully.
dt=sort(sample(nrow(atac_raw),nrow(atac_raw)*.7))
atac_raw_train <- atac_raw[dt,]
atac_raw_test <- atac_raw[-dt,]
However, when I use the same code to split the corresponding corpus data (corpus_clean), it fails. Maybe, the code doesn't work on corpus data?
dt_corpus=sort(sample(nrow(corpus_clean),nrow(corpus_clean)*.7))
Error in sample.int(length(x), size, replace, prob) : invalid 'size' argument
Can anyone help? Couldn't find any solution on the web.
I can think of a workaround (jugaad!): modify the data file so that the first n records become the training data and the remaining records the testing data. But I would like to know whether there is a way to fix the code instead.
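One possible explanation, assuming corpus_clean is a list-like corpus (for example one built with the tm package) rather than a data frame: nrow() returns NULL for objects that have no dimensions, so the size passed to sample() ends up empty, which would produce the "invalid 'size' argument" error. A minimal sketch of the same split using length() instead is below. The plain list standing in for corpus_clean and the names corpus_train and corpus_test are hypothetical, not from the original data.

```r
# Stand-in for corpus_clean: a plain list of documents (hypothetical data,
# assumed to behave like a list-like corpus for indexing purposes).
corpus_clean <- as.list(paste("document", 1:10))

# nrow() is NULL for objects without a dim attribute; this is what breaks sample().
is.null(nrow(corpus_clean))

set.seed(42)                     # optional, only to make the split reproducible
n <- length(corpus_clean)        # number of documents in the corpus
dt_corpus <- sort(sample(n, round(n * 0.7)))

# Single-bracket indexing keeps the result a list (or corpus) of documents.
corpus_train <- corpus_clean[dt_corpus]
corpus_test  <- corpus_clean[-dt_corpus]
```

If the same index vector (dt) is reused for both atac_raw and corpus_clean, the rows and documents stay aligned, assuming both are in the same order.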
Regards,
SD