@harry wrote:
I am currently studying the bag-of-words technique in R, and to build the word list I have used the tm package. After using it, though, the training data contains lots of similar words. I want to remove them.
library(jsonlite)
library(dplyr)
library(ggplot2)
library(tm)

train <- fromJSON("train.json", flatten = TRUE)
ingredients <- Corpus(VectorSource(train$ingredients))
ingredients

<<VCorpus>>
Metadata:  corpus specific: 0, document level (indexed): 0
Content:  documents: 39774
The corpus contains 39774 documents, and many of the words in them are plurals. I want to remove the plural forms.
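One common way to collapse plurals onto a single term with tm is stemming. The following is only a minimal sketch: it assumes the SnowballC package is installed (tm's stemDocument relies on it), and it uses a made-up two-document vector, docs, in place of train$ingredients.

```r
library(tm)        # corpus handling and stemDocument
library(SnowballC) # assumption: stemmer backend that tm::stemDocument calls

# Hypothetical toy data standing in for train$ingredients
docs <- c("tomatoes onions eggs", "tomato onion egg")
corpus <- VCorpus(VectorSource(docs))

# Lower-case first so that "Eggs" and "eggs" are treated alike
corpus <- tm_map(corpus, content_transformer(tolower))

# Reduce each word to its stem, which folds most regular plurals
# into the singular form
corpus <- tm_map(corpus, stemDocument)

# Build the bag-of-words matrix from the stemmed corpus
dtm <- DocumentTermMatrix(corpus)
Terms(dtm)
```

Stemming is rule-based, so it can also shorten words that are not plurals and it will miss irregular plurals. If that matters for the ingredient names, a lemmatizer or a hand-written replacement list may be a better fit.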