Channel: Data Science, Analytics and Big Data discussions - Latest topics

How to remove plural words from the training data for forming bag of words?


@harry wrote:

I am currently studying the bag-of-words technique in R, and I have used the tm package to build the word list. After using it, though, the training data contains lots of similar words (for example, singular and plural forms of the same word). I want to remove them.

library(jsonlite)
library(dplyr)
library(ggplot2)
library(tm) 

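# Read the training data and build a corpus from the ingredients column
train <- fromJSON("train.json", flatten = TRUE)
ingredients <- Corpus(VectorSource(train$ingredients))

# Printing the corpus gives the summary shown in the comments below
ingredients
# <<VCorpus>>
# Metadata:  corpus specific: 0, document level (indexed): 0
# Content:  documents: 39774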

The corpus contains 39774 documents, and many of the words in them are plurals. I want to remove them.
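
One common way to collapse plurals in tm is stemming, so here is a minimal sketch of that idea. It is not necessarily what the replies in the thread suggested. It assumes the SnowballC package is installed (tm's stemDocument relies on it), and the two toy documents are a hypothetical stand-in for train$ingredients, not data from train.json. Note that stemming strips other suffixes as well, not only plural endings.

```r
library(tm)         # corpus handling and document-term matrices
library(SnowballC)  # stemmer backend used by tm::stemDocument (assumed installed)

# Hypothetical toy documents standing in for train$ingredients
docs <- c("eggs tomatoes onions", "egg tomato onion")

corpus <- VCorpus(VectorSource(docs))

# Lower-case first so that "Eggs" and "eggs" are treated alike
corpus <- tm_map(corpus, content_transformer(tolower))

# Stem every word, which maps plural and singular forms to a common stem
corpus <- tm_map(corpus, stemDocument)

# Build the bag-of-words matrix from the stemmed corpus
dtm <- DocumentTermMatrix(corpus)

# Inspect the vocabulary: the plural forms should no longer appear
print(Terms(dtm))
```

To try this on the real data, the same two tm_map calls could be applied to the ingredients corpus before building the document-term matrix. If stemming turns out to be too aggressive, lemmatization is another option, but that needs an additional package.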
