
Code to remove email addresses from corpus in R

@SD1 wrote:

I have a lot of email addresses in the corpus I am analyzing and I need to get rid of them.

Can someone suggest an R function that does this?

I think the code you provide should go before the line where I remove special characters in the code below.

library(tm)

# Build the corpus and lowercase every document
atac_corpus_bm <- Corpus(VectorSource(atac_bm$content))
corpus_clean_bm <- tm_map(atac_corpus_bm, content_transformer(tolower))
#Code for removing email address
# Define the helper before its first use; content_transformer() lets tm_map apply it to document text
removeSpecialChars <- function(x) gsub("[^a-zA-Z0-9 ]","",x)
corpus_clean_bm <- tm_map(corpus_clean_bm, content_transformer(removeSpecialChars))
corpus_clean_bm <- tm_map(corpus_clean_bm, removeWords, stopwords('english'))
corpus_clean_bm <- tm_map(corpus_clean_bm, removeNumbers)
corpus_clean_bm <- tm_map(corpus_clean_bm, removePunctuation)
# Custom stop words read from stop.txt (expects a column named "stop")
stop <- read.table("stop.txt", header = TRUE)
stop_vec <- as.vector(stop$stop)
corpus_clean_bm <- tm_map(corpus_clean_bm, removeWords, c(stop_vec, stopwords('english')))
corpus_clean_bm <- tm_map(corpus_clean_bm, stripWhitespace)

Thanks.
Regards,
SD
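
One possible way to fill in the email-removal step is sketched below. It is not a definitive answer: `removeEmails` is a hypothetical helper name, and the regular expression is a deliberately simple pattern that should catch common addresses but does not cover everything the email RFCs allow. The step has to run before `removeSpecialChars`, because that function strips the `@` and `.` characters the pattern relies on.

```r
# Hypothetical helper: replace anything that looks like an email address
# with a single space (stripWhitespace can collapse the gaps later).
# The pattern is an assumption: local part, "@", domain, ".", 2+ letter TLD.
removeEmails <- function(x) {
  gsub("[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}", " ", x)
}

# Small demonstration on a plain character vector
x <- c("contact john.doe@example.com for details",
       "no address here",
       "a@b.org and c_d@mail.example.co.uk")
cleaned <- removeEmails(x)
print(cleaned)

# With tm installed, the same helper can be plugged into tm_map
if (requireNamespace("tm", quietly = TRUE)) {
  demo_corpus <- tm::VCorpus(tm::VectorSource(x))
  demo_corpus <- tm::tm_map(demo_corpus, tm::content_transformer(removeEmails))
}
```

In the pipeline from the post, this would correspond to a line such as `corpus_clean_bm <- tm_map(corpus_clean_bm, content_transformer(removeEmails))` placed at the `#Code for removing email address` comment.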


