@SD1 wrote:
I have a lot of email addresses in the corpus I am analyzing and I need to get rid of them.
Can someone suggest an R function that would do this?
I think the code you provide should go before the line where I remove special characters in the code below.
library(tm)

atac_corpus_bm <- Corpus(VectorSource(atac_bm$content))
corpus_clean_bm <- tm_map(atac_corpus_bm, content_transformer(tolower))

# Code for removing email address

# Define the helper before it is used; it keeps only letters, digits and spaces
removeSpecialChars <- function(x) gsub("[^a-zA-Z0-9 ]", "", x)
# Wrap the custom function in content_transformer() so tm_map() keeps returning a corpus
corpus_clean_bm <- tm_map(corpus_clean_bm, content_transformer(removeSpecialChars))
corpus_clean_bm <- tm_map(corpus_clean_bm, removeWords, c(stopwords('english')))
corpus_clean_bm <- tm_map(corpus_clean_bm, removeNumbers)
corpus_clean_bm <- tm_map(corpus_clean_bm, removePunctuation)

# Custom stop words read from a file with a column named "stop"
stop = read.table("stop.txt", header = TRUE)
stop_vec = as.vector(stop$stop)
corpus_clean_bm <- tm_map(corpus_clean_bm, removeWords, c(stop_vec, stopwords('english')))
corpus_clean_bm <- tm_map(corpus_clean_bm, stripWhitespace)
Thanks.
Regards,
SD
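
One possible approach, offered as a minimal sketch rather than a tested solution for this corpus: strip the addresses with a regular expression before the special-character step, since that step deletes the "@" and "." that make an address recognizable. The names `email_pattern` and `removeEmails` are hypothetical, and the pattern is a simplified assumption about what an address looks like; it will not cover every valid address.

```r
# Hypothetical pattern: a simplified match for typical addresses,
# not a full implementation of the email address grammar.
email_pattern <- "[[:alnum:]._%+-]+@[[:alnum:].-]+\\.[[:alpha:]]{2,}"

# Hypothetical helper: replace each address with a space so that the
# words on either side of it do not get glued together.
removeEmails <- function(x) gsub(email_pattern, " ", x)

# Quick check on plain character vectors
sample_text <- c("contact john.doe@example.com for details", "no address here")
removeEmails(sample_text)

# In the tm pipeline (assumes library(tm) is loaded and corpus_clean_bm exists),
# this would go right after the tolower step:
# corpus_clean_bm <- tm_map(corpus_clean_bm, content_transformer(removeEmails))
```

The extra spaces left behind should be collapsed by the existing `stripWhitespace` step at the end of the pipeline.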