Channel: Data Science, Analytics and Big Data discussions - Latest topics

Lemmatizing dataframe using NLTK

@prakash6654 wrote:

I am trying to lemmatize the text in a dataframe. The lemmatizer converts plurals to singular, but I also need the root word for other forms, e.g. Blessing -> bless, ran -> run, reached -> reach.

Below is the sample program I tried.

import nltk
import pandas as pd

# Requires the WordNet data: run nltk.download('wordnet') once.
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
lemmatizer = nltk.stem.WordNetLemmatizer()

def lemmatize_text(text):
    # lemmatize() defaults to pos='n', so every token is treated as a noun
    return [lemmatizer.lemmatize(w) for w in w_tokenizer.tokenize(text)]

df = pd.DataFrame(['this was cheesy', 'she likes these books', 'wow this is great blessing'], columns=['text'])
print(df)
df['text_lemmatized'] = df.text.apply(lemmatize_text)
print(df)
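
One possible direction, offered as a sketch rather than a confirmed fix: WordNetLemmatizer.lemmatize() takes a pos argument that defaults to 'n' (noun), which is why only plurals change. Passing pos='v' first is intended to reduce verb forms such as ran or reached. The lowercasing step, the default_lemmatizer name and the lemmatizer keyword are assumptions added for illustration.

```python
import nltk

# Assumes the WordNet data is installed: run nltk.download('wordnet') once.
w_tokenizer = nltk.tokenize.WhitespaceTokenizer()
default_lemmatizer = nltk.stem.WordNetLemmatizer()


def lemmatize_text(text, lemmatizer=default_lemmatizer):
    """Lemmatize each whitespace-separated token as a verb, then as a noun."""
    lemmas = []
    for word in w_tokenizer.tokenize(text):
        # Assumption: case does not matter for the analysis.
        word = word.lower()
        # Verb pass: handles inflected verb forms.
        as_verb = lemmatizer.lemmatize(word, pos='v')
        # Noun pass: handles plurals the verb pass left untouched.
        lemmas.append(lemmatizer.lemmatize(as_verb, pos='n'))
    return lemmas
```

It plugs into the same call as before: df['text_lemmatized'] = df.text.apply(lemmatize_text). Treating every token as a verb first can over-reduce genuine nouns (building may become build, for example). Tagging each token with nltk.pos_tag and passing the matching pos would be a more careful alternative.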

Posts: 1

Participants: 1
