Channel: Data Science, Analytics and Big Data discussions - Latest topics

Why do rare words have more weight in the Inverse Document Frequency representation?


@syed.danish wrote:

Hi,
I am going through the Term Frequency (TF) and Inverse Document Frequency (IDF) representations used with the bag-of-words technique in sklearn. There are two kinds of representation available: one gives the frequency of a word within a single document (TF), and the other reflects how common the word is across the whole collection of documents (IDF).
My question is: why do we weight rare words more heavily in the IDF representation? Aren't they supposed to be some sort of outliers?
Thanks in advance
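
For reference, the weighting being asked about can be sketched in plain Python. This is only an illustration under stated assumptions: it uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is the documented default of sklearn's TfidfTransformer (smooth_idf=True), and the three-document corpus and the helper name idf are made up for the example.

```python
import math

# Hypothetical toy corpus, made up for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the quantum cat",
]

tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: the number of documents that contain each term.
df = {}
for tokens in tokenized:
    for term in set(tokens):
        df[term] = df.get(term, 0) + 1


def idf(term):
    """Smoothed IDF, assumed here: ln((1 + n) / (1 + df)) + 1."""
    return math.log((1 + n_docs) / (1 + df[term])) + 1


# Print terms from highest to lowest IDF weight.
for term in sorted(df, key=idf, reverse=True):
    print(f"{term:8s} df={df[term]} idf={idf(term):.3f}")
```

In this toy corpus "the" occurs in every document and gets the lowest weight, while "quantum" occurs in only one and gets the highest. A term that shows up everywhere says little about which document you are looking at, which is the usual motivation for down-weighting it.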

Posts: 2

Participants: 2


