@syed.danish wrote:
Hi,
I am going through the Term Frequency (TF) and Inverse Document Frequency (IDF) representations used with the bag-of-words technique in sklearn. Two kinds of representation are available: TF tells you how often a word occurs within a single document, and IDF reflects how many documents in the whole collection contain that word.
My question is: why do we weight rare words more heavily in the IDF representation? Aren't they supposed to be some sort of outliers?
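To make the question concrete, here is a minimal sketch in plain Python of the behaviour I mean. The three-sentence corpus is made up, and I am assuming the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is what the sklearn docs describe for the default `smooth_idf=True` setting; sklearn's own tokenization and normalization may differ from this simple `split()`.

```python
import math

# Toy corpus, made up purely for illustration
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]

# Naive whitespace tokenization (an assumption; sklearn uses its own tokenizer)
tokenized = [doc.split() for doc in docs]
n_docs = len(tokenized)

# Document frequency: number of documents that contain each word
df = {}
for tokens in tokenized:
    for word in set(tokens):
        df[word] = df.get(word, 0) + 1

# Smoothed IDF: ln((1 + n) / (1 + df)) + 1
idf = {word: math.log((1 + n_docs) / (1 + count)) + 1 for word, count in df.items()}

# Print words from most common (lowest IDF) to rarest (highest IDF)
for word in sorted(idf, key=idf.get):
    print(f"{word:>7}  df={df[word]}  idf={idf[word]:.3f}")
```

Here "the" appears in every document and gets the lowest weight, while "mat", "log" and "chased" each appear in only one document and get the highest.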
Thanks in advance