Tfidf with the sklearn library is giving me a huge file and a memory error

@drdeath91 wrote:

I am currently applying tfidf through the Python sklearn library. My dataset contains one million rows of news articles (averaging 400 words per row) with titles (averaging 100 words). When I apply the learn function to it, I get a training file of 100 GB, and that is for only 500k entries, not the full million, which I guess is pretty huge. I have seen posts where people applied tfidf to many millions of articles. Apart from that, it gave me a memory error, since all my RAM and swap space was consumed (32 GB RAM + 120 GB swap). Could anyone with tfidf experience kindly guide me: am I doing something wrong (which I suppose I am)? What are the possible issues, and how can I resolve them?
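
For scale, here is a rough back-of-the-envelope check of those numbers in Python. It is only a sketch: the float64 values, the 32-bit column indices, and the idea that every word in a row is distinct are assumptions for illustration, not something measured on the real data.

```python
# Back-of-the-envelope size check using the figures from the post.
# Assumptions (not from the post): float64 values, 32-bit column indices,
# and every word in a row being distinct (an upper bound on non-zeros).

ROWS = 500_000                 # entries that produced the 100 GB file
OUTPUT_BYTES = 100 * 10**9     # 100 GB, taken as decimal gigabytes
WORDS_PER_ROW = 400 + 100      # average article words + average title words

BYTES_PER_VALUE = 8            # assumed float64
BYTES_PER_INDEX = 4            # assumed int32 column index


def bytes_per_row(total_bytes, rows):
    """Average number of bytes the output spends on one row."""
    return total_bytes / rows


def sparse_bytes(rows, nonzeros_per_row):
    """Rough size of a compressed sparse row layout:
    one value and one column index per non-zero, plus one row pointer per row."""
    per_row = nonzeros_per_row * (BYTES_PER_VALUE + BYTES_PER_INDEX) + BYTES_PER_INDEX
    return rows * per_row


observed_per_row = bytes_per_row(OUTPUT_BYTES, ROWS)
implied_values_per_row = observed_per_row / BYTES_PER_VALUE
sparse_total = sparse_bytes(ROWS, WORDS_PER_ROW)

print(f"observed bytes per row: {observed_per_row:,.0f}")
print(f"float64 values that would fit in one row: {implied_values_per_row:,.0f}")
print(f"upper-bound sparse size: {sparse_total / 10**9:.1f} GB")
```

If these assumptions are roughly right, each row of the output takes about 200 KB, while a sparse layout of the same 500k rows should need on the order of 3 GB. That gap suggests the output may be getting stored densely or as text rather than as a sparse matrix, though this is a guess from the numbers alone.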

Posts: 1

Participants: 1
