Channel: Data Science, Analytics and Big Data discussions - Latest topics

TypeError: doc2bow expects an array of unicode tokens on input, not a single string


@xxxsl wrote:

Hi, I was trying out a guide to topic modelling in Python and came across this blog post from Analytics Vidhya: https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/

I ran into the problem below and am not sure how to interpret the error message.

Traceback (most recent call last):
  File "Topic.py", line 15, in <module>
    doc_term_matrix = [dictionary.doc2bow(doc) for doc in text]
  File "C:\Python27\lib\site-packages\gensim\corpora\dictionary.py", line 233, in doc2bow
    raise TypeError("doc2bow expects an array of unicode tokens on input, not a single string")
TypeError: doc2bow expects an array of unicode tokens on input, not a single string

Here is my code.

import gensim
from gensim import corpora

file = open('document.txt')
text = file.read()

dictionary = corpora.Dictionary(text)

doc_term_matrix = [dictionary.doc2bow(doc) for doc in text]

Lda = gensim.models.ldamodel.LdaModel

ldamodel = Lda(doc_term_matrix, num_topics=3, id2word = dictionary, passes=50)
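One likely reading of the error: file.read() returns the whole file as a single string, so "for doc in text" walks over individual characters and doc2bow receives a one-character string instead of a list of tokens. Below is a minimal sketch of a tokenized version. It assumes document.txt holds one document per line and that plain whitespace splitting is good enough; the helper names tokenize_documents, build_corpus and train_lda are hypothetical and not part of gensim.

```python
def tokenize_documents(raw_text):
    """Turn raw text into a list of documents, each a list of lowercase tokens.

    Assumption: one document per non-empty line, tokens separated by whitespace.
    """
    return [line.lower().split() for line in raw_text.splitlines() if line.strip()]


def build_corpus(tokenized_docs):
    """Build a gensim Dictionary and a bag-of-words corpus from token lists."""
    # Imported here so the tokenizer above can be used without gensim installed.
    from gensim import corpora

    dictionary = corpora.Dictionary(tokenized_docs)
    # Each doc is now a list of tokens, which is what doc2bow expects.
    doc_term_matrix = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    return dictionary, doc_term_matrix


def train_lda(path, num_topics=3, passes=50):
    """Read a text file and train an LDA model with the settings from the post."""
    import gensim

    with open(path) as f:
        raw_text = f.read()

    dictionary, doc_term_matrix = build_corpus(tokenize_documents(raw_text))
    return gensim.models.ldamodel.LdaModel(
        doc_term_matrix, num_topics=num_topics, id2word=dictionary, passes=passes
    )
```

Calling train_lda('document.txt') would then stand in for the original script. A real pipeline would probably also remove stopwords and punctuation before building the dictionary, as the linked guide does.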

