Channel: Data Science, Analytics and Big Data discussions - Latest topics

How to read a bunch of .docx files in Python?

@vikas_10 wrote:

Hello everyone,
I am working on parsing a bunch of .docx files in Python. I have saved all of them in a folder named .docx_files. I want to build a list, docx_list, in which each element is one of the .docx files saved in that folder. To do this, I am using the python-docx library. I first created a list containing the paths of all the .docx files using os.walk(). The list looks like this:
['C:/Users/HP/Desktop/.docx files\AbhijeetPawar3_7.docx',
'C:/Users/HP/Desktop/.docx files\AbhijeetPolkamwar[0_0].docx',
'C:/Users/HP/Desktop/.docx files\AbhijeetRikame[6_0].docx']
I then passed each path in this list to Document() to get the required list. The code is as follows:

    for path in document_list:
        resume_list = Document(path)

Now, when I print the list using the following code:

    for para in resume_list.paragraphs:
        print(para.text)

I see only the document corresponding to the last path in my list. Can anyone help me find out what is going wrong, and suggest how to fix it?
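
A likely explanation, offered as a sketch rather than a definitive answer: each pass through the loop rebinds resume_list to a single Document object, so once the loop finishes only the last file is left. Appending each Document to a list should keep all of them. The example below assumes python-docx is installed. The names find_docx_paths, load_documents and document_text, and the loader parameter, are hypothetical helpers introduced here for illustration and are not part of python-docx.

```python
from pathlib import Path


def find_docx_paths(folder):
    """Return the sorted paths of every .docx file under `folder`, recursively."""
    return sorted(str(p) for p in Path(folder).rglob("*.docx"))


def load_documents(paths, loader=None):
    """Open every path and return the resulting objects as a list.

    `loader` is a hypothetical hook (handy for testing); when omitted,
    python-docx's Document is used.
    """
    if loader is None:
        from docx import Document  # python-docx, imported only when needed
        loader = Document

    docx_list = []
    for path in paths:
        # append() keeps every document; writing `docx_list = loader(path)`
        # here would overwrite the variable and keep only the last one.
        docx_list.append(loader(path))
    return docx_list


def document_text(document):
    """Join the text of all paragraphs of one document with newlines."""
    return "\n".join(para.text for para in document.paragraphs)
```

With these helpers, something like docx_list = load_documents(find_docx_paths("C:/Users/HP/Desktop/.docx files")) followed by a loop that prints document_text(doc) for each doc in docx_list should show the text of every file instead of only the last one.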
