Channel: Active questions tagged python - Stack Overflow

Error when trying to extract text from a Word document using Python


I'm trying to write a Python function that extracts text from .docx files, using the python-docx library. The function works as expected, at least when I create a .docx file in Python and then run my function on that file: it returns the text to me.

However, for .docx files (Word documents) that I have modified or created myself, it reports that it cannot find the file and raises PackageNotFoundError. I found a suggestion online to check whether my file is a zip file. I did this with zipfile, and in fact my saved Word documents are not recognized as zip files. What's going on? Here is my Python code for verification:

from zipfile import is_zipfile
import docx

# test_path is the path of the .docx file to create (not shown in this post)
doc = docx.Document()
doc.add_paragraph("Hello")
doc.save(test_path)

print(is_zipfile(test_path))  # output: True

If I then open the file at test_path, type a number, and save it:

print(is_zipfile(test_path))  # output: False

Are modern .docx documents no longer zip files? Or what is wrong here?
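To narrow this down further, a minimal sketch like the one below reads the first bytes of the file to see which container format it really is. The helper names classify_header and sniff_container are made up for illustration; they are not part of python-docx or zipfile. ZIP archives normally start with the bytes PK\x03\x04, while the older OLE2 compound-file container (used by legacy .doc files and, as far as I know, by password-protected Office files) starts with D0 CF 11 E0 A1 B1 1A E1. Whether either case applies to my file is only an assumption.

```python
# Signatures of two container formats a Word file might use.
ZIP_MAGIC = b"PK\x03\x04"                          # regular ZIP local file header
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file


def classify_header(head: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_container(path: str) -> str:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(8))
```

Calling sniff_container(test_path) before and after editing the document would show whether the saved file still starts with the ZIP signature or has been written in a different container.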

Everything I find when googling says that Word documents (.docx files) are zip files. I think that is why the library raises the error and cannot open the file. I appreciate everyone trying to help. Thanks!

