Channel: Active questions tagged python - Stack Overflow

AWS Lambda subprocess module not found error

Good afternoon,

I have been setting up some code for extracting text with the fitz library (PyMuPDF). The module has been correctly installed via a Lambda layer and is working as expected, but when I try to use the official fitz utils script I get:

ModuleNotFoundError: No module named 'fitz'

Example code:

    def extract_text(pdf_stream):
        try:
            pdf_doc = fitz.open(stream=pdf_stream, filetype='pdf')
            # Save the PDF document to a file
            pdf_doc.save('/tmp/file.pdf')  # /tmp is a file destination required to save file, everything else is read only in lambda
            logger.info("PDF file saved. Running fitzcli.py.")
            cmd_args = ["python", "fitzcli.py", "gettext", "-input", "file.pdf", "-output", "tmp/extracted_text.txt", "-mode", "layout"]
            subprocess.run(cmd_args, check=True)
            with open('extracted_text.txt', 'r') as open_file:
                read_file = open_file.read()
            # Assuming extract_top_rows function is defined elsewhere in your code
            headers_text = extract_top_rows(read_file)
            return headers_text
        except Exception as e:
            logger.error(f"An error occurred: {e}")
            raise

Link to the script: https://github.com/pymupdf/PyMuPDF-Utilities/blob/master/text-extraction/fitzcli.py

I cannot alter the script's code because of licensing constraints.

I have tried copying the Lambda execution environment and running the subprocess with that env:

    env = os.environ.copy()
    cmd_args = ["python", "fitzcli.py", "gettext", "-input", "file.pdf", "-output", "tmp/extracted_text.txt", "-mode", "layout"]
    subprocess.run(cmd_args, check=True, env=env)

I was expecting the subprocess to run, but it fails with the same ModuleNotFoundError.
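For reference, here is a minimal sketch of what could be checked next. It rests on two assumptions that are not confirmed for this setup: that "python" on PATH may be a different interpreter from the one running the handler, and that the layer's directories appear in the parent's sys.path but are not passed to the child. The helper names build_child_env and run_python are made up for illustration; only os, sys, and subprocess from the standard library are used.

```python
import os
import subprocess
import sys


def build_child_env(extra_paths=()):
    """Copy the parent environment and expose the parent's import paths to the child."""
    env = os.environ.copy()
    # sys.path lists every directory the current interpreter imports from.
    # PYTHONPATH is the standard way to hand those directories to a child interpreter.
    paths = [p for p in list(extra_paths) + sys.path if p]
    env["PYTHONPATH"] = os.pathsep.join(paths)
    return env


def run_python(args, **kwargs):
    """Run a Python child process with the same interpreter as the parent."""
    # sys.executable is the interpreter running this code, which may differ
    # from whatever "python" resolves to on PATH.
    return subprocess.run(
        [sys.executable, *args],
        check=True,
        capture_output=True,
        text=True,
        env=build_child_env(),
        **kwargs,
    )


# Diagnostic: ask the child which interpreter it is and where it imports from.
result = run_python(["-c", "import sys; print(sys.executable); print(sys.path)"])
print(result.stdout)
```

In the code above, the args list would be the script name followed by its arguments instead of the diagnostic one-liner. Whether this resolves the ModuleNotFoundError here is untested; comparing the child's printed sys.path with the parent's should at least show whether the layer directory is visible to the subprocess.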

