Channel: Active questions tagged python - Stack Overflow

Python Convert PDF to XLSX


Can anyone help me convert a PDF file that contains tables to XLSX?

PDF original


from path_name_mappings import destination_filed_xlsx
from path_name_mappings import destination_path_pdf
import tabula
import pandas as pd
import os

def pdf_to_excel(pdf_file_path, excel_file_path):
    tables = tabula.read_pdf(pdf_file_path, pages='all', multiple_tables=True)
    with pd.ExcelWriter(excel_file_path, engine='xlsxwriter') as writer:
        for i, table in enumerate(tables):
            table = table.rename(columns={'Unnamed: 0': 'Godzina'})
            sheet_name = f'Sheet{i+1}'
            table.to_excel(writer, sheet_name=sheet_name, index=False)

def process_all_pdfs(pdf_folder, excel_folder):
    for pdf_file in os.listdir(pdf_folder):
        if pdf_file.endswith(".pdf"):
            pdf_file_path = os.path.join(pdf_folder, pdf_file)
            pdf_base_name = os.path.splitext(pdf_file)[0]
            excel_file_path = os.path.join(excel_folder, pdf_base_name + ".xlsx")
            pdf_to_excel(pdf_file_path, excel_file_path)

# Example usage
process_all_pdfs(destination_path_pdf, destination_filed_xlsx)

Can anyone help me fix the code so that it exports the tables from the PDF files to XLSX while preserving the layout of the PDF? I need this to continue working on the project.
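For reference, here is a minimal sketch of how the batch conversion in the question might be made a little more robust. It is not a confirmed fix: it assumes the tables in the PDF are drawn with ruling lines, so it passes `lattice=True` to `tabula.read_pdf` (for tables separated only by whitespace, `stream=True` may work better). The helper names `excel_path_for` and `find_pdfs` are hypothetical and not part of the original code. Note that tabula extracts cell data only, so fonts, colours and merged cells from the PDF will most likely not carry over to the workbook.

```python
from pathlib import Path


def excel_path_for(pdf_path, excel_folder):
    """Map e.g. 'in/report.PDF' to '<excel_folder>/report.xlsx' (hypothetical helper)."""
    return Path(excel_folder) / (Path(pdf_path).stem + ".xlsx")


def find_pdfs(pdf_folder):
    """Return the PDF files in pdf_folder, matching the extension case-insensitively."""
    return sorted(
        p for p in Path(pdf_folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def pdf_to_excel(pdf_path, excel_path, lattice=True):
    """Write every table tabula finds in pdf_path to its own sheet; return False if none."""
    # Imported here so the path helpers above can be used without tabula/Java installed.
    import pandas as pd
    import tabula

    # Assumption: lattice mode suits ruled tables; stream mode is used otherwise.
    tables = tabula.read_pdf(
        str(pdf_path),
        pages="all",
        multiple_tables=True,
        lattice=lattice,
        stream=not lattice,
    )
    if not tables:
        return False  # nothing detected, so no workbook is written

    with pd.ExcelWriter(excel_path, engine="xlsxwriter") as writer:
        for i, table in enumerate(tables, start=1):
            # rename() ignores the mapping when the column is absent
            table = table.rename(columns={"Unnamed: 0": "Godzina"})
            table.to_excel(writer, sheet_name=f"Sheet{i}", index=False)
    return True


def process_all_pdfs(pdf_folder, excel_folder):
    """Convert every PDF in pdf_folder, creating excel_folder if it does not exist."""
    Path(excel_folder).mkdir(parents=True, exist_ok=True)
    for pdf_path in find_pdfs(pdf_folder):
        pdf_to_excel(pdf_path, excel_path_for(pdf_path, excel_folder))
```

Calling `process_all_pdfs(destination_path_pdf, destination_filed_xlsx)` would then work as in the original example. If the extracted columns are still misaligned, switching to `pdf_to_excel(pdf_path, excel_path, lattice=False)` for that file is worth trying.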

