I'm trying to use tabula and Pandas concat
in Python to PDF bank statements into a CSV/Xlsx file, so I can automate the task of manually entering them into Excel, however after intense experimentation and testing, it refuses extract more than 1 table (only their column headers).
All the tables in the PDF have the same columns (Date, Balance, etc.), where I just want to stack them vertically and sort them by date.
Here's how I'm trying to extract and concatenating the tables from the PDF as efficiently as possible:
df = pd.concat(tabula.read_pdf(filename, pages='all', multiple_tables=True), axis=0)tabula.convert_into(filename, "Converted Document.csv", output_format="csv", pages='all')
The values for the other tables in the bank statement are being ignored even with
multiple_tables=True
I've looked at other solutions here, but I don't understand what the issue is.