Converting PDF bank statements to Xlsx/CSV file

I'm trying to use tabula and pandas' concat in Python to convert PDF bank statements into a CSV/Xlsx file, so I can automate the task of entering them into Excel by hand. However, after a lot of experimentation and testing, it refuses to extract more than one table (I only get their column headers).

All the tables in the PDF have the same columns (Date, Balance, etc.); I just want to stack them vertically and sort them by date.
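The stack-and-sort goal described above can be sketched in pandas as follows. This is a minimal sketch, assuming each extracted table is already a DataFrame with identical column names. The two small frames are hypothetical stand-ins for what tabula.read_pdf would return, and the column names, sample values and day-first date format are assumptions, not taken from a real statement.

```python
import pandas as pd


def stack_and_sort(tables, date_col="Date", dayfirst=True):
    """Stack same-column tables vertically and sort the rows by date.

    date_col and dayfirst are assumptions about the statement layout.
    """
    # Stack the tables top to bottom and renumber the index 0..n-1.
    combined = pd.concat(tables, axis=0, ignore_index=True)
    # Parse the date strings so sorting is chronological, not alphabetical.
    combined[date_col] = pd.to_datetime(combined[date_col], dayfirst=dayfirst)
    # A stable sort keeps same-day transactions in their original order.
    return combined.sort_values(date_col, kind="stable").reset_index(drop=True)


# Hypothetical stand-ins for two tables extracted from the PDF.
page1 = pd.DataFrame({"Date": ["03/02/2023", "01/02/2023"], "Balance": [150.0, 100.0]})
page2 = pd.DataFrame({"Date": ["02/02/2023"], "Balance": [120.0]})

result = stack_and_sort([page1, page2])
print(result)
```

With real data, the result could then be written out with result.to_csv("Converted Document.csv", index=False), or with result.to_excel(...) for Xlsx (which needs an Excel writer engine such as openpyxl installed).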

Here's how I'm trying to extract and concatenate the tables from the PDF as efficiently as possible:

import pandas as pd
import tabula

df = pd.concat(tabula.read_pdf(filename, pages='all', multiple_tables=True), axis=0)
tabula.convert_into(filename, "Converted Document.csv", output_format="csv", pages='all')

The values in the other tables of the bank statement are still being ignored, even with

multiple_tables=True
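One way to narrow the problem down is to inspect what tabula.read_pdf returned before concatenating. It returns a list of DataFrames, and a table for which only the header line was recognized shows up as a frame with zero rows. Also, pd.concat aligns frames on column names, so a table whose header was read differently (for example, its first data row taken as the header) adds new NaN-filled columns instead of stacking under the existing ones. The report_tables helper below is hypothetical (not part of tabula or pandas) and is exercised on stand-in frames rather than a real PDF.

```python
import pandas as pd


def report_tables(tables):
    """Summarize each extracted table and flag likely extraction problems.

    Hypothetical helper: returns one dict per table so problems can be
    spotted before calling pd.concat.
    """
    if not tables:
        return []
    reference = list(tables[0].columns)
    report = []
    for i, df in enumerate(tables):
        report.append({
            "table": i,
            "rows": len(df),
            # Zero rows means only the header line was recognized.
            "header_only": df.empty,
            # A mismatch here means concat will not line the columns up.
            "same_columns": list(df.columns) == reference,
        })
    return report


# Hypothetical stand-ins for three frames returned by tabula.read_pdf.
good = pd.DataFrame({"Date": ["01/02/2023"], "Balance": [100.0]})
header_only = pd.DataFrame(columns=["Date", "Balance"])
misread = pd.DataFrame({"01/02/2023": ["02/02/2023"], "100.0": [120.0]})

summary = report_tables([good, header_only, misread])
for row in summary:
    print(row)
```

Against the real file, the input would be the list from tabula.read_pdf(filename, pages='all', multiple_tables=True). If some tables report header_only, the extraction itself is the problem; if they report same_columns as False, the rows were extracted but concat is misaligning them.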

I've looked at other solutions here, but I don't understand what the issue is.

