Channel: Active questions tagged python - Stack Overflow

How to extract matching information from a PDF into the appropriate locations in Excel using Python?


I have to extract tons of PDF files/pages to Excel for work. I decided it could be done much sooner by automating all or most of the process in Python. Even though I know Python, the most I could do was extract the information and place it at the top of the Excel file (the code is pretty bad, so I am not going to include it).

Here are the files: https://drive.google.com/file/d/15Tdph6N4V_W3FIRgDj1cF0N0VEo4XFEp/view

[Image of pdf]

[Image of xlsx]

Essentially, what I want is to get the values of all tables in the PDF file and page I specify, then check them against the .xlsx file and sheet I specify, so that each PDF value is entered in the cell whose row and column labels match. I also want to ignore any empty values (in my case, empty PDF values were read as NaN, and I don't want those to be placed).
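To make the matching step concrete, here is a minimal sketch in plain Python. It is not the asker's code and it assumes a layout the question does not confirm: both the extracted PDF table and the Excel sheet are held as lists of rows, with row 0 containing the column headers and column 0 containing the row labels. The names `fill_matching_cells`, `is_blank`, and `norm`, and the sample labels, are hypothetical.

```python
import math


def is_blank(value):
    """Return True for values that should not be copied: None, '', or NaN."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def norm(label):
    """Normalize a label so that 'Q1 ' and 'q1' compare equal."""
    return str(label).strip().lower()


def fill_matching_cells(sheet, table):
    """Copy values from `table` into `sheet` where row and column labels match.

    Assumption: in both grids, row 0 holds column headers and column 0 holds
    row labels. `sheet` is modified in place; the number of cells written
    is returned.
    """
    # Map each normalized header / row label of the sheet to its index.
    col_index = {norm(h): j for j, h in enumerate(sheet[0])
                 if j > 0 and not is_blank(h)}
    row_index = {norm(r[0]): i for i, r in enumerate(sheet)
                 if i > 0 and r and not is_blank(r[0])}

    written = 0
    for row in table[1:]:
        if not row or is_blank(row[0]):
            continue
        i = row_index.get(norm(row[0]))
        if i is None:
            continue  # row label not present in the sheet
        for header, value in zip(table[0][1:], row[1:]):
            if is_blank(header) or is_blank(value):
                continue  # skip empty / NaN values from the PDF
            j = col_index.get(norm(header))
            if j is None:
                continue  # column header not present in the sheet
            while len(sheet[i]) <= j:
                sheet[i].append(None)  # pad short rows before writing
            sheet[i][j] = value
            written += 1
    return written


# Hypothetical sample data: the PDF table has its columns in a different
# order, one NaN value, and one row ("Plums") that the sheet does not have.
sheet = [
    ["Item", "Q1", "Q2"],
    ["Apples", None, 5],
    ["Pears", None, None],
]
table = [
    ["Item", "Q2", "Q1"],
    ["Pears", float("nan"), 7],
    ["Apples", 3, 10],
    ["Plums", 1, 1],
]
written = fill_matching_cells(sheet, table)
```

In a real script, the two grids would have to be loaded first. One possibility is pdfplumber's `page.extract_tables()` for the PDF side and openpyxl's `ws.iter_rows(values_only=True)` for the Excel side, with the updated values written back cell by cell; whether the tables in the actual files follow the header layout assumed here would need checking.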

I am including an example PDF with one page and an Excel file with one sheet. If you can show me how to do what I want for those files, I can learn from that and make any adaptations for future sheets.

EXPECTED RESULT: The images below show the PDF, the Excel sheet before I ran my code, and the Excel sheet after I ran my code. This comes from code I managed to write for a very simple page; the code is bad, so I am not including it, to avoid affecting how other people answer. Plus, my code was not generalizable.

[Image: pdf]

[Image: excel sheet before running the code]

[Image: excel sheet after running the code]

