I have to extract tons of pdf files/pages to Excel for work. I decided I could finish much sooner by automating all or most of the process in Python. Even though I know Python, the most I could do was extract the information and place it at the top of the Excel file (the code is pretty bad, so I am not going to include it).
Here are the files: https://drive.google.com/file/d/15Tdph6N4V_W3FIRgDj1cF0N0VEo4XFEp/view
[Image of pdf] [Image of xlsx]
Essentially, what I want is to read the values of all tables in a pdf file and sheet (page) that I specify, then compare them against an .xlsx file and sheet that I specify, and write each pdf value into the cell whose row and column labels match. I also want to ignore any empty values (in my case empty pdf values were read as NaN, and I don't want those to be written).
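To make the matching step concrete, here is a minimal sketch of what I mean, using plain Python lists only. It assumes both the pdf table and the Excel sheet have already been read into lists of rows, with the column headers in the first row and the row labels in the first column. The names `is_blank` and `fill_matching_cells` and the sample data are made up for illustration; reading the pdf and writing the .xlsx are not shown.

```python
import math


def is_blank(value):
    """Return True for values that should not be written (None, NaN, empty text)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and value.strip() == ""


def fill_matching_cells(sheet, pdf_table):
    """Copy pdf_table values into sheet where row and column labels match.

    Both arguments are lists of rows: row 0 holds the column headers and
    column 0 holds the row labels (an assumption about the layout).
    The sheet is modified in place; the number of cells written is returned.
    """
    # Map every label in the target sheet to its position.
    col_index = {label: j for j, label in enumerate(sheet[0])
                 if j > 0 and not is_blank(label)}
    row_index = {row[0]: i for i, row in enumerate(sheet)
                 if i > 0 and row and not is_blank(row[0])}

    written = 0
    pdf_headers = pdf_table[0]
    for pdf_row in pdf_table[1:]:
        if not pdf_row:
            continue
        i = row_index.get(pdf_row[0])
        if i is None:
            continue  # this row label does not exist in the sheet
        for header, value in zip(pdf_headers[1:], pdf_row[1:]):
            j = col_index.get(header)
            if j is None or is_blank(value):
                continue  # unknown column, or empty/NaN pdf value
            sheet[i][j] = value
            written += 1
    return written


# Hypothetical sample data standing in for the real pdf table and sheet.
sheet = [
    ["", "Q1", "Q2"],
    ["Apples", None, 5],
    ["Pears", None, None],
]
pdf_table = [
    ["Item", "Q1", "Q2"],
    ["Apples", 10, float("nan")],  # NaN must not overwrite the existing 5
    ["Pears", 3, 4],
    ["Plums", 7, 8],               # no matching row in the sheet, so skipped
]
count = fill_matching_cells(sheet, pdf_table)
```

In this sample, the NaN for Apples/Q2 is skipped so the existing 5 is kept, and the Plums row is ignored because the sheet has no row with that label.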
I am including an example pdf with 1 sheet and an Excel file with 1 sheet. If you can show me how to do what I want for those files, I can learn from that and adapt it for future sheets.
EXPECTED RESULT: [Images: the pdf / the Excel sheet before I ran my code / the Excel sheet after I ran my code] These come from the code I managed to write for a very simple page. The code is bad, so I am leaving it out so that it does not influence how other people answer. It was also not generalizable.