I am trying to extract data from a PDF file, but I am not able to get the field values.
Below is the code that I am using:

    import fitz  # PyMuPDF

    def extract_data_from_pdf(pdf_path):
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)

        # Create a dictionary to store extracted data
        extracted_data = {}

        # Define the fields you want to extract
        fields_to_extract = [
            "Vehicle Registration No", "Make", "Model", "Policy No.",
            "Vehicle Make / Model", "RTO City", "CC/KW", "Mfg Yr",
            "Seating Capacity", "Basic Third Party Liability",
            "Total Tax Payable in", "Total Premium Payable In",
            "RTO Location",
        ]

        # Loop through each page of the PDF
        for page_number in range(pdf_document.page_count):
            page = pdf_document[page_number]

            # Loop through each field to extract
            for field in fields_to_extract:
                # Search for the field in the page text
                search_result = page.search_for(field + ':')  # Added ':' to match the field name exactly

                # If the field is found, extract the text
                if search_result:
                    field_text = page.get_text("text", clip=search_result[0])
                    extracted_data[field] = field_text.replace(field + ':', '').strip()

        # Close the PDF document
        pdf_document.close()

        return extracted_data

    # Specify the path to your PDF file
    pdf_path = "path/to/your/pdf/file.pdf"

    # Extract data from the PDF
    data = extract_data_from_pdf(pdf_path)

    # Print the extracted data
    for key, value in data.items():
        print(f"{key}: {value}")

It gives the following result:

    Vehicle Registration No: Vehicle Registration
    Make: Make
    Model: Model
    Policy No.: Policy No.
    Vehicle Make / Model: Vehicle Make / Model
    RTO City: RTO City
    CC/KW: CC/KW
    Mfg Yr: Mfg Yr
    Seating Capacity: Seating
    Basic Third Party Liability: Basic Third Party Liability
    Total Tax Payable in: Total Tax Payable in
    Total Premium Payable In: Total Premium Payable In
    RTO Location: RTO Location

Instead, I expect this result:

    Vehicle Registration No: PB03AW9668
    Make: HONDA MOTORCYCLE
    Model: ACTIVA 5G STD
    Policy No.: 3005/A/326653656/00/B00
    Vehicle Make / Model: HONDA MOTORCYCLE / ACTIVA 5G STD
    RTO City: RTO City
    CC/KW: 109
    Mfg Yr: 2018
    Seating Capacity: 2
    Basic Third Party Liability: 714.00
    Total Tax Payable in: 129.00
    Total Premium Payable In: 843.00
    RTO Location: PUNJAB-BATHINDA
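
A likely reason for the output above is that `page.search_for(...)` returns the rectangle around the label itself, so `get_text("text", clip=...)` with that rectangle can only return the label, never the value next to it. One possible direction is to take the word list from `page.get_text("words")` and pick the words that sit on the same line, to the right of the label. The sketch below shows only that selection step in plain Python. The helper name `value_right_of`, the `max_gap` cutoff, and the sample coordinates are all assumptions for illustration, and it assumes each value is printed to the right of its label, which may not hold for this PDF.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def value_right_of(label_rect: Sequence[float],
                   words: Sequence[Sequence],
                   max_gap: float = 250.0) -> str:
    """Join the words on the label's line that start to the right of it.

    `words` uses the tuple layout of PyMuPDF's page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    `max_gap` (in points) is a hypothetical cutoff so that a neighbouring
    column is not swallowed; tune it for the actual layout.
    """
    lx0, ly0, lx1, ly1 = label_rect
    label_mid_y = (ly0 + ly1) / 2
    hits: List[Tuple[float, str]] = []
    for word in words:
        x0, y0, x1, y1, text = word[:5]
        same_line = y0 <= label_mid_y <= y1
        # Allow 1 point of slack for rounding at the label's right edge.
        to_the_right = lx1 - 1 <= x0 <= lx1 + max_gap
        if same_line and to_the_right:
            hits.append((x0, text))
    # Sort left to right so the value reads in natural order.
    return " ".join(text for _, text in sorted(hits))


# Made-up coordinates, only to exercise the helper without a PDF.
demo_label: Rect = (50.0, 100.0, 90.0, 112.0)  # rectangle of "Make:"
demo_words = [
    (50.0, 100.0, 90.0, 112.0, "Make:", 0, 0, 0),
    (145.0, 100.0, 220.0, 112.0, "MOTORCYCLE", 0, 0, 2),
    (100.0, 100.0, 140.0, 112.0, "HONDA", 0, 0, 1),
    (50.0, 120.0, 90.0, 132.0, "Model:", 0, 1, 0),
    (100.0, 120.0, 150.0, 132.0, "ACTIVA", 0, 1, 1),
]
demo_value = value_right_of(demo_label, demo_words)
```

In the loop from the question, this would be called with `search_result[0]` as `label_rect` and `page.get_text("words")` as `words`, replacing the `clip=` call. If the values sit below their labels, or in table cells, the same-line test would need to be adapted.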