Channel: Active questions tagged python - Stack Overflow

Need to Extract Particular Data from PDF file


I am trying to extract specific fields from a PDF file, but I am not able to get their values.

Below is the code that I am using:

    import fitz  # PyMuPDF

    def extract_data_from_pdf(pdf_path):
        # Open the PDF file
        pdf_document = fitz.open(pdf_path)
        # Create a dictionary to store extracted data
        extracted_data = {}
        # Define the fields you want to extract
        fields_to_extract = [
            "Vehicle Registration No",
            "Make",
            "Model",
            "Policy No.",
            "Vehicle Make / Model",
            "RTO City",
            "CC/KW",
            "Mfg Yr",
            "Seating Capacity",
            "Basic Third Party Liability",
            "Total Tax Payable in",
            "Total Premium Payable In",
            "RTO Location",
        ]
        # Loop through each page of the PDF
        for page_number in range(pdf_document.page_count):
            page = pdf_document[page_number]
            # Loop through each field to extract
            for field in fields_to_extract:
                # Search for the field in the page text
                search_result = page.search_for(field + ':')  # Added ':' to match the field name exactly
                # If the field is found, extract the text
                if search_result:
                    field_text = page.get_text("text", clip=search_result[0])
                    extracted_data[field] = field_text.replace(field + ':', '').strip()
        # Close the PDF document
        pdf_document.close()
        return extracted_data

    # Specify the path to your PDF file
    pdf_path = "path/to/your/pdf/file.pdf"

    # Extract data from the PDF
    data = extract_data_from_pdf(pdf_path)

    # Print the extracted data
    for key, value in data.items():
        print(f"{key}: {value}")

It gives the following result:

    Vehicle Registration No: Vehicle Registration
    Make: Make
    Model: Model
    Policy No.: Policy No.
    Vehicle Make / Model: Vehicle Make / Model
    RTO City: RTO City
    CC/KW: CC/KW
    Mfg Yr: Mfg Yr
    Seating Capacity: Seating
    Basic Third Party Liability: Basic Third Party Liability
    Total Tax Payable in: Total Tax Payable in
    Total Premium Payable In: Total Premium Payable In
    RTO Location: RTO Location

Instead of the expected result:

    Vehicle Registration No: PB03AW9668
    Make: HONDA MOTORCYCLE
    Model: ACTIVA 5G STD
    Policy No.: 3005/A/326653656/00/B00
    Vehicle Make / Model: HONDA MOTORCYCLE / ACTIVA 5G STD
    RTO City: RTO City
    CC/KW: 109
    Mfg Yr: 2018
    Seating Capacity: 2
    Basic Third Party Liability: 714.00
    Total Tax Payable in: 129.00
    Total Premium Payable In: 843.00
    RTO Location: PUNJAB-BATHINDA
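
One likely reason the output only echoes the labels: `clip=search_result[0]` restricts `get_text` to the rectangle of the matched label itself, so the value printed next to it is never inside the clip area. A minimal sketch of one possible workaround is shown below. It assumes each value sits on the same line as its label and to its right, which may not hold for every field in this PDF. The helper `text_right_of`, its `y_tolerance` and `max_width` parameters, and the sample coordinates are all hypothetical; the word tuples follow the `(x0, y0, x1, y1, text, block_no, line_no, word_no)` layout that `page.get_text("words")` returns.

```python
from typing import Optional, Sequence


def text_right_of(
    label_rect: Sequence[float],
    words: Sequence[Sequence],
    y_tolerance: float = 2.0,
    max_width: Optional[float] = None,
) -> str:
    """Join the words that sit on the label's line, to the right of the label.

    label_rect: (x0, y0, x1, y1) of the label, e.g. one hit from page.search_for().
    words: tuples shaped like (x0, y0, x1, y1, text, ...).
    max_width: optional cap on how far right of the label to look, useful
               when another label/value pair shares the same line.
    """
    _, ly0, lx1, ly1 = label_rect
    label_mid = (ly0 + ly1) / 2

    selected = []
    for w in words:
        x0, y0, _, y1, text = w[0], w[1], w[2], w[3], w[4]
        starts_after_label = x0 >= lx1 - 0.5
        on_same_line = abs((y0 + y1) / 2 - label_mid) <= y_tolerance
        within_width = max_width is None or x0 <= lx1 + max_width
        if starts_after_label and on_same_line and within_width:
            selected.append((x0, text))

    # Sort by x so the words come out in reading order.
    selected.sort(key=lambda item: item[0])
    return " ".join(text for _, text in selected)


# Made-up coordinates, for illustration only (not taken from the real PDF).
label_rect = (50, 100, 80, 112)  # rectangle of the label "Make:"
words = [
    (50, 100, 80, 112, "Make:", 0, 0, 0),
    (90, 100, 130, 112, "HONDA", 0, 0, 1),
    (134, 100, 200, 112, "MOTORCYCLE", 0, 0, 2),
    (50, 120, 90, 132, "Model:", 0, 1, 0),
    (95, 120, 140, 132, "ACTIVA", 0, 1, 1),
]

print(text_right_of(label_rect, words))  # prints: HONDA MOTORCYCLE
```

With PyMuPDF this would presumably be called as `text_right_of(page.search_for(field + ':')[0], page.get_text("words"))`, since a `Rect` unpacks into its four coordinates. If several label/value pairs share one line, `max_width` would need tuning so the next label is not picked up as part of the value.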

