Extracting table data from PDFs with a varying number of columns

import os
import re
from tika import parser

folder = ".../test/input"
pattern = r'^(\d+)\s+(\S+)\s+([0-9.]+ x [0-9]+ x [A-Z]+)\s+([0-9.]+)\s+([0-9.]+)'

data = []  # List to store extracted data

for filename in os.listdir(folder):
    file_path = os.path.join(folder, filename)
    parsed_pdf = parser.from_file(file_path)
    text = parsed_pdf.get('content')  # may be None when Tika extracts no text
    if text:
        matches = re.findall(pattern, text, re.MULTILINE)
        data.extend(matches)  # Append the extracted row data to the list
    else:
        print(f"Text extraction failed for file: {file_path}")

# Print the extracted data
for row in data:
    print(row)

I want to extract some data from a table in my PDF files, but some tables have an extra column of data, 'Quantity'. How do I handle both cases?

First type of data (no Quantity column):

('57', '231228B23', '0.21 x 914 x C', '2.640', '2.680')
('58', '231228B24', '0.21 x 914 x C', '2.682', '2.722')
('59', '231228B25', '0.21 x 914 x C', '2.710', '2.750')
('60', '231228B26', '0.21 x 914 x C', '2.714', '2.754')
('61', '231228B27', '0.21 x 914 x C', '2.636', '2.676')
('62', '231228B28', '0.21 x 914 x C', '2.628', '2.668')
('63', '231228B29', '0.21 x 914 x C', '2.628', '2.668')
('64', '231228A37', '0.21 x 914 x C', '2.684', '2.724')
('65', '231228A38', '0.21 x 914 x C', '2.718', '2.758')
('66', '231228A39', '0.21 x 914 x C', '2.646', '2.686')
('67', '231228A40', '0.21 x 914 x C', '2.652', '2.692')

Second type of data (with the extra Quantity column as the fourth value):

('7', '231228B25', '0.21 x 914 x C', '1', '2.710', '2.750')
('8', '231228B26', '0.21 x 914 x C', '1', '2.714', '2.754')
('9', '231228B27', '0.21 x 914 x C', '1', '2.636', '2.676')
('10', '231228B28', '0.21 x 914 x C', '1', '2.628', '2.668')
('11', '231228B29', '0.21 x 914 x C', '1', '2.628', '2.668')
('12', '231228A37', '0.21 x 914 x C', '1', '2.684', '2.724')

I do not need the Quantity column.
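One possible approach is to make the Quantity part of the pattern optional with a non-capturing group, `(?:\s+\d+)?`, placed between the dimensions column and the two decimal columns. This is a minimal sketch that assumes Tika's text keeps each table row on one line with whitespace between columns (as the tuples above suggest) and that Quantity is always a whole number. `PATTERN`, `extract_rows`, and the `sample` text are hypothetical names and data built from the rows shown above, not from the real PDFs.

```python
import re

# Same columns as the original pattern, plus an OPTIONAL Quantity column.
# (?:\s+\d+)? is non-capturing, so findall() returns 5-tuples either way
# and the quantity value is simply discarded.
PATTERN = re.compile(
    r'^(\d+)\s+(\S+)\s+([0-9.]+ x [0-9]+ x [A-Z]+)'
    r'(?:\s+\d+)?'              # optional integer Quantity, not captured
    r'\s+([0-9.]+)\s+([0-9.]+)',
    re.MULTILINE,
)


def extract_rows(text):
    """Return (row_no, id, dimensions, value1, value2) for every table row."""
    return PATTERN.findall(text)


# Hypothetical extracted text: one row from each table layout.
sample = (
    "57  231228B23  0.21 x 914 x C  2.640  2.680\n"
    "7  231228B25  0.21 x 914 x C  1  2.710  2.750\n"
)

for row in extract_rows(sample):
    print(row)
```

For a five-column row the optional group cannot match (the digit it would take, such as the `2` in `2.640`, is followed by `.`, not whitespace), so the regex falls back to skipping it; for a six-column row it consumes the `1`. In the original loop, swapping in this pattern should be enough. If you would rather keep Quantity captured (as `(?:\s+(\d+))?`), `row[:3] + row[-2:]` drops it afterwards.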
