Channel: Active questions tagged python - Stack Overflow

Parsing COBOL code with regex and ElementTree gives incorrect output

I am parsing COBOL code; an example is below. I first tried Lark, but it failed with a grammar error, so I am now using regex with ElementTree, which gives incorrect output. I want to make the parser as generic as possible. I am going to use the parsed XML output as input to the StarCoder base model to generate Java code with the same logic:

IDENTIFICATION DIVISION.
PROGRAM-ID. PAYROLL-PROCESSING.
DATA DIVISION.
  WORKING-STORAGE SECTION.
  01 EMPLOYEE-RECORD.
    02 EMPLOYEE-ID PIC 9(5).
    02 EMPLOYEE-NAME PIC X(30).
    02 HOURS-WORKED PIC 9(3).
    02 HOURLY-RATE  PIC 9(5)V99.
    02 GROSS-SALARY PIC 9(7)V99.
    02 TAX-RATE PIC 9(3).
    02 NET-SALARY PIC 9(7)V99.
    02 BASIC-SALARY PIC 9(7)V99.
    02 HOLIDAYS PIC 9(5).
    02 HRA PIC 9(5)V99.
    02 MEDICAL-ALLOWANCE PIC 9(5)V99.
    02 TRANSPORT-ALLOWANCE PIC 9(5)  VALUE "1500".
    02 LTA PIC 9(5)V99.
    02 FIXED-BONUS PIC 9(5)  VALUE "15000".
    02 PERFORMANCE-BONUS PIC 9(5)V99.
    02 PROVIDENT-FUND PIC 9(5)V99.
    02 PROF-TAX PIC 9(5)  VALUE "200".
    02 LWF-CONTRI PIC 9(5)  VALUE "0".
    02 INCOME-TAX PIC 9(5)V99.
    02 TRUST-CONTRI PIC 9(5)  VALUE "500".
PROCEDURE DIVISION.
  DISPLAY-HEADER.
  ACCEPT-EMPLOYEE-DATA.
  CALCULATE-BASIC-SALARY.
  CALCULATE-HRA.
  CALCULATE-MEDICAL-ALLOWANCE.
  CALCULATE-LTA.
  CALCULATE-PERFORMANCE-BONUS.
  CALCULATE-GROSS-SALARY.
  CALCULATE-PROVIDENT-FUND.
  CALCULATE-INCOME-TAX.
  DISPLAY-SALARY.
  STOP-RUN.
DISPLAY-HEADER.
  DISPLAY "PAYROLL PROCESSING SYSTEM".
  DISPLAY "-------------------------".
ACCEPT-EMPLOYEE-DATA.
  DISPLAY "ENTER EMPLOYEE ID: ".
  ACCEPT EMPLOYEE-ID.
  DISPLAY "ENTER EMPLOYEE NAME: ".
  ACCEPT EMPLOYEE-NAME.
  DISPLAY "ENTER HOURS WORKED: ".
  ACCEPT HOURS-WORKED.
  DISPLAY "ENTER HOURLY RATE: ".
  ACCEPT  HOURLY-RATE.
  DISPLAY "ENTER HOLIDAYS TAKEN: ".
  ACCEPT HOLIDAYS.
CALCULATE-BASIC-SALARY.
  COMPUTE BASIC-SALARY = HOURS-WORKED*HOURLY-RATE.
CALCULATE-HRA.
  COMPUTE HRA = (BASIC-SALARY *10 /100).
CALCULATE-MEDICAL-ALLOWANCE.
  COMPUTE MEDICAL-ALLOWANCE = (BASIC-SALARY *5 /100).
CALCULATE-LTA.
  COMPUTE LTA = (BASIC-SALARY *12 /100).
CALCULATE-PERFORMANCE-BONUS.
  IF HOURLY-RATE < 31
    COMPUTE PERFORMANCE-BONUS = 50000
  ELSE
    COMPUTE PERFORMANCE-BONUS = 25000
  END-IF.
CALCULATE-GROSS-SALARY.
  COMPUTE GROSS-SALARY = BASIC-SALARY + HRA + MEDICAL-ALLOWANCE + LTA + PERFORMANCE-BONUS.
CALCULATE-PROVIDENT-FUND.
  COMPUTE PROVIDENT-FUND = (BASIC-SALARY *12 /100).
CALCULATE-INCOME-TAX.
  IF GROSS-SALARY > 0 AND GROSS-SALARY < 50000
    COMPUTE INCOME-TAX = GROSS-SALARY - (GROSS-SALARY * 10/100).
DISPLAY-SALARY.
  DISPLAY "EMPLOYEE ID: " EMPLOYEE-ID.
  DISPLAY "EMPLOYEE NAME: " EMPLOYEE-NAME.
  DISPLAY "BASIC SALARY: " BASIC-SALARY.
  DISPLAY "HRA :" HRA.
  DISPLAY "MEDICAL ALLOWANCE :" MEDICAL-ALLOWANCE.
  DISPLAY "TRANSPORT ALLOWANCE :" TRANSPORT-ALLOWANCE.
  DISPLAY "LTA :" LTA.
  DISPLAY "FIXED BONUS :" FIXED-BONUS.
  DISPLAY "PERFORMANCE BONUS :" PERFORMANCE-BONUS.
  DISPLAY "PROVIDENT FUND :" PROVIDENT-FUND.
  DISPLAY "PROFESSIONAL TAX :" PROF-TAX.
  DISPLAY "LWF CONTRIBUTION :" LWF-CONTRI.
  DISPLAY "INCOME TAX :" INCOME-TAX.
  DISPLAY "WELFARE TRUST CONTRI :" TRUST-CONTRI.

I am using the following Python code to parse it:

import re
import xml.etree.ElementTree as ET

def parse_cobol_code(cobol_code):
    root = ET.Element("COBOL_Code")
    current_division = None
    current_section = None
    current_paragraph = None
    for line in cobol_code.split('\n'):
        line = line.strip()
        division_match = re.match(r'^\s+\d{4}-\d{4}\s{2,}([A-Z ]+\.?)$', line)
        section_match = re.match(r'^\s{7}([A-Z ]+\.?)$', line)
        if division_match:
            current_division = ET.SubElement(root, "Division", name=division_match.group(1))
            current_section = None
            current_paragraph = None
        elif section_match and current_division is not None:
            current_section = ET.SubElement(current_division, "Section", name=section_match.group(1))
            current_paragraph = None
        elif current_section is not None and line:
            current_paragraph = ET.SubElement(current_section, "Paragraph")
            current_paragraph.text = line
    xml_content = ET.tostring(root, encoding='unicode')
    return xml_content

But the XML output is always <COBOL_Code /> and nothing else is added. I am new to Python; please help.
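
One likely cause of the empty output: each line is passed through strip() before matching, so it can never start with whitespace, yet both patterns require leading whitespace (^\s+ and ^\s{7}). The division pattern also expects a \d{4}-\d{4} prefix that the sample does not contain. Since no Division is ever created, every later branch is skipped as well. Below is a minimal sketch of one possible fix, not a full COBOL parser. It assumes every division or section header sits on its own line and ends with "DIVISION." or "SECTION."; the names DIVISION_RE and SECTION_RE are hypothetical helpers introduced here. Lines outside any section (as in the PROCEDURE DIVISION of the sample) are attached directly to the division.

```python
import re
import xml.etree.ElementTree as ET

# Patterns are applied to the stripped line, so they carry no leading whitespace.
# Assumption: headers look like "<NAME> DIVISION." or "<NAME> SECTION." on their own line.
DIVISION_RE = re.compile(r'^([A-Z][A-Z0-9-]*) DIVISION\.$')
SECTION_RE = re.compile(r'^([A-Z][A-Z0-9-]*) SECTION\.$')


def parse_cobol_code(cobol_code):
    root = ET.Element("COBOL_Code")
    current_division = None
    current_section = None

    for raw_line in cobol_code.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines

        division_match = DIVISION_RE.match(line)
        section_match = SECTION_RE.match(line)

        if division_match:
            current_division = ET.SubElement(
                root, "Division", name=division_match.group(1))
            current_section = None
        elif section_match and current_division is not None:
            current_section = ET.SubElement(
                current_division, "Section", name=section_match.group(1))
        elif current_division is not None:
            # Attach to the current section if there is one, else to the division.
            parent = current_section if current_section is not None else current_division
            paragraph = ET.SubElement(parent, "Paragraph")
            paragraph.text = line

    return ET.tostring(root, encoding="unicode")


# Shortened excerpt of the sample program, one statement per line.
sample = """IDENTIFICATION DIVISION.
PROGRAM-ID. PAYROLL-PROCESSING.
DATA DIVISION.
  WORKING-STORAGE SECTION.
  01 EMPLOYEE-RECORD.
    02 EMPLOYEE-ID PIC 9(5).
PROCEDURE DIVISION.
  DISPLAY "PAYROLL PROCESSING SYSTEM".
"""
print(parse_cobol_code(sample))
```

This keeps the Division / Section / Paragraph element names from the original function. It does not distinguish paragraph labels such as DISPLAY-HEADER. from statements, and it does not group multi-line IF blocks; those would need extra rules or a real grammar.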

