
How to make this regex pattern 'dynamic' for unknown repetition of instances


I have a line of text where I'm matching names:

193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave

To separate name substrings like ARMSTRONG Ronald or ARMSTRONG Dave, I am using the regex:

pattern = '(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'
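To make the behaviour of this pattern concrete, here is a minimal sketch that runs it with `re.findall` on the sample line. The helper name `extract_trailing_names` and the second input line (with multi-word given names) are made up for illustration and are not part of the original post. Because the pattern has two capture groups, `findall` returns 2-tuples, and only one slot of each tuple is non-empty.

```python
import re

# The pattern from the question, written as a raw string.
pattern = r'(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'

def extract_trailing_names(text):
    """Hypothetical helper: return every name after the first one on the line."""
    # findall yields (group1, group2) tuples; keep whichever group matched.
    return [first or second for first, second in re.findall(pattern, text)]

line = '193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave'
names = extract_trailing_names(line)
print(names)  # ['ARMSTRONG Ronald', 'ARMSTRONG Dave']

# Made-up line with multi-word given names: the greedy `.+` runs past the
# next surname, so two names come back merged into one match.
merged = extract_trailing_names(
    '193 ARMSTRONG Rhonda Carol ARMSTRONG Ronald Melvin ARMSTRONG Dave Lee'
)
print(merged)  # ['ARMSTRONG Ronald Melvin ARMSTRONG Dave']
```

The second call suggests where the pattern stops generalising: `.+` is greedy, so it backtracks only as far as the last lowercase letter that is followed by a space and a capital.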

If the lines of text being read contain more names like these, how do I make this regex pattern work for all cases?

Here's my demo/explanation

Additional Information

# Description: Function takes a list and attempts to match a pattern to check
# for multiple NAMES occurring on the same line.
# Registering this function splits these into separate lines, adding a roll
# entry number where there is none. Stores results in a list which is returned
# to the calling script.
def splitMultiUppercaseTokenLines(testList):
    import traceback
    try:
        import re
        # Pattern to isolate testList elements with multiple UPPERCASE tokens
        pattern0 = r'^[0-9]{1,3}\s[A-Z]{2,}.+\s[A-Z]{2,}.+$'
        # Pattern to match individual UPPERCASE token groups for separating
        pattern1 = r'(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'
        # Pattern to match 'correct' first entry containing Roll Entry Number
        pattern2 = r'^[0-9]+\s[A-Z]{2,}\s[a-zA-Z]+(?![A-Z])'
        tempList = []
        for i in range(len(testList)):
            # Test for instance of multiple UPPERCASE tokens in a list string
            results = re.findall(pattern0, testList[i])
            if results:
                # Store the first correct entry, without repeat NAMES first
                firstEntry = re.findall(pattern2, testList[i])
                tempList.append(firstEntry[0])
                # Use next pattern to match separate UPPERCASE token groups.
                # Append these name substrings which are without numbers
                entrys = re.findall(pattern1, testList[i])
                # Unpack tuples created by multiple groups in pattern1
                for index, entry in entrys:
                    if index:
                        entry = index
                    if entry:
                        entry = entry
                # Add ridiculous number '999' to flag missing value
                entryString = entry
                fullEntry = '999' + ' ' + entryString
                tempList.append(fullEntry)
        return tempList
    except Exception as e:
        # to get detailed traceback
        print("Traceback from function splitMultiUppercaseTokenLines() #region 3 __main__.py generalModule")
        print(e)
        traceback.print_exc()

# Test code below
testList = ['091 ARMSTRONG Myrtle Alice', '092 ARMSTRONG Raymond George', '193 ARMSTRONG Rhonda Carol ARMSTRONG Ronald Melvin Phillip', '194 ARMSTRONG Timothy James ARMSTRONG Wesley']
resultList = []
resultList = splitMultiUppercaseTokenLines(testList)
if resultList != None:
    print("List of NAMES separated onto lines with accompanying roll entry number")
    if len(resultList) > 0:
        for j in range(len(resultList)):
            print(resultList[j])
else:
    print("There were no matching patterns, UPPERCASE NAMES without roll entry number, among the list elements.")

Output:

List of NAMES separated onto lines with accompanying roll entry number
193 ARMSTRONG Rhonda
999 ARMSTRONG Ronald Melvin
194 ARMSTRONG Timothy
999 ARMSTRONG Wesley
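One possible way to handle any number of names per line is to match each name as a unit instead of relying on lookarounds between names. The sketch below is only an illustration under stated assumptions: every surname is a single all-caps token of two or more letters, every given name is a capital followed by lowercase letters, and the names `NAME`, `ROLL` and `split_names` are hypothetical. Unlike the function above, it keeps all given names of the first entry and also returns lines that hold a single name.

```python
import re

# Assumption: a name is an ALL-CAPS surname followed by one or more
# capitalised given names, e.g. "ARMSTRONG Ronald Melvin".
NAME = re.compile(r'\b[A-Z]{2,}(?:\s[A-Z][a-z]+)+')
# Assumption: the roll entry number is 1 to 3 digits at the start of the line.
ROLL = re.compile(r'^\d{1,3}\b')

def split_names(line, missing='999'):
    """Hypothetical helper: one 'number name' string per name on the line."""
    roll = ROLL.match(line)
    number = roll.group() if roll else missing
    names = NAME.findall(line)  # no capture groups, so plain strings come back
    # The first name keeps the real roll number; later names get the flag value.
    return [
        f'{number if i == 0 else missing} {name}'
        for i, name in enumerate(names)
    ]

result = split_names('193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave')
print(result)
# ['193 ARMSTRONG Carol', '999 ARMSTRONG Ronald', '999 ARMSTRONG Dave']
```

Because the repetition lives in `(?:\s[A-Z][a-z]+)+` and `findall` scans the whole line, the same pattern applies whether a line holds one name or many. Surnames or given names with hyphens, apostrophes or mixed case (for example `McDONALD` or `Mary-Anne`) would need a wider character class.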
