I have a line of text where I'm matching names:
193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave
To separate out name substrings such as ARMSTRONG Ronald or ARMSTRONG Dave, I am using the regex:
pattern = r'(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'
The lines of text being read may contain more names like these. How do I make this regex pattern work for all cases?
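To see concretely which cases the pattern misses, here is a small diagnostic sketch that runs the same regex (as a raw string) through `re.findall` on the sample line and on two longer lines. The four-name line is a hypothetical example for illustration, not taken from the real data. Because the pattern has two capture groups, `findall` returns a tuple per match with the unused group as `''`.

```python
import re

# Same regex as above: two alternatives, each with its own capture group
pattern1 = r'(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'

samples = [
    '193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave',
    '193 ARMSTRONG Rhonda Carol ARMSTRONG Ronald Melvin Phillip',
    # Hypothetical four-name line, for illustration only
    '193 ARMSTRONG Carol ARMSTRONG Ronald ARMSTRONG Dave ARMSTRONG Eve',
]

for line in samples:
    print(re.findall(pattern1, line))

# [('ARMSTRONG Ronald', ''), ('', 'ARMSTRONG Dave')]
# [('ARMSTRONG Ronald Melvin', '')]
# [('ARMSTRONG Ronald ARMSTRONG Dave', ''), ('', 'ARMSTRONG Eve')]
```

The three-name line works, but the greedy `.+` stops at the last lowercase letter followed by a space and a capital, so a trailing given name ("Phillip") is dropped, and on longer lines two name groups get merged into one match.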
Here's my demo/explanation
Additional Information
```python
# Description: Function takes a list & attempts to match a pattern to check for multiple NAMES occurring on the same line.
# Calling this function splits these into separate lines, adding a roll entry number where there is none.
# Stores results in a list which is returned to the calling script.
def splitMultiUppercaseTokenLines(testList):
    import traceback
    try:
        import re
        # Pattern to isolate testList elements with multiple UPPERCASE tokens
        pattern0 = r'^[0-9]{1,3}\s[A-Z]{2,}.+\s[A-Z]{2,}.+$'
        # Pattern to match individual UPPERCASE token groups for separating
        pattern1 = r'(?<=[a-z]\s)([A-Z]{2,}.+[a-z])(?=\s[A-Z])|(?<=[a-z]\s)([A-Z]{2,}.+$)'
        # Pattern to match 'correct' first entry containing Roll Entry Number
        pattern2 = r'^[0-9]+\s[A-Z]{2,}\s[a-zA-Z]+(?![A-Z])'
        tempList = []
        for i in range(len(testList)):
            # Test for instance of multiple UPPERCASE tokens in a list string
            results = re.findall(pattern0, testList[i])
            if results:
                # Store the first correct entry, without repeat NAMES, first
                firstEntry = re.findall(pattern2, testList[i])
                tempList.append(firstEntry[0])
                # Use next pattern to match separate UPPERCASE token groups.
                # Append these name substrings, which are without numbers
                entrys = re.findall(pattern1, testList[i])
                # Unpack tuples created by multiple groups in pattern1
                for index, entry in entrys:
                    if index:
                        entry = index
                    if entry:
                        entry = entry
                    # Add ridiculous number '999' to flag missing value
                    entryString = entry
                    fullEntry = '999' + ' ' + entryString
                    tempList.append(fullEntry)
        return tempList
    except Exception as e:
        # to get detailed traceback
        print("Traceback from function splitMultiUppercaseTokenLines() #region 3 __main__.py generalModule")
        print(e)
        traceback.print_exc()

# Test code below
testList = ['091 ARMSTRONG Myrtle Alice', '092 ARMSTRONG Raymond George',
            '193 ARMSTRONG Rhonda Carol ARMSTRONG Ronald Melvin Phillip',
            '194 ARMSTRONG Timothy James ARMSTRONG Wesley']
resultList = []
resultList = splitMultiUppercaseTokenLines(testList)
if resultList != None:
    print("List of NAMES separated onto lines with accompanying roll entry number")
    if len(resultList) > 0:
        for j in range(len(resultList)):
            print(resultList[j])
else:
    print("There were no matching patterns, UPPERCASE NAMES without roll entry number, among the list elements.")
```
Output:
```
List of NAMES separated onto lines with accompanying roll entry number
193 ARMSTRONG Rhonda
999 ARMSTRONG Ronald Melvin
194 ARMSTRONG Timothy
999 ARMSTRONG Wesley
```
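One possible generalization, sketched under the assumption that every name group is an all-caps surname of two or more letters followed by zero or more capitalized given names (e.g. `Rhonda Carol`), is to match whole groups directly instead of using lookbehind/lookahead around them. `split_name_groups`, `NAME_GROUP` and `ROLL_NUMBER` are hypothetical names; the function mirrors the demo by splitting only lines with two or more groups and flagging later groups with `'999'`.

```python
import re

# Assumption: each name group is an all-caps SURNAME (2+ letters) followed by
# zero or more Capitalized given names, e.g. "ARMSTRONG Ronald Melvin Phillip".
NAME_GROUP = re.compile(r'\b[A-Z]{2,}(?:\s+[A-Z][a-z]+)*')
# Assumption: the roll entry number, if present, is the leading run of digits.
ROLL_NUMBER = re.compile(r'^\s*(\d+)')


def split_name_groups(lines, missing='999'):
    """Hypothetical rewrite of splitMultiUppercaseTokenLines().

    Only lines holding two or more name groups are split, as in the demo.
    The first group keeps the line's roll number; later groups get `missing`.
    """
    result = []
    for line in lines:
        # The inner group is non-capturing, so findall returns plain strings
        groups = NAME_GROUP.findall(line)
        if len(groups) < 2:
            continue  # a single name on this line: nothing to split
        roll = ROLL_NUMBER.match(line)
        first_number = roll.group(1) if roll else missing
        result.append(f'{first_number} {groups[0]}')
        result.extend(f'{missing} {group}' for group in groups[1:])
    return result


testList = ['091 ARMSTRONG Myrtle Alice', '092 ARMSTRONG Raymond George',
            '193 ARMSTRONG Rhonda Carol ARMSTRONG Ronald Melvin Phillip',
            '194 ARMSTRONG Timothy James ARMSTRONG Wesley']
for entry in split_name_groups(testList):
    print(entry)
# 193 ARMSTRONG Rhonda Carol
# 999 ARMSTRONG Ronald Melvin Phillip
# 194 ARMSTRONG Timothy James
# 999 ARMSTRONG Wesley
```

Because each match stops at the next all-caps token, the number of names per line no longer matters. Given names containing hyphens or apostrophes would need the `[a-z]+` part widened, which this sketch does not attempt.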