I have seen various questions on here about combining linked lists with recursion, but none of them seem to quite fit to my problem.
I have a list with elements that combine to make a single string. The elements of the string have 'attachment point' identifiers, of the format "[*:x]" which denote which element to attach where. For example, ["[*:2]DEF","ABC[*:1]GH"] will combine to ABCDEFGH. The process to combine two elements is replacing the attachment points with a character, for example "Z" and then calling a function which combines the two strings at the correct point, using the "Z" character as a locator (so I can't just regex it)
Easy enough when you have 2 elements, but when you have multiple elements that is much more difficult. If I want to combine the elements of the list lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"] then I need a different approach. My predecessor wrote a function to recursively combine the elements as pure strings, which I have been using but now I need to add in the function to combine the elements as pairs.
The old recursive function is here:
lib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"]def combine(r, element, lib): if "[*:" in element: root = f"[*:{r}]" pos = element.index("[*:") pos1 = element.index("]", pos) next_r = int(element[pos + 3:pos1]) con = conf[next_r - 1] if root in con: con = con.replace(root, "") return element[0:pos] + combine(next_r, con, lib) + combine(r, element[pos1 + 1:], lib) return elementresult = combine(1,lib[0],lib)My semi-functional attempt to modify the above function to be able to use the pairwise combination function pairstitch is here:
import relib = ["[*:4]F[*:2]","[*:1]GH[*:5]","AB[*:4]","[*:3]C[*:6]E[*:1]","[*:2]IJ","[*:4]D"]#Example to make MWE work - actual function is much more complicateddef pairstitch(frag1, frag2, identifier): return frag1.replace(identifier, frag2.replace(identifier,""))def combine(r, smiles, lib): if "[*:" in element: root = f"[*:{r}]" pos = element.index("[*:") pos1 = element.index("]", pos) next_r = int(element[pos + 3:pos1]) con = lib[next_r - 1] attach_lst_base = re.findall("\[\*\:\d\]", element) attach_lst_con = re.findall("\[\*\:\d\]", con) if len(attach_lst_con) == 1: if root in con: next_r_block = f"[*:{next_r}]" frag1 = element.replace(next_r_block, "Z") frag2 = con.replace(root, "Z") else: pos = con.index("[*:") pos1 = con.index("]", pos) next_r = int(con[pos + 3:pos1]) next_r_block = con[pos:pos1 + 1] frag1 = smiles.replace(root, "Z") frag2 = con.replace(next_r_block, "Z") element = pairstitch(frag1,frag2,"Z") else: next_frag = con for att_pt in attach_lst_base: if len(attach_lst_base) > len(attach_lst_con): next_frag = combine(r, next_frag, conf) else: next_frag = combine(next_r, next_frag, lib) next_r_block = f"[*:{next_r}]" frag1 = smiles.replace(next_r_block, "Z") frag2 = next_frag.replace(root, "Z") element = pairstitch(frag1, frag2, "Z") return elementresult = combine(1,lib[0],lib)However this only goes down one branch of the tree, and gives me ABCDEF[*:2] when what I would expect is ABCDEFGHIJ.
For context, the strings being combined represent chemical fragments which might contain numbers, certain special characters like ":,/,,#,=,@", or sections like "[nH2]", hence the format of the attachment points. As such, splitting the elements up into smaller parts will not work because the function which combines pairs of fragments together does so by various chemical rules to make sure a valid molecule comes out the other end.