I have a dataframe with the following columns: Parent, Parent_Rev, Child, Child_Rev. In this structure, Parent is a parent node and Child is a child node. Parent_Rev and Child_Rev hold the revision of their respective nodes. A Child may itself appear in the Parent column and have its own children, given in the Child and Child_Rev columns.
This hierarchical chain continues upward until it reaches a Parent that does not appear as a Child of any other row; that Parent is the root (top node) of the chain.
To generate the desired output, it's necessary to traverse through all the values, identifying all possible levels and combinations until the top node is reached. When checking for possible links, the combination of Node+Rev should be considered unique, rather than just the Node value alone.
- **Sample Input Dataframe:**
Parent | Parent_Rev | Child | Child_REV |
---|---|---|---|
A1 | 1 | B1 | 1 |
B1 | 1 | C1 | 1 |
C1 | 1 | D1 | 1 |
D1 | 1 | E1 | 1 |
A2 | 1 | B2 | 1 |
B2 | 1 | C2 | 1 |
C2 | 1 | C2 | 1 |
A3 | 1 | B3 | 3 |
A4 | 1 | H4 | 3 |

    import pandas as pd

    df1 = pd.DataFrame({
        'Node1':     ['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'A3', 'A4'],
        'Node1_Rev': ['1', '1', '1', '1', '1', '1', '1', '1', '1'],
        'Node2':     ['B1', 'C1', 'D1', 'E1', 'B2', 'C2', 'C2', 'B3', 'H4'],
        'Node2_Rev': ['1', '1', '1', '1', '1', '1', '1', '3', '3'],
    })

- **Sample Output Dataframe:**

|Root|Root_Rev| Parent | Parent_Rev | Child | Child_REV |
| --- | --- | --- | --- | --- | --- |
| A1 | 1 | A1 | 1 | B1 | 1 |
| A1 | 1 | B1 | 1 | C1 | 1 |
| A1 | 1 | C1 | 1 | D1 | 1 |
| A1 | 1 | D1 | 1 | E1 | 1 |
| A2 | 1 | A2 | 1 | B2 | 1 |
| A2 | 1 | B2 | 1 | C2 | 1 |
| A2 | 1 | C2 | 1 | C2 | 1 |
| A3 | 1 | A3 | 1 | B3 | 3 |
| A4 | 1 | A4 | 1 | H4 | 3 |

What is an efficient way to generate this output for a larger dataset, so that the root and its revision are filled in for every parent/child combination in the input?
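
For reference, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the column names from the sample code (`Node1`, `Node1_Rev`, `Node2`, `Node2_Rev`); `add_roots` is a hypothetical helper name. It treats each (node, rev) pair as a single key, finds the keys that never occur as a child, and walks downward from each of them with a breadth-first search so every reachable key is labelled with its root. Self-referencing rows such as C2 -> C2 are ignored during the walk but kept in the result. Rows that sit in a cycle with no root at all are omitted, which is a choice made for this sketch.

```python
from collections import deque

import pandas as pd


def add_roots(df, parent=("Node1", "Node1_Rev"), child=("Node2", "Node2_Rev")):
    """Prepend Root / Root_Rev columns; one output row per (root, input row)."""
    # Treat (node, rev) as one composite key, as the question requires.
    parent_keys = list(zip(df[parent[0]], df[parent[1]]))
    child_keys = list(zip(df[child[0]], df[child[1]]))

    # Adjacency list parent -> children, skipping self-loops like C2 -> C2.
    children_of = {}
    child_set = set()
    for p, c in zip(parent_keys, child_keys):
        if p != c:
            children_of.setdefault(p, []).append(c)
            child_set.add(c)

    # A root is a parent key that never appears as a child key.
    roots = [p for p in dict.fromkeys(parent_keys) if p not in child_set]

    # Breadth-first walk from each root; `seen` guards against cycles.
    roots_of = {}
    for root in roots:
        queue, seen = deque([root]), {root}
        while queue:
            key = queue.popleft()
            roots_of.setdefault(key, []).append(root)
            for nxt in children_of.get(key, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)

    # Emit one record per root that reaches the row's parent key.
    records = [
        (root[0], root[1], i)
        for i, p in enumerate(parent_keys)
        for root in roots_of.get(p, [])
    ]
    out = df.iloc[[i for _, _, i in records]].reset_index(drop=True)
    out.insert(0, "Root", [r for r, _, _ in records])
    out.insert(1, "Root_Rev", [v for _, v, _ in records])
    return out


df1 = pd.DataFrame({
    "Node1":     ["A1", "B1", "C1", "D1", "A2", "B2", "C2", "A3", "A4"],
    "Node1_Rev": ["1", "1", "1", "1", "1", "1", "1", "1", "1"],
    "Node2":     ["B1", "C1", "D1", "E1", "B2", "C2", "C2", "B3", "H4"],
    "Node2_Rev": ["1", "1", "1", "1", "1", "1", "1", "3", "3"],
})

out = add_roots(df1)
print(out)
```

If the data is a forest (each node has one parent), every key is visited once, so the walk is linear in the number of rows. If a node can be reached from several roots, the row is repeated once per root, and the cost grows with the number of roots that reach it.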