I've been trying to take a DataFrame like df below and turn some of its columns (say B_m and B_n) into two columns each (call them B_m1, B_m2, B_n1 and B_n2), with one row for each pair of values in that column (akin to itertools.combinations(col, r=2)) among rows that share the same group number. So for B_m, in the rows where Group is 0, we should get B_m1 = [-1, -1, 0] and B_m2 = [0, 1, 1].
import pandas as pd

df = pd.DataFrame(
    data=[
        [0, 4, 7, -1, 0.9],
        [0, 4, 7, 0, 0.3],
        [0, 4, 7, 1, 0.2],
        [1, 3, 3, 1, 0.5],
        [1, 3, 3, 0, 0.2],
        [2, 1, 8, 0, 0.6],
    ],
    columns=['Group', 'A_x', 'A_y', 'B_m', 'B_n'],
)
print(df)
If there is only 1 row with a given group number, that group should be removed. For 4 or more rows with the same group number, we should similarly find all pairwise combinations without repeats. Shown below is what the expected result should look like.
expected = pd.DataFrame(
    data=[
        [0, 4, 7, -1, 0, 0.9, 0.3],
        [0, 4, 7, -1, 1, 0.9, 0.2],
        [0, 4, 7, 0, 1, 0.3, 0.2],
        [1, 3, 3, 1, 0, 0.5, 0.2],
    ],
    columns=['Group', 'A_x', 'A_y', 'B_m1', 'B_m2', 'B_n1', 'B_n2'],
)
print(expected)
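To make the pairing rule concrete, here is a small sketch of what I mean for a single column within a single group, using plain lists and itertools.combinations. It only illustrates the pairing, not the DataFrame transformation itself. The names group0_b_m, four_rows and single_row are made up for this illustration, and the 4-value list is hypothetical data that does not appear in df.

```python
from itertools import combinations

# B_m values for the rows where Group is 0 (taken from df)
group0_b_m = [-1, 0, 1]
pairs = list(combinations(group0_b_m, 2))

# Split each pair into the two target columns
b_m1 = [first for first, _ in pairs]
b_m2 = [second for _, second in pairs]
print(b_m1)  # [-1, -1, 0]
print(b_m2)  # [0, 1, 1]

# Hypothetical group with 4 rows (not in df): 4 choose 2 = 6 pairs
four_rows = [10, 20, 30, 40]
pairs4 = list(combinations(four_rows, 2))
print(len(pairs4))  # 6

# A group with a single row yields no pairs, so it drops out entirely
single_row = list(combinations([0], 2))
print(single_row)  # []
```

Each pair keeps the original row order (the earlier row's value goes to the "1" column, the later row's to the "2" column), which is the ordering used in expected.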
My first attempt used some awkward looping to build a new DataFrame (not elegant, and I don't have it saved); it not only took ages to run but also didn't work. Since then, I've not been able to come up with an alternative solution.
For the record, I did post a similar, but different, question about 3 months ago, in case that offers any inspiration.