This is an extension to this post.
My dataframe is:
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)
And this is the output that I want. I need to create column x:

      a      b     c     d  e    x
0   100   1000   100   100  a  NaN
1  1123  11123  1123  1123  b  NaN
2   123   1123   123   123  c  NaN
3   100      0   999   190  d  NaN
4     1     55    11     1  e  NaN
5     0      0    50   105  f    f
6     1      1     1     1  g  NaN
My mask is:
mask = (df.a > df.b)
And these are the steps needed:
a) Find the first row that meets the condition of the mask.
b) Get the value of column a in that row.
c) Find the first row after it in which that value is between columns c and d. Being equal to one of them is also OK.
d) Get the value of column e in that row and put it in column x.
For example, for the above dataframe:
a) The first row of the mask is row 3.
b) The value of column a in that row is 100.
c) Among the rows after the mask row (4, 5, ...), the first row in which 100 is between columns c and d is row 5.
d) So 'f' is chosen for column x.
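The steps above can be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a definitive solution: the helper name add_x is hypothetical, "between" is taken to mean c <= value <= d (so a row where c is larger than d never matches), and only rows strictly after the first masked row are searched.

```python
import numpy as np
import pandas as pd


def add_x(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with column x filled per steps a) to d)."""
    out = df.copy()
    # Object dtype so a string from column e can sit next to NaN.
    out["x"] = pd.Series(np.nan, index=out.index, dtype=object)

    # a) Position of the first row where the mask holds.
    mask = (out["a"] > out["b"]).to_numpy()
    if not mask.any():
        return out  # no masked row: x stays all NaN
    first = int(mask.argmax())

    # b) Value of column a in that row.
    value = out["a"].iloc[first]

    # c) First row strictly after `first` with c <= value <= d (assumed meaning of "between").
    in_range = (out["c"].le(value) & out["d"].ge(value)).to_numpy()
    later = np.arange(len(out)) > first
    candidates = in_range & later
    if not candidates.any():
        return out  # nothing qualifies: x stays all NaN
    hit = int(candidates.argmax())

    # d) Copy the value of column e into column x for that row only.
    out.iloc[hit, out.columns.get_loc("x")] = out["e"].iloc[hit]
    return out


df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)
result = add_x(df)
print(result)
```

Working by position rather than by label keeps the sketch independent of the index. If the columns c and d can appear in either order, the comparison in step c) would need to use the smaller and larger of the two instead.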
This image clarifies the above steps:
This is what I have tried:
mask = (df.a > df.b)
val = df.loc[mask.cumsum().eq(1) & mask, 'a']
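For reference, here is a small sketch of what this attempt produces. The cumulative sum equals 1 from the first masked row up to the row before the second masked one, so combining it with the mask again keeps only the first masked row. The result is a Series with at most one element, not a scalar; the names first_label and first_value are only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'a': [100, 1123, 123, 100, 1, 0, 1],
        'b': [1000, 11123, 1123, 0, 55, 0, 1],
        'c': [100, 1123, 123, 999, 11, 50, 1],
        'd': [100, 1123, 123, 190, 1, 105, 1],
        'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    }
)

mask = df.a > df.b
# Keeps only the first row where the mask is True (empty if there is none).
val = df.loc[mask.cumsum().eq(1) & mask, 'a']

# Guard against the empty case before pulling out the label and the scalar.
first_label = val.index[0] if not val.empty else None
first_value = val.iloc[0] if not val.empty else None
print(first_label, first_value)
```

This covers steps a) and b) only; the search for a later row in steps c) and d) still has to be done separately.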
I prefer the solution to be generic like this answer.
I have provided some additional dataframes in case you need to test the code against other, subtly different conditions. For instance, what if no rows meet the condition of the mask? In that case column x is all NaNs. Column names are all the same as in the above df.
df = pd.DataFrame({
    'a': [100, 1123, 123, -1, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 1],
    'd': [100, 1123, 123, 190, 1, 105, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 0, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, -1, 1],
    'd': [100, 1123, 123, 190, 1, 10, 1],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

df = pd.DataFrame({
    'a': [100, 1123, 123, 1, 1, 0, 100],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, -1, 50],
    'd': [100, 1123, 123, 190, 1, 10, 101],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})

df = pd.DataFrame({
    'a': [100, 1123, 123, 100, 1, 1000, 1],
    'b': [1000, 11123, 1123, 0, 55, 0, 1],
    'c': [100, 1123, 123, 999, 11, 50, 500],
    'd': [100, 1123, 123, 190, 1, 105, 2000],
    'e': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
})