There are 5 columns in my dataframe that hold a code value. Generally, that code value is either the same across all of those columns or zero. I want to find the rows that contain more than one distinct value, excluding zero.
So in the example below, I would like to get the rows with id 1 and 4.
```python
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4],'col_A':[1,1,1,0],'col_B':[2,1,0,2],'col_C':[3,1,0,3],'col_D':[4,1,1,4],'col_E':[5,1,1,5]})
```
| id | col_A | col_B | col_C | col_D | col_E |
|---|---|---|---|---|---|
| 1 | 1 | 2 | 3 | 4 | 5 |
| 2 | 1 | 1 | 1 | 1 | 1 |
| 3 | 1 | 0 | 0 | 1 | 1 |
| 4 | 0 | 2 | 3 | 4 | 5 |
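The rule above (count the distinct non-zero codes in each row, keep rows where that count is greater than 1) can be expressed fairly directly in pandas. This is a minimal sketch, assuming the code columns are exactly the ones whose names contain `col_`: zeros are masked to NaN with `DataFrame.mask`, and `DataFrame.nunique(axis=1)` then ignores them because it drops NaN by default.

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'col_A': [1, 1, 1, 0],
                   'col_B': [2, 1, 0, 2],
                   'col_C': [3, 1, 0, 3],
                   'col_D': [4, 1, 1, 4],
                   'col_E': [5, 1, 1, 5]})

# Select only the code columns (col_A ... col_E).
codes = df.filter(like='col_')

# Replace zeros with NaN; nunique skips NaN by default (dropna=True),
# so this counts the distinct non-zero codes in each row.
n_distinct = codes.mask(codes.eq(0)).nunique(axis=1)

# Keep the rows with more than one distinct non-zero code.
result = df[n_distinct > 1]
print(result)
```

Masking turns the integer columns into floats, but since the values are only compared for equality here, that does not change the count.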
I am able to get the answer I want with the following code, but it is really ugly and I am certain there is a better way to do it.
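```python
# Collect the unique values of the col_* columns for each row
df['cats'] = df.loc[:, df.filter(like='col_').columns].apply(pd.unique, axis=1)
# Keep rows with more than one unique value, not counting 0
df[df['cats'].apply(lambda x: (len(x) - 1) if 0 in x else len(x)) > 1]
```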
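If you would rather avoid calling a Python function for every row, a NumPy sort-based count is one alternative. This is a sketch, not the only way to do it, and it assumes the codes are numeric with no missing values: each row is sorted so that repeated codes sit next to each other, and the code counts the positions where a new non-zero value starts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'col_A': [1, 1, 1, 0],
                   'col_B': [2, 1, 0, 2],
                   'col_C': [3, 1, 0, 3],
                   'col_D': [4, 1, 1, 4],
                   'col_E': [5, 1, 1, 5]})

codes = df.filter(like='col_').to_numpy()

# Sort each row so that equal codes become adjacent.
s = np.sort(codes, axis=1)

# Position 0 starts a new distinct value if it is non-zero; every later
# position does so if it is non-zero and differs from its left neighbour.
first_new = s[:, 0] != 0
later_new = (s[:, 1:] != s[:, :-1]) & (s[:, 1:] != 0)
n_distinct = first_new.astype(int) + later_new.sum(axis=1)

# Boolean NumPy array of length len(df) selects the matching rows.
result = df[n_distinct > 1]
print(result)
```

Because the logic only compares neighbours and checks for zero, it would also handle negative codes, which simply sort before the zeros.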