Hi I have data frame in which I want to replace/cap the outlier with the 3*standard deviation value for all column with group by for each column. For example:
df = pd.DataFrame({"A":["A", "A", "A", "A", "B","B","B","B","B","B","B","B","B","B","B","B"], "B":[7, 2, 54, 3, 5,23,5,7,7,7,7,7,7,7,6,7], "C":[20, 16, 11, 3, 8,5,5,20,6,6,6,6,6,5,6,6], "D":[14, 3, 32, 2, 6,5,6,20,4,5,4,5,4,5,5,5], }) feature=['B','C','D']mean = df.groupby('A')[feature].mean()std = df.groupby('A')[feature].std()now I want to replace outlier for each column in feature with appropriate standard deviation for that group.
Something like below but for each group and each column
for col in feature: for each in df['A'].unique(): m=mean.loc[each,col] s=std.loc[each,col] df.loc[each,df[col]< m-3*s,]=m-3*sExpected output:
I have many column and loop is time consuming. Is there any better way or can it be done with one loop?
