Channel: Active questions tagged python - Stack Overflow

How to replace outliers with the 3-standard-deviation value for all columns, grouped by a column?


I have a DataFrame in which I want to replace (cap) outliers at 3 standard deviations from the mean, with the mean and standard deviation computed per group (grouped by column `A`) for every feature column. For example:

    import pandas as pd

    df = pd.DataFrame({"A": ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B"],
                       "B": [7, 2, 54, 3, 5, 23, 5, 7, 7, 7, 7, 7, 7, 7, 6, 7],
                       "C": [20, 16, 11, 3, 8, 5, 5, 20, 6, 6, 6, 6, 6, 5, 6, 6],
                       "D": [14, 3, 32, 2, 6, 5, 6, 20, 4, 5, 4, 5, 4, 5, 5, 5]})
    feature = ['B', 'C', 'D']
    mean = df.groupby('A')[feature].mean()
    std = df.groupby('A')[feature].std()
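For reference (a sketch assuming pandas, not part of the original question): the `mean` and `std` frames above are aggregations with one row per group, indexed by the labels in column `A`. `groupby(...).transform` computes the same statistic but broadcasts it back onto every row of its group, so the result lines up with `df[feature]` row for row. The names `agg_std` and `row_std` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["A"] * 4 + ["B"] * 12,
    "B": [7, 2, 54, 3, 5, 23, 5, 7, 7, 7, 7, 7, 7, 7, 6, 7],
    "C": [20, 16, 11, 3, 8, 5, 5, 20, 6, 6, 6, 6, 6, 5, 6, 6],
    "D": [14, 3, 32, 2, 6, 5, 6, 20, 4, 5, 4, 5, 4, 5, 5, 5],
})
feature = ["B", "C", "D"]

# Aggregation: one row per group (index = group labels "A" and "B").
agg_std = df.groupby("A")[feature].std()

# transform: the same statistic repeated on every row of its group,
# so it shares df's index and can be compared with df[feature] directly.
row_std = df.groupby("A")[feature].transform("std")

print(agg_std.shape)  # (2, 3)
print(row_std.shape)  # (16, 3)
```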

Now I want to replace the outliers in each column of `feature` with the cap computed from that group's mean and standard deviation.

Something like the loop below, but applied to every group and every column:

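    df[feature] = df[feature].astype(float)  # allow float caps in integer columns
    for col in feature:
        for each in df['A'].unique():
            m = mean.loc[each, col]
            s = std.loc[each, col]
            in_group = df['A'] == each
            # cap values outside mean ± 3*std, only for rows of this group
            df.loc[in_group & (df[col] < m - 3*s), col] = m - 3*s
            df.loc[in_group & (df[col] > m + 3*s), col] = m + 3*s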
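A possible single-loop version (a sketch, not from the original post): `groupby(...).transform` replaces the inner loop over groups, and `Series.clip` caps each row at its own group's bounds. It assumes capping on both sides at mean ± 3*std, using pandas' default sample standard deviation (`ddof=1`); the names `grp`, `m` and `s` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["A"] * 4 + ["B"] * 12,
    "B": [7, 2, 54, 3, 5, 23, 5, 7, 7, 7, 7, 7, 7, 7, 6, 7],
    "C": [20, 16, 11, 3, 8, 5, 5, 20, 6, 6, 6, 6, 6, 5, 6, 6],
    "D": [14, 3, 32, 2, 6, 5, 6, 20, 4, 5, 4, 5, 4, 5, 5, 5],
})
feature = ["B", "C", "D"]

# One pass per column; transform handles all groups at once.
for col in feature:
    grp = df.groupby("A")[col]
    m = grp.transform("mean")  # each row gets its group's mean
    s = grp.transform("std")   # each row gets its group's sample std
    # Series.clip aligns Series bounds on the index, so every row is capped
    # at its own group's mean - 3*std and mean + 3*std.
    df[col] = df[col].clip(lower=m - 3 * s, upper=m + 3 * s)

print(df)
```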

Expected output: (posted as an image, which is not preserved here)

I have many columns, and looping over every column and every group is slow. Is there a better (vectorized) way, or can it at least be done with a single loop?
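One loop-free approach, sketched under the same assumptions (pandas, sample standard deviation, caps on both sides at mean ± 3*std); it is not taken from an answer to this question. The bounds for all feature columns are computed at once with `transform`, then passed as DataFrames to `DataFrame.clip`, which caps element-wise. The names `lower` and `upper` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["A"] * 4 + ["B"] * 12,
    "B": [7, 2, 54, 3, 5, 23, 5, 7, 7, 7, 7, 7, 7, 7, 6, 7],
    "C": [20, 16, 11, 3, 8, 5, 5, 20, 6, 6, 6, 6, 6, 5, 6, 6],
    "D": [14, 3, 32, 2, 6, 5, 6, 20, 4, 5, 4, 5, 4, 5, 5, 5],
})
feature = ["B", "C", "D"]

grouped = df.groupby("A")[feature]
mean = grouped.transform("mean")  # same index and columns as df[feature]
std = grouped.transform("std")

lower = mean - 3 * std
upper = mean + 3 * std

# Element-wise capping: values below `lower` become `lower`,
# values above `upper` become `upper`, everything else is unchanged.
df[feature] = df[feature].clip(lower=lower, upper=upper)

print(df)
```

Because the bounds come from `transform`, they share `df`'s index and columns, so `clip` compares cell by cell without an explicit loop. With the sample data, the capped cells are the `23` in column `B` and the `20`s in columns `C` and `D` of group `B`. Group `A` has only four rows, and with four points no value can lie more than 1.5 sample standard deviations from the mean, so nothing there is capped.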

