I would like to use an Isolation Forest on data in which 99.93% of the samples are normal and 0.07% are anomalies, and then use SHAP to check the correlation between the features and the anomalous data.
So I trained the model following the approach on this Kaggle page: kaggle
On that page, the rows with Class = 0 and the rows with Class = 1 are split as follows:
```python
inliers = df[df.Class == 0]
ins = inliers.drop(['Class'], axis=1)
outliers = df[df.Class == 1]
outs = outliers.drop(['Class'], axis=1)
```
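For anyone without the Kaggle dataset, the split above can be reproduced on a small synthetic DataFrame. This is a minimal sketch: the feature names `V1`, `V2`, `V3`, the row counts and the random values are hypothetical; only the `Class` convention (0 = normal, 1 = anomaly) and the roughly 99.93% / 0.07% imbalance come from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: 3 features with heavily imbalanced labels
rng = np.random.default_rng(0)
n_normal, n_anomaly = 9993, 7  # 7 / 10000 = 0.07%, matching the post's ratio
df = pd.DataFrame(
    np.vstack([
        rng.normal(0.0, 1.0, size=(n_normal, 3)),   # normal rows
        rng.normal(4.0, 1.0, size=(n_anomaly, 3)),  # anomalous rows, shifted away
    ]),
    columns=["V1", "V2", "V3"],
)
df["Class"] = np.r_[np.zeros(n_normal, dtype=int), np.ones(n_anomaly, dtype=int)]

# Same split as in the post: filter by label, then drop the label column
inliers = df[df.Class == 0]
ins = inliers.drop(["Class"], axis=1)
outliers = df[df.Class == 1]
outs = outliers.drop(["Class"], axis=1)

print(len(ins), len(outs), list(ins.columns))
print(f"anomaly share: {len(outs) / len(df):.2%}")
```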
To see the correlation between the features used in training and the outliers (the rows with `Class == 1`), I applied SHAP as follows and inspected the result with a beeswarm plot.
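```python
import shap
from sklearn.ensemble import IsolationForest

state = 42
# Fit the Isolation Forest on normal data only
ISF = IsolationForest(random_state=state)
ISF.fit(ins)

# predict() returns 1 for inliers and -1 for outliers
normal_isf = ISF.predict(ins)
fraud_isf = ISF.predict(outs)

# Explain the model's output for the anomalous rows
explainer = shap.TreeExplainer(ISF)
shap_values = explainer(outs)
shap.plots.beeswarm(shap_values)
```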
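Before interpreting the beeswarm, it may help to confirm numerically that the fitted forest actually scores the `Class == 1` rows as more anomalous than the normal rows. The sketch below uses hypothetical synthetic stand-ins for `ins` and `outs` (not the Kaggle data) and scikit-learn's `IsolationForest.score_samples` (lower means more abnormal) together with `predict` (-1 for outliers, 1 for inliers).

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical stand-ins for `ins` (normal) and `outs` (anomalous)
rng = np.random.default_rng(0)
cols = ["V1", "V2", "V3"]
ins = pd.DataFrame(rng.normal(0.0, 1.0, size=(2000, 3)), columns=cols)
outs = pd.DataFrame(rng.normal(5.0, 1.0, size=(20, 3)), columns=cols)

# Fit on normal data only, as in the post
ISF = IsolationForest(random_state=42)
ISF.fit(ins)

# score_samples: lower = more abnormal
ins_scores = ISF.score_samples(ins)
outs_scores = ISF.score_samples(outs)
print(f"mean score, normal:  {ins_scores.mean():.3f}")
print(f"mean score, anomaly: {outs_scores.mean():.3f}")

# Fraction of each group that predict() flags as an outlier (-1)
normal_flagged = (ISF.predict(ins) == -1).mean()
anomaly_flagged = (ISF.predict(outs) == -1).mean()
print(f"flagged as outlier: normal {normal_flagged:.2%}, anomaly {anomaly_flagged:.2%}")
```

If the two score distributions barely differ on the real data, the model itself is not separating the groups well, which would be one possible reason the SHAP plots for `ins` and `outs` look alike.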
The code runs without errors, but the beeswarm plot for the anomalies looks similar to the one I get with `shap_values = explainer(ins)`, i.e. for the normal data. Am I making a mistake? I would be very grateful for any suggestions on what could be improved.