SHAP KernelExplainer using PipeLine

I want to use a scikit-learn pipeline (one-hot encoding as the preprocessing step and a simple linear regression as the model) with SHAP, but I'm running into an error.

Here is my data (I'm using my own modified version of the bike sharing dataset):

    import pandas as pd
    from sklearn.model_selection import train_test_split

    bike_data = pd.read_csv("bike_outlier_clean.csv")

    # Treat the coded columns as categoricals
    bike_data['season'] = bike_data.season.astype('category')
    bike_data['year'] = bike_data.year.astype('category')
    bike_data['holiday'] = bike_data.holiday.astype('category')
    bike_data['workingday'] = bike_data.workingday.astype('category')
    bike_data['weather_condition'] = bike_data.weather_condition.astype('category')

    # Replace the numeric codes with readable labels
    bike_data['season'] = bike_data['season'].map({1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'})
    bike_data['year'] = bike_data['year'].map({0: 2011, 1: 2012})
    bike_data['holiday'] = bike_data['holiday'].map({0: False, 1: True})
    bike_data['workingday'] = bike_data['workingday'].map({0: False, 1: True})
    bike_data['weather_condition'] = bike_data['weather_condition'].map({1: 'Clear', 2: 'Mist', 3: 'Light Snow/Rain', 4: 'Heavy Snow/Rain'})

    bike_data = bike_data[['total_count', 'season', 'month', 'year', 'weekday', 'holiday', 'workingday', 'weather_condition', 'humidity', 'temp', 'windspeed']]

    x = bike_data.drop('total_count', axis=1)
    y = bike_data['total_count']
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

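And here is my pipeline:

    from sklearn.compose import ColumnTransformer
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    # Every non-numeric column is one-hot encoded; the rest pass through unchanged
    category_columns = list(set(bike_data.columns) - set(bike_data._get_numeric_data().columns))

    preprocessor = ColumnTransformer(
        transformers=[
            ('cat', OneHotEncoder(), category_columns)
        ],
        remainder='passthrough'
    )
    model = LinearRegression()
    pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model)])
    pipeline.fit(x_train, y_train)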

Finally, I create the KernelSHAP explainer:

    explainer = shap.KernelExplainer(pipeline.predict, shap.sample(x, 5))

However, that is where the error occurs:

        123             # Make a copy so that the feature names are not removed from the original model
        124             out = copy.deepcopy(out)
    --> 125             out.f.__self__.feature_names_in_ = None
        126
        127     return out

    AttributeError: can't set attribute 'feature_names_in_'
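Reading the traceback, the failing line reaches through the bound method's `__self__` and tries to assign `feature_names_in_` on the object that owns `predict`. The error message is the one Python raises for a property with no setter, which is likely how `Pipeline` exposes that attribute. A minimal stdlib-only sketch of the mechanism, using a hypothetical `ReadOnlyModel` stand-in (an assumption for illustration, not scikit-learn's actual class):

```python
class ReadOnlyModel:
    """Hypothetical stand-in: feature_names_in_ is a property without a setter."""

    @property
    def feature_names_in_(self):
        return ["season", "temp"]

    def predict(self, rows):
        # Dummy prediction: one value per input row
        return [0.0 for _ in rows]


model = ReadOnlyModel()

# A bound method keeps a reference to its owner in __self__,
# which is what the assignment in the traceback goes through.
bound = model.predict
try:
    bound.__self__.feature_names_in_ = None
    assignment_failed = False
except AttributeError:
    assignment_failed = True


def predict_fn(rows):
    """Plain function wrapper: it has no __self__ to reach through."""
    return model.predict(rows)


has_owner = hasattr(predict_fn, "__self__")
```

Here the assignment raises `AttributeError`, while the plain function carries no `__self__` at all, which is why passing a wrapper function instead of `pipeline.predict` may sidestep the failing line.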

I'm not sure what to do to fix this.
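One possible workaround, sketched here under assumptions and not tested against the dataset above, is to hand SHAP a plain function instead of the bound `pipeline.predict`. Because `KernelExplainer` typically passes NumPy arrays to the model while the `ColumnTransformer` selects columns by name, the wrapper rebuilds a DataFrame first. The tiny frame below is made-up stand-in data, and `predict_fn` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny synthetic frame standing in for the bike data (values are made up)
x = pd.DataFrame({
    "season": ["Spring", "Summer", "Fall", "Winter", "Spring", "Fall"],
    "temp": [0.2, 0.7, 0.5, 0.1, 0.3, 0.6],
    "humidity": [0.8, 0.5, 0.6, 0.9, 0.7, 0.4],
})
y = pd.Series([100.0, 300.0, 250.0, 80.0, 120.0, 270.0])

preprocessor = ColumnTransformer(
    transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["season"])],
    remainder="passthrough",
)
pipeline = Pipeline(steps=[("preprocessor", preprocessor), ("model", LinearRegression())])
pipeline.fit(x, y)


def predict_fn(data):
    """Rebuild a DataFrame with the original column names and dtypes, then predict."""
    frame = pd.DataFrame(data, columns=x.columns)
    frame = frame.astype(x.dtypes.to_dict())
    return pipeline.predict(frame)


# Simulate the explainer passing a plain NumPy array to the model
preds = predict_fn(x.to_numpy())

# With shap installed, the wrapper would take the place of pipeline.predict:
# explainer = shap.KernelExplainer(predict_fn, shap.sample(x, 5))
```

Since `predict_fn` is an ordinary function, there is no owning estimator for SHAP to modify, so the `feature_names_in_` assignment should not be attempted. Restoring the dtypes matters because a mixed-type frame becomes an object array on the way through NumPy.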