I want to use a scikit-learn pipeline (one-hot encoding as the preprocessing step and a simple linear regression as the model) with SHAP, but I run into an error.
Here is my data (I'm using my own modified version of the bike sharing dataset):
import pandas as pd
from sklearn.model_selection import train_test_split

bike_data = pd.read_csv("bike_outlier_clean.csv")

# Cast the categorical columns to the pandas 'category' dtype
bike_data['season'] = bike_data.season.astype('category')
bike_data['year'] = bike_data.year.astype('category')
bike_data['holiday'] = bike_data.holiday.astype('category')
bike_data['workingday'] = bike_data.workingday.astype('category')
bike_data['weather_condition'] = bike_data.weather_condition.astype('category')

# Replace the numeric codes with readable labels
bike_data['season'] = bike_data['season'].map({1: 'Spring', 2: 'Summer', 3: 'Fall', 4: 'Winter'})
bike_data['year'] = bike_data['year'].map({0: 2011, 1: 2012})
bike_data['holiday'] = bike_data['holiday'].map({0: False, 1: True})
bike_data['workingday'] = bike_data['workingday'].map({0: False, 1: True})
bike_data['weather_condition'] = bike_data['weather_condition'].map({1: 'Clear', 2: 'Mist', 3: 'Light Snow/Rain', 4: 'Heavy Snow/Rain'})

bike_data = bike_data[['total_count', 'season', 'month', 'year', 'weekday', 'holiday', 'workingday', 'weather_condition', 'humidity', 'temp', 'windspeed']]

# Split into features and target, then into train and test sets
x = bike_data.drop('total_count', axis=1)
y = bike_data['total_count']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
Here is my pipeline:
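from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Every non-numeric column is treated as categorical
category_columns = list(set(bike_data.columns) - set(bike_data._get_numeric_data().columns))

# One-hot encode the categorical columns and pass the rest through unchanged
preprocessor = ColumnTransformer(
    transformers=[
        ('cat', OneHotEncoder(), category_columns)
    ],
    remainder='passthrough'
)
model = LinearRegression()
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model)])
pipeline.fit(x_train, y_train)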
Finally, I create the Kernel SHAP explainer:
import shap

explainer = shap.KernelExplainer(pipeline.predict, shap.sample(x, 5))
However, that is where the error occurs:
    123     # Make a copy so that the feature names are not removed from the original model
    124     out = copy.deepcopy(out)
--> 125     out.f.__self__.feature_names_in_ = None
    126
    127     return out

AttributeError: can't set attribute 'feature_names_in_'
I'm clueless about what I should do to fix it.
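One possible reading of the traceback: SHAP reaches the pipeline through the bound method `pipeline.predict` (via `out.f.__self__`) and tries to assign `feature_names_in_`, which on a scikit-learn `Pipeline` is a read-only property. The sketch below reproduces that mechanism with a hypothetical stand-in class (`FakePipeline` and `predict_fn` are illustrative names, not scikit-learn or SHAP code) and shows that a plain wrapper function exposes no `__self__`. It assumes SHAP only attempts the assignment when the callable has a `__self__`, which the traceback alone does not confirm.

```python
class FakePipeline:
    """Hypothetical stand-in for a fitted pipeline; not scikit-learn code."""

    @property
    def feature_names_in_(self):
        # Read-only property: there is no setter, so assignment raises AttributeError.
        return ["season", "temp"]

    def predict(self, X):
        # Dummy prediction: one value per input row.
        return [0.0 for _ in X]


pipe = FakePipeline()

# A bound method exposes its owner through __self__, which is what the
# traceback's `out.f.__self__.feature_names_in_ = None` relies on.
bound = pipe.predict
assert bound.__self__ is pipe

try:
    bound.__self__.feature_names_in_ = None
    assignment_failed = False
except AttributeError:
    assignment_failed = True


def predict_fn(X):
    # Plain function wrapper: it has no __self__ attribute, so code that
    # looks for `f.__self__` cannot reach the pipeline through it.
    return pipe.predict(X)


has_self = hasattr(predict_fn, "__self__")
```

If this reading is right, passing a wrapper function instead of `pipeline.predict` may sidestep the error. Since the pipeline selects columns by name, the wrapper would probably need to rebuild a DataFrame from the array it receives, for example `lambda X: pipeline.predict(pd.DataFrame(X, columns=x.columns))`. This is an untested suggestion, not a confirmed fix.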