Channel: Active questions tagged python - Stack Overflow

How can I manually group features with the SHAP package?


I would like to apply SHAP to calculate feature importance for an RNN model that predicts an output variable y [N_instances x 1] from a feature matrix [N_instances x N_times x N_features]. As a concrete example, imagine that each instance comes from collecting a time series of temperature and pressure from a chemical reactor each hour for 12 hours, and using it to predict the total mass of some chemical produced during the reactor run. In this case, N_features = 2 (temperature and pressure). When I try to do this, I can only get SHAP values for temperature/pressure at individual times (i.e., I get 2 x 12 SHAP values). I'd like to get an overall SHAP value for temperature and one for pressure (i.e., only 2 SHAP values).

As far as I can tell, SHAP requires a model function and a 2-D array of features. My RNN model is meant to take in 3-D data, so I have to do some array reshaping. This is what I have so far:

    # Imports
    from keras.models import Sequential
    from keras.layers import Dense, LSTM
    import shap
    import numpy as np

    # Generate random 3-D X data and 1-D y data.
    N_TIMES = 10
    N_INSTANCES = 256
    N_FEATURES = 3
    X = np.random.random((N_INSTANCES, N_TIMES, N_FEATURES))
    y = np.random.random((N_INSTANCES, 1))

    # Write a simple model
    model1 = Sequential()
    model1.add(LSTM(8, input_shape=(N_TIMES, N_FEATURES), return_sequences=False,
                    stateful=False, activation='relu'))
    model1.add(Dense(1))

    # Compile and train the model on data
    model1.compile(loss='mean_squared_error',
                   optimizer='adam',
                   metrics=['MeanSquaredError'])
    history = model1.fit(X,
                         y,
                         batch_size=16,
                         epochs=3,
                         verbose=0)

    # Define function to take in a 2-D feature array and return model predictions
    def model_for_shap(X_flat):
        X = X_flat.reshape((X_flat.shape[0], N_TIMES, N_FEATURES))
        return model1.predict(X)

    X_flat = X.reshape((N_INSTANCES, N_TIMES * N_FEATURES))

    # Run SHAP
    background = X.reshape((X.shape[0], X.shape[1] * X.shape[2]))
    e = shap.Explainer(model_for_shap, background)
    shap_values = e.shap_values(X[:10].reshape((10, N_TIMES * N_FEATURES)))
    print("Shape of shap_values:", shap_values.shape)

The final print outputs:

Shape of shap_values: (10, 30)

This is sensible - I asked for the shap values for 10 datapoints, and there are N_TIMES x N_FEATURES = 30 total features.
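To make the layout of those 30 columns concrete, here is a minimal sketch of how the row-major reshape orders them and how the per-time values could be summed into one number per feature. It uses a random stand-in array (`flat_shap`, a hypothetical name) in place of real SHAP output. Summing over time is only a shortcut: it preserves the total attribution for each instance, but it is not the same as masking a whole time series at once.

```python
import numpy as np

N_TIMES, N_FEATURES = 10, 3
n_explained = 10

# Stand-in for real SHAP output (assumed shape: (n_explained, N_TIMES * N_FEATURES)).
rng = np.random.default_rng(0)
flat_shap = rng.normal(size=(n_explained, N_TIMES * N_FEATURES))

# reshape flattens in row-major (C) order, so flat column j corresponds to
# time step j // N_FEATURES and feature j % N_FEATURES.
j = 7
t, f = divmod(j, N_FEATURES)

# Undo the flattening, then sum over the time axis to get one value per feature.
shap_3d = flat_shap.reshape(n_explained, N_TIMES, N_FEATURES)
per_feature = shap_3d.sum(axis=1)
print(per_feature.shape)
```

Because the values are only regrouped, each row of `per_feature` sums to the same total as the corresponding row of `flat_shap`.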

What I would like instead is SHAP values obtained by masking out each feature's entire time series at once. That way I'd get SHAP values for 3 features (N_FEATURES) instead of 30 (N_FEATURES x N_TIMES).
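To pin down what "masking out an entire time series" would mean, the sketch below computes exact Shapley values by hand, treating each feature's full time series as a single player. It does not use the shap package. `toy_model` is a hypothetical linear stand-in for `model1.predict`, `group_shapley` is an illustrative helper rather than a library function, and filling masked features from background samples is an assumed masking scheme.

```python
import itertools
import math
import numpy as np

N_TIMES, N_FEATURES = 10, 3

def toy_model(X):
    """Hypothetical stand-in for model1.predict; X has shape (n, N_TIMES, N_FEATURES)."""
    weights = np.array([1.0, -2.0, 0.5])
    return (X * weights).sum(axis=(1, 2))

def group_shapley(predict, x, background):
    """Exact Shapley values where each player is one feature's whole time series.

    x: (N_TIMES, N_FEATURES) instance to explain.
    background: (n_bg, N_TIMES, N_FEATURES) samples used to fill masked features.
    """
    n = x.shape[1]

    def value(coalition):
        # Features in the coalition keep x's full series; the rest stay at background values.
        X = background.copy()
        idx = np.array(coalition, dtype=int)
        X[:, :, idx] = x[:, idx]
        return predict(X).mean()

    phi = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

rng = np.random.default_rng(0)
background = rng.random((50, N_TIMES, N_FEATURES))
x = rng.random((N_TIMES, N_FEATURES))
phi = group_shapley(toy_model, x, background)
print(phi.shape)
```

The number of coalitions grows as 2^N_FEATURES, so this brute-force approach is only practical for a handful of feature groups. With 3 groups it needs just 8 distinct coalitions per instance.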

Is this possible? I thought the Partition masker might help, but I can't figure out how to manipulate the clustering argument to accomplish my goal.

