I am new to TensorFlow, and I am trying to build an LSTM model that takes tokenized strings as input.
```python
import tensorflow as tf
from transformers import TFBertModel

# Define the model
def build_lstm_model():
    input_ids = tf.keras.layers.Input(shape=(128,), dtype=tf.int32, name='input_ids')

    # BERT embedding layer
    bert_model = TFBertModel.from_pretrained('bert-base-uncased')
    bert_output = bert_model(input_ids)[1]  # using the pooled output

    # LSTM layer
    lstm_output = tf.keras.layers.LSTM(64)(tf.expand_dims(bert_output, axis=1))  # expand the dimensions for LSTM

    # Output layer
    output = tf.keras.layers.Dense(1, activation='softmax')(lstm_output)

    model = tf.keras.Model(inputs=input_ids, outputs=output)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

# Train the model
model = build_lstm_model()
# model.summary()
model.fit(train_dataset, validation_data=val_dataset, epochs=3)
```

Trying to fit this model returns the error:

```
logits and labels must have the same first dimension, got logits shape [8,1] and labels shape [1024]
```
More notes: `train_dataset` and `val_dataset` are both TensorFlow datasets whose elements have shape `(None, 128)`.
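As far as I can tell, the 1024 in the error is just the batch size times the sequence length: with one label per token position, `sparse_categorical_crossentropy` flattens the label batch, so 8 × 128 labels become 1024 entries. A quick check with stand-in tensors (hypothetical zeros standing in for my real batches):

```python
import tensorflow as tf

# Hypothetical stand-ins for one batch from train_dataset:
# 8 examples of 128 tokens, with one label per token position.
inputs = tf.zeros((8, 128), dtype=tf.int32)
labels = tf.zeros((8, 128), dtype=tf.int32)

# sparse_categorical_crossentropy flattens the labels, so an
# (8, 128) label batch has 8 * 128 = 1024 entries -- matching
# the "labels shape [1024]" in the error message.
print(labels.shape.num_elements())  # 1024
```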
Here is the output of `model.summary()`:

```
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_ids (InputLayer)      [(None, 128)]             0

 tf_bert_model (TFBertModel) TFBaseModelOutputWithPoo  109482240
                             lingAndCrossAttentions(
                             last_hidden_state=(None,
                             128, 768),
                             pooler_output=(None, 768),
                             past_key_values=None,
                             hidden_states=None,
                             attentions=None,
                             cross_attentions=None)

 tf.expand_dims (TFOpLambda) (None, 1, 768)            0

 lstm (LSTM)                 (None, 64)                213248

 dense (Dense)               (None, 1)                 65

=================================================================
Total params: 109695553 (418.46 MB)
Trainable params: 109695553 (418.46 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
```

I can't figure out how to get the dimensions to match so I can start training my model.
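To compare the label shapes against the model's `(None, 1)` output, I have been printing the dataset's element spec. A sketch with a hypothetical dataset built the same way mine is batched (assuming it yields `(input_ids, labels)` pairs):

```python
import tensorflow as tf

# Hypothetical stand-in: (input_ids, labels) pairs batched by 8,
# mirroring how my real train_dataset is constructed.
ds = tf.data.Dataset.from_tensor_slices(
    (tf.zeros((32, 128), tf.int32), tf.zeros((32, 128), tf.int32))
).batch(8)

# Shows the per-batch shapes of both inputs and labels,
# e.g. TensorSpec(shape=(None, 128), dtype=tf.int32, name=None) for each.
print(ds.element_spec)
```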
I tried the following solutions after reading other StackOverflow posts:
- Flattening the LSTM output:

  ```python
  lstm_output = tf.keras.layers.Flatten()(lstm_output)
  ```

- Changing the output shape of the LSTM layer and the Dense layer.
- Changing the loss function to 'categorical_crossentropy'.
- Changing the output layer activation to 'sigmoid'.
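For what it's worth, a stripped-down model with the same output shape, loss, and per-token labels reproduces the mismatch without downloading BERT (this is only a sketch of my setup: random data, and an Embedding plus pooling layer standing in for the BERT pooled output; the exact error wording can differ between graph construction and run time):

```python
import numpy as np
import tensorflow as tf

# Tiny stand-in model: same (batch, 1) output and compile settings
# as the real one, with Embedding + pooling in place of BERT.
inputs = tf.keras.layers.Input(shape=(128,), dtype=tf.int32)
x = tf.keras.layers.Embedding(100, 16)(inputs)        # stand-in for BERT embeddings
x = tf.keras.layers.GlobalAveragePooling1D()(x)       # stand-in for the pooled output
x = tf.keras.layers.LSTM(64)(tf.expand_dims(x, axis=1))
outputs = tf.keras.layers.Dense(1, activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

# One label per token position, shape (8, 128), triggers the same
# shape mismatch: logits (8, 1) vs. 8 * 128 = 1024 flattened labels.
x_batch = np.zeros((8, 128), dtype=np.int32)
y_batch = np.zeros((8, 128), dtype=np.int32)
try:
    model.train_on_batch(x_batch, y_batch)
except Exception as e:
    print(type(e).__name__, e)
```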