I'm using a data generator that feeds a different part of the training set in each epoch. Each part has a different number of observations, so the number of steps per epoch changes each time on_epoch_end is called. The value stored for steps_per_epoch stays the same for all epochs (the value calculated for the first epoch), but the actual number of steps differs from it:
Epoch 1/3
5/5 [==============================] - 18s 4s/step - loss: 265.6053 - auc: 0.6452 - val_loss: 0.6685 - val_auc: 0.8281
Epoch 2/3
5/5 [==============================] - 32s 7s/step - loss: 0.7178 - auc: 0.7595 - val_loss: 0.6427 - val_auc: 0.8443
Epoch 3/3
5/5 [==============================] - 15s 3s/step - loss: 0.6770 - auc: 0.6119 - val_loss: 0.6347 - val_auc: 0.8369
The 5/5 you see in this output is the final count, but while the fit method is running, the step counter sometimes climbs above that denominator (e.g. 8/5) and sometimes stops below it (e.g. 4/5).
Is this a problem I should worry about? Or can I disregard the reported steps_per_epoch value and assume the fit method is doing what it should?
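To make the mismatch concrete, here is a minimal sketch (no Keras involved, and the part sizes are made up for illustration) of a loop that caches len() once, the way the progress-bar denominator appears to be cached, while the generator's actual length changes every epoch:

```python
class VaryingSequence:
    """Mimics a Sequence whose loaded part has a different size each epoch."""
    def __init__(self, part_sizes, batch_size):
        self.part_sizes = part_sizes   # observations per numpy file (hypothetical)
        self.batch_size = batch_size
        self.j = -1
        self.on_epoch_end()            # load the first part, like the real generator

    def __len__(self):
        return -(-self.part_size // self.batch_size)  # ceil division

    def on_epoch_end(self):
        self.j = (self.j + 1) % len(self.part_sizes)
        self.part_size = self.part_sizes[self.j]

seq = VaryingSequence(part_sizes=[150, 250, 100], batch_size=32)
cached_steps = len(seq)            # the denominator the progress bar keeps showing
actual_steps = []
for epoch in range(3):
    actual_steps.append(len(seq))  # the step count the epoch really runs
    seq.on_epoch_end()

print(cached_steps, actual_steps)  # 5 [5, 8, 4]
```

With these sizes the first epoch reports 5/5, the second overshoots to 8/5, and the third stops at 4/5, which matches the behavior described above: the denominator is stale, not the iteration itself.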
This is my data generator, where the data is loaded from 4 different numpy files, a different file at each epoch:
import numpy as np
import keras

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'

    def __init__(self, path_to_labels, path_to_data, batch_size=32,
                 n_classes=2, shuffle=True):
        '''Initialization'''
        self.n_channels = 5
        # path_to_data is a list of 4 paths
        self.path_to_data = path_to_data
        self.j = -1
        self.labels = {}
        self.labels[0] = np.load(path_to_labels[0])
        self.labels[1] = np.load(path_to_labels[1])
        self.labels[2] = np.load(path_to_labels[2])
        self.labels[3] = np.load(path_to_labels[3])
        self.batch_size = batch_size
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.ceil(self.len / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Generate data
        X, y = self.__data_generation(indexes)
        return X, y

    def get_dim(self):
        'Dimensions for the input layer.'
        return (self.dim[0], self.dim[1], self.n_channels)

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.j = self.j + 1
        if self.j == 4:
            self.j = 0
        self.data = np.load(self.path_to_data[self.j])
        self.len = self.data.shape[0]
        self.dim = (self.data.shape[1], self.data.shape[2])
        self.indexes = np.arange(self.len)
        if self.shuffle:
            np.random.shuffle(self.indexes)

    def __data_generation(self, indexes):
        'Generates data containing batch_size samples'
        # X : (n_samples, *dim, n_channels)
        # Initialization
        true_size = len(indexes)
        X = np.empty((true_size, *self.dim, self.n_channels))
        y = np.empty((true_size), dtype=float)
        # Generate data
        for i, idx in enumerate(indexes):
            X[i, :, :, :] = self.data[idx, :, :, :]
            # Store solution
            y[i] = self.labels[self.j][idx]  # was self.label[j][idx], a NameError
        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
I couldn't find a way to test whether this works as expected. When I fit the model everything runs fine, except for the current-step counter in each epoch, which can end at a lower or higher value than the steps_per_epoch value stored by Keras.
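One way to test the generator outside of fit is to drive it by hand the way a training loop would: len(gen) batches, then on_epoch_end(). The helper below is a sketch; drive_epochs and StubGen are names I made up, and StubGen is a tiny stand-in (with arbitrary sizes) so the harness can be demonstrated without the real numpy files, but drive_epochs should accept the real DataGenerator instance as well:

```python
import numpy as np

def drive_epochs(gen, n_epochs):
    """Iterate a Sequence-like generator by hand: len(gen) batches
    per epoch, then on_epoch_end(), returning the steps per epoch."""
    steps_seen = []
    for _ in range(n_epochs):
        n_batches = len(gen)
        for b in range(n_batches):
            X, y = gen[b]
            assert X.shape[0] == y.shape[0]  # batch and labels stay aligned
        steps_seen.append(n_batches)
        gen.on_epoch_end()
    return steps_seen

class StubGen:
    """Minimal stand-in whose loaded part changes size each epoch."""
    def __init__(self, sizes, batch_size=4):
        self.sizes, self.bs, self.j = sizes, batch_size, -1
        self.on_epoch_end()

    def __len__(self):
        return int(np.ceil(self.n / self.bs))

    def __getitem__(self, i):
        idx = np.arange(self.n)[i*self.bs:(i+1)*self.bs]
        return np.zeros((len(idx), 2)), np.zeros((len(idx), 2))

    def on_epoch_end(self):
        self.j = (self.j + 1) % len(self.sizes)
        self.n = self.sizes[self.j]

print(drive_epochs(StubGen([10, 6, 9]), 3))  # [3, 2, 3]
```

If the real generator passes this loop for a full cycle over all four files, the data pipeline itself is sound regardless of what the progress-bar denominator claims.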