My goal is the following: I want to train an object detection model that can locate and classify objects from multiple classes within an image. Each class can appear a varying number of times, and different classes can appear in the same image.
This is the model I am currently using:
    from tensorflow.keras import layers, Model

    def create_model(inShape, num_classes, optimizer):
        input_layer = layers.Input(shape=inShape)

        # Convolutional feature extractor
        x = layers.Conv2D(32, (3, 3), activation='relu')(input_layer)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(64, (3, 3), activation='relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(128, (3, 3), activation='relu')(x)
        x = layers.BatchNormalization()(x)

        # Shared dense head
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Flatten()(x)  # no-op here: the tensor is already 2-D after global pooling

        # Two outputs: class probabilities and a single bounding box
        # (4 units instead of num_classes; see the P.S. below)
        class_probs_layer = layers.Dense(4, activation='softmax', name='class_Probs')(x)
        bbox_layer = layers.Dense(4, activation='linear', name='bboxes')(x)

        model = Model(inputs=input_layer, outputs=[class_probs_layer, bbox_layer])

        # Compile the model
        model.compile(
            optimizer=optimizer,
            loss={'class_Probs': 'categorical_crossentropy',
                  'bboxes': 'mean_squared_error'},
            metrics={'class_Probs': 'accuracy',
                     'bboxes': 'mean_squared_error'})
        return model
The model outputs one bounding box (4 coordinates) and 2 class probabilities, since I am currently training on a dataset with only 2 classes (caries, black stain). P.S. The class_probs_layer has 4 outputs only as a workaround for an error; otherwise the logits and labels don't match.
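For illustration: `categorical_crossentropy` compares a one-hot label vector against the softmax output element by element, so both need the same length. The following is only a minimal sketch in plain Python of how raw category ids could be remapped to contiguous 0-based indices so that two classes need only two outputs. The ids `1` and `3` and the helper names `build_index` and `one_hot` are hypothetical; the actual ids in the dataset may differ, and this may not be the cause of the mismatch.

```python
def build_index(category_ids):
    """Map arbitrary category ids to contiguous 0-based class indices."""
    return {cid: idx for idx, cid in enumerate(sorted(set(category_ids)))}


def one_hot(index, num_classes):
    """Return a one-hot list of length num_classes with 1.0 at position index."""
    if not 0 <= index < num_classes:
        raise ValueError(f"index {index} out of range for {num_classes} classes")
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Hypothetical raw ids for the two classes (caries, black stain).
raw_ids = [1, 3, 1, 3]
id_to_index = build_index(raw_ids)
num_classes = len(id_to_index)

# Every label now has exactly num_classes entries.
labels = [one_hot(id_to_index[cid], num_classes) for cid in raw_ids]
print(num_classes, labels[0], labels[1])
```

If the labels are encoded this way, the classification layer could presumably be sized with `num_classes` instead of a hard-coded 4.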
Could you please suggest a different model structure that fits this goal?
Also, my training is based on TFRecord files, which I write myself. I have one question regarding those: since there are multiple instances in each image (but not the same number every time), there are two ways to write the TFRecords:
image data, annotation, image data, annotation...
-> one record per bounding box, so multiple records per image
OR
image data, annotation, annotation, annotation, image data...
-> one record per image, where the image data (image_id, filename and image_encoded) appears only once and is followed by all the annotations (bounding boxes, category_ids) that belong to that image (image_id).
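To make the second layout concrete, here is a minimal sketch in plain Python of grouping a flat, per-box annotation list into one record per image. The sample data, the `[x, y, w, h]` box format and the helper name `group_by_image` are assumptions for illustration; `image_encoded` is left out to keep the example short.

```python
from collections import defaultdict

# Hypothetical flat annotation list; all values are made up for illustration.
images = [
    {"image_id": 1, "filename": "img_001.jpg"},
    {"image_id": 2, "filename": "img_002.jpg"},
]
annotations = [
    {"image_id": 1, "category_id": 0, "bbox": [10.0, 20.0, 50.0, 60.0]},
    {"image_id": 1, "category_id": 1, "bbox": [70.0, 15.0, 30.0, 40.0]},
    {"image_id": 2, "category_id": 0, "bbox": [5.0, 5.0, 25.0, 25.0]},
]


def group_by_image(images, annotations):
    """Build one record per image holding all of its boxes and category ids."""
    anns_per_image = defaultdict(list)
    for ann in annotations:
        anns_per_image[ann["image_id"]].append(ann)

    records = []
    for img in images:
        anns = anns_per_image[img["image_id"]]
        records.append({
            "image_id": img["image_id"],
            "filename": img["filename"],
            # Flattened [x, y, w, h, x, y, w, h, ...]; reshape to (-1, 4) when reading.
            "bboxes": [value for ann in anns for value in ann["bbox"]],
            "category_ids": [ann["category_id"] for ann in anns],
        })
    return records


records = group_by_image(images, annotations)
for rec in records:
    print(rec["filename"], len(rec["category_ids"]), "boxes")
```

Each such record could then be serialized as a single `tf.train.Example`, with `bboxes` and `category_ids` stored as variable-length lists and read back with `tf.io.VarLenFeature`.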