This training curve is for a Transformer model that processes a 2D (excluding the batch dimension) sequential signal. It is trained with the Adam optimizer, a batch size of 32, and, for the learning rate, a custom scheduler that replicates the warmup schedule used in the 'Attention Is All You Need' paper. The curve below plateaus, with the training loss ending up slightly lower than the validation loss, but the validation loss never starts climbing back up, which I interpret as the model never starting to overfit and simply ceasing to re-adjust its weights after around epoch 90.
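For reference, the warmup schedule from that paper is

    lrate = d_model**-0.5 * min(step_num**-0.5, step_num * warmup_steps**-1.5)

i.e. the learning rate rises linearly for the first warmup_steps optimizer steps and then decays with the inverse square root of the step number.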
Is there a better interpretation, and what can I do to improve this model?
Below is a brief, reproducible version of my code:
```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Dummy training data: 32 samples, each a 512 x 512 sequential signal
x_train = np.random.normal(size=(32, 512, 512))
batch_size = 32
H, W = x_train.shape[1:]  # per-sample dimensions (512, 512)

# Boolean attention mask of shape (batch, T_query, T_key)
rows, cols = np.indices((H, W), sparse=True)
padding_mask_init = np.zeros((H, W, W), dtype=np.bool_)
padding_mask_init[rows, 1:, cols] = 1
padding_mask = padding_mask_init[:batch_size]

embed_dim = 512
dense_dim = 2048
num_heads = 2

shape = (batch_size, embed_dim, 512)  # (32, 512, 512)
decoder_inputs = layers.Input(batch_input_shape=shape, dtype=tf.float16)

mha_1 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
mha_2 = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)
layernorm_1 = layers.LayerNormalization()

# Single decoder block: causal self-attention, residual + layer norm, then cross-attention
Z = decoder_inputs
Z = mha_1(query=Z, value=Z, key=Z, use_causal_mask=True, attention_mask=padding_mask)
Z = layernorm_1(Z + decoder_inputs)
Z = mha_2(query=Z, value=decoder_inputs, key=decoder_inputs, attention_mask=padding_mask)
outputs = layers.TimeDistributed(keras.layers.Dense(embed_dim, activation="softmax"))(Z)

model = keras.Model(decoder_inputs, outputs)
model.compile(loss="mean_squared_error",
              optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule(embed_dim, 3000),
                                                 beta_1=0.9, beta_2=0.98, epsilon=1.0e-9),
              metrics=["accuracy"])
# dataset / val_dataset are the batched tf.data pipelines built from the training and validation data (not shown)
history = model.fit(dataset, epochs=200, validation_data=val_dataset)
```
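`lr_schedule` is defined elsewhere in my code and not shown above. A minimal sketch of an equivalent schedule, assuming the `3000` passed in is the warmup step count (the `WarmupSchedule` name is just for illustration), would be roughly:

```python
import tensorflow as tf

class WarmupSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    """lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)"""

    def __init__(self, d_model, warmup_steps):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = tf.cast(warmup_steps, tf.float32)

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps ** -1.5
        )

def lr_schedule(d_model, warmup_steps):
    # Adam accepts a LearningRateSchedule instance directly as learning_rate
    return WarmupSchedule(d_model, warmup_steps)
```

With d_model = 512 and warmup_steps = 3000, this peaks at roughly 8e-4 around step 3000 and then decays with the inverse square root of the step count.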
