I am trying to encode some text using Hugging Face's mT5-base model. I am using the model as shown below:
```python
import torch
from transformers import MT5EncoderModel, AutoTokenizer

model = MT5EncoderModel.from_pretrained("google/mt5-base")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

def get_t5_embeddings(texts):
    last_hidden_state = model(
        input_ids=tokenizer(texts, return_tensors="pt", padding=True).input_ids
    ).last_hidden_state
    # Max-pool over the sequence dimension to get one vector per text
    pooled_sentence = torch.max(last_hidden_state, dim=1)
    return pooled_sentence[0].detach().numpy()
```

I was doing some experiments when I noticed that the same text had a low cosine similarity score with itself. After some digging, I realized that the model was returning very different embeddings if I did the encoding in batches. To validate this, I ran a small experiment that generated embeddings for "Hello" and for lists of an increasing number of "Hello"s, comparing the embedding of the lone "Hello" with that of the first "Hello" in each list (both of which should be the same).
```python
for i in range(1, 10):
    print(i, (get_t5_embeddings(["Hello"])[0] == get_t5_embeddings(["Hello"] * i)[0]).sum())
```

This prints, for each batch size, the number of values in the two embeddings that match exactly. This was the result:
```
1 768
2 768
3 768
4 768
5 768
6 768
7 768
8 27
9 27
```

Every time I run it, I get mismatches once the batch size exceeds 7.
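A note on how I am comparing the vectors: the `==` comparison above flags even tiny floating-point deviations as mismatches. Here is a minimal sketch using stand-in NumPy arrays (not actual model outputs) of the difference between exact comparison and a tolerance-based one:

```python
import numpy as np

# Stand-ins for get_t5_embeddings(["Hello"])[0] and the batched result;
# the first components differ only in the 7th decimal place.
a = np.array([1.0000001, 2.0, 3.0])
b = np.array([1.0000002, 2.0, 3.0])

print((a == b).sum())                # exact matches: 2 of 3 components
print(np.allclose(a, b, atol=1e-5))  # True: the vectors are nearly identical
```

So an exact-match count like 27 out of 768 could still mean the vectors are numerically very close.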
Why am I getting different embeddings and how do I fix this?