I have been fine-tuning the Keras implementation of GPT-2 for question answering. Overall the results seem promising, but I run into issues when the training data contains very similar questions: the model gets confused and replies with the same answer to both questions.
import keras
import keras_nlp

tokenizer = keras_nlp.models.GPT2Tokenizer.from_preset("gpt2_base_en")
preprocessor = keras_nlp.models.GPT2CausalLMPreprocessor(
    tokenizer=tokenizer,
    sequence_length=128,
)
gpt2_lm = keras_nlp.models.GPT2CausalLM.from_preset(
    "gpt2_base_en", preprocessor=preprocessor
)

questionsWithAnswers = [
    "What is Joe's phone number? Joe's phone number is 555-555-5555",
    "What is Mary's phone number? Mary's phone number is 444-444-4444",
]

loss = keras.losses.SparseCategoricalCrossentropy(from_logits=True)
gpt2_lm.compile(
    optimizer="adam",
    loss=loss,
    weighted_metrics=["accuracy"],
)
gpt2_lm.fit(x=questionsWithAnswers, epochs=50, batch_size=2, verbose=2)
gpt2_lm.save_weights("llm.weights.h5")
In the sample above I can ask "What is Joe's phone number?" and "What is Mary's phone number?". Unfortunately, the model seems to pick one of the answers and generate the same reply for both questions.
I assume the issue is the similarity between the questions, where the only difference is the person's name.
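To illustrate how little the two prompts differ, here is a quick word-by-word comparison (plain Python, no model required):

```python
q1 = "What is Joe's phone number?"
q2 = "What is Mary's phone number?"

# Pair up the words of both questions and keep only the positions
# where they disagree.
diff = [(a, b) for a, b in zip(q1.split(), q2.split()) if a != b]
print(diff)  # [("Joe's", "Mary's")]
```

So out of five words, only a single token-level detail distinguishes the two training examples.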
Any ideas how to teach the model to differentiate between the two questions/answers?