Channel: Active questions tagged python - Stack Overflow

LangChain gpt-3.5-turbo model reads files - problem

I am making a really simple (just for fun) LangChain project.

The model reads a PDF file, and I can then ask it questions about that specific PDF file.

Everything works fine (this is a working example):

    from PyPDF2 import PdfReader
    from langchain.embeddings.openai import OpenAIEmbeddings
    from langchain.text_splitter import CharacterTextSplitter
    from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS
    from langchain.chains.question_answering import load_qa_chain
    from langchain.llms import OpenAI
    import os

    os.environ["OPENAI_API_KEY"] = ""

    reader = PdfReader('./2023_GPT4All_Technical_Report.pdf')

    raw_text = ''
    for i, page in enumerate(reader.pages):
        text = page.extract_text()
        if text:
            raw_text += text

    raw_text[:100]

    text_splitter = CharacterTextSplitter(
        separator = "\n",
        chunk_size = 1000,
        chunk_overlap = 200,
        length_function = len,
    )
    texts = text_splitter.split_text(raw_text)

    embeddings = OpenAIEmbeddings(model='gpt-3.5-turbo')
    docsearch = FAISS.from_texts(texts, embeddings)

    chain = load_qa_chain(OpenAI(), chain_type="stuff")

    query = "Who is the author of the book?"
    docs = docsearch.similarity_search(query)
    res = chain.run(input_documents=docs, question=query)
    print(res)

What I see as a problem:

If I ask a simple question such as "what is 2+2", it doesn't know. How did I lose all of the model's existing knowledge? Is there a workaround so that the model keeps its existing knowledge and I just add the knowledge from a specific PDF file?
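
My current guess, as a minimal sketch: a "stuff"-style chain seems to paste all retrieved chunks into a single prompt that tells the model to answer from that context only. The plain-Python stand-in below illustrates the idea without calling LangChain or OpenAI. The template wording, the names `STRICT_TEMPLATE`, `RELAXED_TEMPLATE` and `build_prompt`, and the placeholder chunks are my own assumptions for illustration, not LangChain's actual code.

```python
from typing import List

# Illustrative only: a plain-Python stand-in for a "stuff"-style QA prompt.
# Template texts and names are assumptions, not LangChain's real implementation.

# Strict wording: the model is told to rely on the supplied context.
STRICT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)

# Relaxed wording: the model may fall back on what it already knows.
RELAXED_TEMPLATE = (
    "Answer the question at the end. Use the context below when it is relevant; "
    "otherwise answer from your general knowledge.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_prompt(template: str, chunks: List[str], question: str) -> str:
    """Join ("stuff") all retrieved chunks into one context and fill the template."""
    context = "\n\n".join(chunks)
    return template.format(context=context, question=question)


if __name__ == "__main__":
    # Placeholder chunks standing in for the similarity-search results.
    chunks = ["<placeholder chunk 1 from the PDF>", "<placeholder chunk 2 from the PDF>"]
    question = "What is 2+2?"
    print(build_prompt(STRICT_TEMPLATE, chunks, question))
    print("-" * 40)
    print(build_prompt(RELAXED_TEMPLATE, chunks, question))
```

If this guess is right, the strict wording would explain the "I don't know" reply, because nothing in the PDF chunks mentions 2+2. A relaxed prompt along these lines might be the workaround, assuming the chain can be given a custom prompt.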

Thanks to everyone for the answers, and I hope a good conversation will start from my question.

Suggestions would also be awesome!

