I am trying to build a Streamlit RAG app that extracts information from a URL and indexes it, so that a user can ask the model questions about the article at that URL and get a suitable answer.
I did this in my notebook and it works perfectly, but in my Streamlit app I get an IndexError: list index out of range. I am using GoogleGenerativeAIEmbeddings with FAISS.
This is the block of code:
    main_placeholder = sl.empty()
    llm = ChatGoogleGenerativeAI(model = 'gemini-pro')
    if process_url_clicked:
        loader = UnstructuredURLLoader(urls = urls)
        main_placeholder.text("Data loading...started...✅✅✅")
        data = loader.load()
        text_splitter = RecursiveCharacterTextSplitter(
            separators = ['\n','\n\n','.',','],
            chunk_size = 1000,
            chunk_overlap = 200
        )
        main_placeholder.text("Text splitter...started...✅✅✅")
        docs = text_splitter.split_documents(data)
        embeddings = GoogleGenerativeAIEmbeddings(model = 'models/embedding-001')
        vector_index = FAISS.from_documents(docs,embeddings)

And this is the traceback from the Streamlit app:
    IndexError: list index out of range
    Traceback:
    File "C:\Python312\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
        exec(code, module.__dict__)
    File "C:\Users\owner\Desktop\Projects\nlp\main.py", line 84, in <module>
        vectorstore_openai = FAISS.from_documents(docs, embeddings)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Python312\Lib\site-packages\langchain_core\vectorstores.py", line 550, in from_documents
        return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "C:\Python312\Lib\site-packages\langchain_community\vectorstores\faiss.py", line 931, in from_texts
        return cls.__from(
               ^^^^^^^^^^^
    File "C:\Python312\Lib\site-packages\langchain_community\vectorstores\faiss.py", line 888, in __from
        index = faiss.IndexFlatL2(len(embeddings[0]))
                                      ~~~~~~~~~~^^^
This works perfectly in my notebook and I am seriously confused why this is happening.
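The last frame of the traceback fails on `embeddings[0]`, which is what indexing an empty list looks like. One possible reading, which I have not confirmed, is that `docs` is empty by the time it reaches FAISS in the app (for example if no URL was entered or the loader returned no text). Below is a minimal, library-free sketch of a guard for that case. The names `build_if_non_empty` and `fake_build` are hypothetical and only for illustration; `fake_build` stands in for `FAISS.from_documents` and simply mimics the failing line.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def build_if_non_empty(
    docs: Sequence[T], build: Callable[[Sequence[T]], R]
) -> Optional[R]:
    """Call build(docs) only when docs is non-empty; otherwise return None."""
    if not docs:
        # Nothing to embed: skip the build instead of letting it fail later.
        return None
    return build(docs)


def fake_build(docs: Sequence[str]) -> int:
    """Hypothetical stand-in for the index build (not the real FAISS call)."""
    # One dummy 2-dimensional vector per document.
    embeddings = [[0.0, 1.0] for _ in docs]
    # Same access pattern as the last traceback frame: raises IndexError
    # when there are no embeddings at all.
    return len(embeddings[0])
```

In the app, the `None` branch would be the place to show a message such as "no text could be loaded from the URL" rather than calling FAISS. If the guard never triggers, the empty list is coming from somewhere else (for example the embedding call itself) and this assumption is wrong.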