I have the following DataFrame:
    import pandas as pd

    # Creating the FAQ data
    faq_data = {
        "Questions": [
            "How can I reset my company email password?",
            "What should I do if my computer is running slow?",
            "How do I access the company's VPN?",
            "What are the office Wi-Fi settings?",
            "How do I report a phishing email?",
        ],
        "Answers": [
            "To reset your email password, visit the company's email portal and click on 'Forgot Password.' Follow the prompts to reset your password. If you encounter issues, contact IT support.",
            "Start by restarting your computer. If it's still slow, check for any running programs that are not in use and close them. If the problem persists, contact IT for a diagnostic check.",
            "Install the VPN client from the internal IT portal. Use your company credentials to log in. For detailed instructions, refer to the VPN access guide on the company intranet.",
            "The office Wi-Fi network is named '[CompanyNetwork].' The password is regularly updated and can be found on the internal IT page or by contacting IT support directly.",
            "Forward the suspicious email to the IT security team at [security@company.com]. Do not click on any links or download attachments from the email.",
        ],
    }

    # Creating a DataFrame
    faq_df = pd.DataFrame(faq_data)
I wrote a simple script that does semantic search using Chroma's native functions, as shown below:
    import pandas as pd
    import chromadb
    from sentence_transformers import SentenceTransformer, util

    # Load the ISC locally saved CSV file into a pandas DataFrame
    file_path = 'IT_Operations_FAQ.csv'
    df = pd.read_csv(file_path, encoding='ISO-8859-1')
    df['meta'] = df.apply(lambda x: {'Questions': x['Questions'], 'Answers': x['Answers']}, axis=1)

    # Chroma DB Setup
    chroma_client = chromadb.PersistentClient(path="rfp_faq_chroma_db")

    # Check if collection exists and delete it to start fresh (optional based on requirement)
    collection_name = "rfp-faq"
    if chroma_client.get_or_create_collection(collection_name):
        chroma_client.delete_collection(name=collection_name)

    # Collection creation or retrieval
    rfp_faq_collection = chroma_client.create_collection(
        name=collection_name,
        metadata={"hnsw:space": "cosine"}  # l2 is the default, but using cosine for this example
    )

    # Load the locally saved sentence transformer model for embedding
    modelPath = "model/"
    model = SentenceTransformer(modelPath)
    dataset_embeddings = model.encode(df['Questions'].tolist(), show_progress_bar=True)

    # Inserting data into the collection
    rfp_faq_collection.upsert(
        embeddings=dataset_embeddings,
        ids=[f"{x}" for x in df.index.tolist()],
        documents=df['Questions'].tolist(),
        metadatas=df['meta'].tolist()
    )

    # Query the collection
    # user_question = "How do I set up a new employee's workstation?"
    # Take user input for the question
    user_question = input("Please enter your question: ")
    qry_embeddings = model.encode([user_question]).tolist()  # Encoding the query string
    query_output = rfp_faq_collection.query(qry_embeddings, n_results=2)
    # print(query_output)

    # Constructing a DataFrame from the query results
    ids = query_output['ids'][0]
    questions = [md['Questions'] for md in query_output['metadatas'][0]]
    answers = [md['Answers'] for md in query_output['metadatas'][0]]
    distances = query_output['distances'][0]
    # embeddings = [None] * len(questions)  # Since 'embeddings' is None in the provided data

    # DataFrame with query results
    faq_df = pd.DataFrame({
        'actual_row_no': ids,
        'Questions': questions,
        'Answers': answers,
        'distances': distances,
        # 'embeddings': embeddings
    })

    # Display the DataFrame
    faq_df
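For context, the result of `query` is a dict of nested lists, with one inner list per query, which is why the code above indexes everything with `[0]`. Here is a minimal sketch of that flattening step without Chroma or pandas. The `query_output` values are invented for illustration (not real output from my collection), and `flatten_first_query` is a hypothetical helper name:

```python
# Hypothetical query result for ONE question with n_results=2.
# The values are made up; only the nested shape matters.
query_output = {
    "ids": [["0", "2"]],
    "distances": [[0.12, 0.47]],
    "metadatas": [[
        {"Questions": "How can I reset my company email password?",
         "Answers": "Visit the email portal and click 'Forgot Password.'"},
        {"Questions": "How do I access the company's VPN?",
         "Answers": "Install the VPN client from the internal IT portal."},
    ]],
}


def flatten_first_query(result):
    """Return one dict per hit for the first (here the only) query."""
    rows = []
    for doc_id, meta, dist in zip(result["ids"][0],
                                  result["metadatas"][0],
                                  result["distances"][0]):
        rows.append({
            "actual_row_no": doc_id,
            "Questions": meta["Questions"],
            "Answers": meta["Answers"],
            "distances": dist,
        })
    return rows


rows = flatten_first_query(query_output)
for row in rows:
    print(row["actual_row_no"], row["distances"], row["Questions"])
```

Whatever the LangChain version looks like, I would like to end up with the same four columns per hit.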
How can I modify this to use LangChain + Chroma?