I am building a chatbot on top of an RDBMS using an LLM and LangChain's database agent. The problem is that LangChain's database agent only converts natural-language questions into SQL queries. It does not have the data itself as context (for example, in the form of vector embeddings), so it fails to answer many questions.
Problems with my approach: it does not maintain or remember conversation history, and it does not understand many questions properly. It only tries to convert the question into a query, and that's it. For example, if I ask it to "analyse the trend of product abc with respect to product xyz and tell me why it is not profitable", it will not answer this kind of question. But if I pass some sample data to ChatGPT as CSV, it answers these kinds of questions easily.
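To make the CSV idea and the missing history concrete, here is a minimal sketch, not a LangChain feature: it runs a query, serializes the result as CSV, and puts it into one prompt string together with the earlier turns. The in-memory SQLite table, the helper names `rows_to_csv` and `build_analysis_prompt`, and the sample rows are all assumptions made for illustration; only the column names come from my schema.

```python
import csv
import io
import sqlite3


def rows_to_csv(cursor: sqlite3.Cursor) -> str:
    """Serialize the result set of an executed query as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor.fetchall())
    return buffer.getvalue()


def build_analysis_prompt(history: list[tuple[str, str]], csv_text: str, question: str) -> str:
    """Combine earlier turns, the CSV data, and the new question into one prompt string."""
    turns = "\n".join(f"{speaker}: {text}" for speaker, text in history)
    return (
        "Conversation so far:\n" + turns + "\n\n"
        "Data (CSV):\n" + csv_text + "\n"
        "Question: " + question
    )


# Hypothetical in-memory table standing in for the real MySQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Product_Name TEXT, prev_Period TEXT, PAT_in_INR REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("abc", "2023-01", -120.0), ("abc", "2023-02", -80.0), ("xyz", "2023-01", 300.0)],
)

cursor = conn.execute(
    "SELECT Product_Name, prev_Period, PAT_in_INR FROM sales ORDER BY Product_Name, prev_Period"
)
csv_text = rows_to_csv(cursor)

# Earlier turns are kept as (speaker, text) pairs and replayed on every call.
history = [("human", "Which products do we sell?"), ("ai", "abc and xyz.")]
prompt_text = build_analysis_prompt(history, csv_text, "Why is abc less profitable than xyz?")
print(prompt_text)
```

The resulting `prompt_text` would then be sent to the chat model as an ordinary message, so the model sees actual rows and the earlier turns rather than only the question.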
My observation is that LangChain's database agent is not suitable for conversational Q&A problems. If we could somehow convert the entire RDBMS data into a vector database, that would be very useful: the chatbot would have knowledge of the database contents in vector form, so it could understand and answer our questions well.
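A rough sketch of what "converting RDBMS data into vectors" could start from, under stated assumptions: each row is rendered as a short text document, which is the form an embedding model and vector store would accept. The in-memory SQLite table, the sample values, and the helper names `row_to_document` and `table_to_documents` are hypothetical; the embedding and storage step is deliberately left out.

```python
import sqlite3


def row_to_document(columns: list[str], row: tuple) -> str:
    """Render one table row as a 'column: value' text snippet suitable for embedding."""
    return "; ".join(f"{name}: {value}" for name, value in zip(columns, row))


def table_to_documents(conn: sqlite3.Connection, table: str) -> list[str]:
    """Read every row of `table` and return one text document per row."""
    # The table name is interpolated directly, so it must come from trusted code,
    # never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [column[0] for column in cursor.description]
    return [row_to_document(columns, row) for row in cursor.fetchall()]


# Hypothetical in-memory table standing in for the real MySQL data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (Product_Name TEXT, Customer_Region TEXT, GM1_in_INR REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("abc", "EU", 42.5), ("xyz", "US", 310.0)],
)

documents = table_to_documents(conn, "sales")
for document in documents:
    print(document)
```

One caveat with this approach: similarity search over per-row documents retrieves individual rows, so questions that need totals or trends across many rows would probably still have to be computed with SQL.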
Most of the Q&A chatbot examples I have seen that work well are built on top of text or PDF data.
My attempt:
import os
import streamlit as st
from langchain.chat_models import ChatOpenAI
from langchain.sql_database import SQLDatabase
from langchain.prompts.chat import ChatPromptTemplate
from langchain_experimental.sql import SQLDatabaseChain

# Set the OpenAI API key as an environment variable
os.environ["OPENAI_API_KEY"] = 'your-key'

# Replace 'username', 'password', and 'database_name' with your MySQL credentials
db = SQLDatabase.from_uri("mysql+pymysql://root:root@localhost/test")
llm = ChatOpenAI(temperature=0.0, model="gpt-3.5-turbo")

# Define the prompt
prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         """
you are a very intelligent AI assistant who is an expert in identifying relevant questions from users and converting them into SQL queries to generate correct answers.
Please use the below context to write the SQL queries.
context: you must query against the connected database,
Schema Information:
The test database contains the following tables and columns:
id: Unique identifier for each record in the dataset.
prev_month_date: The date of the previous month.
prev_month: Numeric representation of the previous month.
prev_year: The year corresponding to the previous month.
prev_Period: A combination of the previous year and month, often used as a period identifier.
Customer_Number: Identifier for the customer associated with the transaction.
Currency: Currency code used for the transactions.
Product_Number: Identifier for the product associated with the transaction.
Die_number: Identifier for the die used in the manufacturing process.
Total_Invoice_Quantity: Quantity of products in the invoice for the given transaction.
Selling_Price_in_INR: Selling price of the product in Indian Rupees.
Raw_Material_Cost_in_INR: Cost of raw materials used in the manufacturing process in Indian Rupees.
GM1_in_INR: Gross Margin 1 in Indian Rupees.
Total_Forging_Cost_in_INR: Total forging cost in Indian Rupees.
Total_Machining_Cost_in_INR: Total machining cost in Indian Rupees.
GM2_in_INR: Gross Margin 2 in Indian Rupees.
GM_2_Ratio_CBD_: Gross Margin 2 ratio.
EBIDTA_in_INR: Earnings Before Interest, Depreciation, Taxes, and Amortization in Indian Rupees.
Selling_Price_in_USD: Selling price of the product in US Dollars.
PAT_in_INR: Profit After Tax in Indian Rupees.
PBT_in_INR: Profit Before Tax in Indian Rupees.
BoughtOut_and_SubContract_Cost_in_INR: Cost of bought-out and subcontracted services in Indian Rupees.
Packaging_Cost_in_INR: Packaging cost in Indian Rupees.
Energy_Cost_for_Forging_in_INR: Energy cost for forging in Indian Rupees.
Stores_and_Maintenance_Cost_for_Forging_in_INR: Stores and maintenance cost for forging in Indian Rupees.
Depreciation_Interest_Cost_for_Forging_INR: Depreciation and interest cost for forging in Indian Rupees.
Painting_and_Coating_Cost_for_Forging_in_INR: Cost of painting and coating for forging in Indian Rupees.
Die_Cost_for_Forging_in_INR: Cost of die for forging in Indian Rupees.
Rejection_Cost_for_Forging_in_INR: Cost associated with rejection in forging in Indian Rupees.
Energy_Cost_for_Machining_in_INR: Energy cost for machining in Indian Rupees.
Stores_and_Maintenance_Cost_for_Machining_in_INR: Stores and maintenance cost for machining in Indian Rupees.
Depreciation_Interest_Cost_for_Machining_INR: Depreciation and interest cost for machining in Indian Rupees.
Rejection_Cost_for_Machining_in_INR: Cost associated with rejection in machining in Indian Rupees.
Forwarding_Cost_in_INR: Forwarding cost in Indian Rupees.
Logistics_Cost_in_INR: Logistics cost in Indian Rupees.
Selling_General_and_Administrtaive_SGnA_Cost_in_INR: Selling, General, and Administrative (SG&A) cost in Indian Rupees.
Working_Capital_Cost_in_INR: Working capital cost in Indian Rupees.
Total_Subcontracting_Cost_in_INR: Total subcontracting cost in Indian Rupees.
Direct_Manpower_Cost_for_Forging_in_INR: Direct manpower cost for forging in Indian Rupees.
Direct_Manpower_Cost_for_Machining_in_INR: Direct manpower cost for machining in Indian Rupees.
Indirect_Manpower_Cost_for_Forging_in_INR: Indirect manpower cost for forging in Indian Rupees.
Indirect_Manpower_Cost_for_Machining_in_INR: Indirect manpower cost for machining in Indian Rupees.
BoughtOut_Cost_in_INR: Cost of bought-out items in Indian Rupees.
SubContract_Cost_for_Forging_in_INR: Subcontract cost for forging in Indian Rupees.
SubContract_Cost_for_Machining_in_INR: Subcontract cost for machining in Indian Rupees.
EBIDTA_Percentage: Percentage of Earnings Before Interest, Depreciation, Taxes, and Amortization.
PBT_Percentage: Percentage of Profit Before Tax.
Total_Manufacturing_Cost_INR: Total manufacturing cost in Indian Rupees.
SubContract_Cost_in_INR: Subcontract cost in Indian Rupees.
Domestic_SubContract_Cost_in_INR: Domestic subcontract cost in Indian Rupees.
Sales_Organization: Sales organization associated with the transaction.
Distribution_Channel: Distribution channel associated with the transaction.
Product_Name: Name of the product associated with the transaction.
Product_Category: Category of the product.
Customer_Name: Name of the customer associated with the transaction.
Customer_Currency_Code: Currency code used by the customer.
Customer_Region: Region of the customer.
Sector: Sector associated with the transaction.
Business_Category: Business category associated with the transaction.
Group_Leader_Name: Name of the group leader associated with the transaction.
As an expert, you must use joins whenever required.
"""
         ),
        ("human", "{question}\ ai: ")
    ]
)

db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

# Streamlit app
st.title("SQL Query Generator")

# Get user input
user_question = st.text_input("Ask a question:")

if st.button("Generate SQL Query"):
    # Run the question through the SQLDatabaseChain
    result = db_chain.run(prompt.format_prompt(question=user_question))
    # st.write(f"Generated SQL Query: {result.SQLQuery}")
    st.write(f"AI Response: \n {result}")