I am going crazy trying to find a solution to this. I am working with a proxy server for OpenAI models, and I'm using an SSH tunnel so the server is reachable on my localhost.
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000", api_key="sk-xxx")

response = client.chat.completions.create(
    model="gpt_35_turbo",  # the model name my proxy exposes
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(response)
```

This works perfectly. Now I need to use this with LlamaIndex, so I have to define my llm and embedding model and pass them to my service_context, like so:
```python
# Imports assume the legacy (pre-0.10) llama_index package layout.
from llama_index import ServiceContext, PromptHelper
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding
from llama_index.text_splitter import SentenceSplitter

llm = OpenAI(model="text-davinci-003", temperature=0, max_tokens=256)
embed_model = OpenAIEmbedding()
text_splitter = SentenceSplitter(chunk_size=1024, chunk_overlap=20)
prompt_helper = PromptHelper(
    context_window=4096,
    num_output=256,
    chunk_overlap_ratio=0.1,
    chunk_size_limit=None,
)
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    text_splitter=text_splitter,
    prompt_helper=prompt_helper,
)
```

Here the embeddings would also need to go through the proxy, calling my own "ada-02" model. How can I make this work with my setup? I haven't been able to find an answer to this anywhere, and I've already wasted days trying to fix it.
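In case it helps to see what I've been attempting: my best guess was to pass the proxy URL and key into the LlamaIndex wrappers the same way I do with the raw client. The `api_base` / `api_key` keyword arguments below are my assumption from poking at the constructors, not something I've verified, and I still don't know how to make the wrappers accept my proxy's custom model names ("gpt_35_turbo", "ada-02"):

```python
# Rough sketch of what I tried -- passing api_base/api_key as constructor kwargs
# is my guess at how the LlamaIndex wrappers can be pointed at a proxy.
# Imports again assume the legacy (pre-0.10) llama_index layout.
from llama_index import ServiceContext
from llama_index.llms import OpenAI
from llama_index.embeddings import OpenAIEmbedding

llm = OpenAI(
    model="gpt-3.5-turbo",             # standard name; my proxy actually calls it "gpt_35_turbo"
    api_base="http://localhost:8000",  # same SSH-tunnel endpoint as the raw client
    api_key="sk-xxx",
    temperature=0,
    max_tokens=256,
)
embed_model = OpenAIEmbedding(
    api_base="http://localhost:8000",  # embeddings should also hit the proxy (my "ada-02" model)
    api_key="sk-xxx",
)
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
```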