[Custom Endpoint] [Bug] Seems like this problem is becoming more important - Refusal to get response from Custom Endpoint #3913
Replies: 4 comments · 13 replies
-
Is your OpenAI-compliant API set up to stream chunks? Does it only return the final generation? If so, you need to set
The Azure issue happens when users have upgraded the
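A quick way to check which case you are in (a rough sketch, not from this thread; the base URL and model name are placeholders) is to send a streaming request yourself and look at the raw response:

```python
# Minimal probe: does the endpoint emit SSE chunks ("data: {...}" lines
# ending in "data: [DONE]"), or just one final JSON body?
# URL and model are hypothetical placeholders; adjust to your setup.
import requests

resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "meta-llama/Llama-3.2-1B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
    timeout=60,
)

for line in resp.iter_lines(decode_unicode=True):
    if line:
        print(line)
```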
-
Same question.
-
We are currently struggling with a similar case. Streaming works perfectly, but as soon as we change it to "legacy" with
-
I have a similar problem. My custom endpoint (FastAPI in Python) implements the current OpenAI API spec and returns valid responses to requests via curl or Python requests. Here is an example of a chunk returned for the request:

```json
{
  "id": "9e3fea36-3f36-4952-8a23-05f58f3922fa",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant",
        "content": "Hello! "
      },
      "logprobs": null,
      "finish_reason": null
    }
  ],
  "created": 1739649951.749077,
  "model": "meta-llama/Llama-3.2-1B-Instruct",
  "service_tier": "1",
  "system_fingerprint": null,
  "object": "chat.completion",
  "usage": {
    "completion_tokens": null,
    "prompt_tokens": null,
    "total_tokens": null,
    "completion_token_details": {}
  }
}
```

But whenever I try to call the API from LibreChat, I get the same errors with empty responses. I can see that my server is accepting the request and processing it, and that the response is generated. This is the error I get:
Are there any problems with my response format? It also does not make any difference if I use forcePrompt: true. Here is the code I use to stream the results back:

```python
stream_response = chat_model.stream_output(
    prompt,
    max_tokens=request.max_completion_tokens,
    temperature=request.temperature
)
return StreamingResponse(
    stream_response,
    media_type="application/json"
)
```
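For comparison, here is a rough sketch of the server-sent-events framing that OpenAI-compatible streaming clients generally expect: each chunk serialized on a `data: ` line, `object` set to `chat.completion.chunk`, an integer `created` timestamp, and a closing `data: [DONE]` sentinel, served as `text/event-stream`. Only `chat_model.stream_output` and `request.max_completion_tokens` are taken from the snippet above; everything else is an assumption on my part, not a confirmed LibreChat requirement.

```python
import json
import time
import uuid

from fastapi.responses import StreamingResponse


def stream_chat_completion(prompt, request, chat_model):
    """Sketch of SSE framing for an OpenAI-style streaming response."""

    def event_generator():
        completion_id = str(uuid.uuid4())
        created = int(time.time())  # integer Unix timestamp rather than a float
        base = {
            "id": completion_id,
            "object": "chat.completion.chunk",  # chunk object, not "chat.completion"
            "created": created,
            "model": "meta-llama/Llama-3.2-1B-Instruct",
        }
        for piece in chat_model.stream_output(
            prompt,
            max_tokens=request.max_completion_tokens,
            temperature=request.temperature,
        ):
            chunk = {
                **base,
                "choices": [
                    {
                        "index": 0,
                        "delta": {"role": "assistant", "content": piece},
                        "finish_reason": None,
                    }
                ],
            }
            yield f"data: {json.dumps(chunk)}\n\n"
        # Final chunk carrying a finish_reason, then the [DONE] sentinel.
        final = {
            **base,
            "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
        }
        yield f"data: {json.dumps(final)}\n\n"
        yield "data: [DONE]\n\n"

    return StreamingResponse(event_generator(), media_type="text/event-stream")
```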
-
Hi all,
For the past week, I've been trying to get the LibreChat Docker container to connect to a custom endpoint that is OpenAI API compliant. The LibreChat front end can successfully connect to the custom endpoint, and the custom endpoint API is able to process the request and returns a JSON of OpenAI's chat completion.
The issue arises when the custom endpoint returns that JSON. There are no errors on the API side, but LibreChat seems to refuse to accept any chunks and throws the error
The custom endpoint is a simple FastAPI/Ollama application:
run.zip
I configured librechat.yaml, and it seems correct, because the front end is in fact connecting to the backend and sending data, but it won't receive any.
The API file and librechat.yaml are both in run.zip.
I believe this is a bug, since I saw the same issue with azureOpenAI and in an inactive discussion from 5 days ago.
Is there a way to fix this for now?
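In case it helps to compare, here is a bare-bones sketch (not the code from run.zip; the Ollama call is stubbed out as a hypothetical generate_text helper) of the non-streaming chat.completion shape that OpenAI-compatible clients generally expect back from POST /v1/chat/completions:

```python
import time
import uuid

from fastapi import FastAPI

app = FastAPI()


def generate_text(messages):
    # Hypothetical stand-in for the Ollama-backed generation in run.zip.
    return "Hello!"


@app.post("/v1/chat/completions")
async def chat_completions(body: dict):
    # Non-streaming reply: a single chat.completion object.
    return {
        "id": str(uuid.uuid4()),
        "object": "chat.completion",
        "created": int(time.time()),
        "model": body.get("model", "unknown"),
        "choices": [
            {
                "index": 0,
                "message": {
                    "role": "assistant",
                    "content": generate_text(body.get("messages", [])),
                },
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```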