[Custom Endpoint] [Bug] Seems like this problem is becoming more important - Refusal to get response from Custom Endpoint #3913

WHKAH · 2024-09-04T03:39:15Z

WHKAH
Sep 4, 2024

Hi all,

For the past week, I've been trying to let Librechat docker connect with a custom endpoint that is OpenAI API compliant. Librechat front-end can successfully connect to the custom endpoint, the custom endpoint API is able to process the request and returns a JSON of the OpenAI's chat completion.

The issue arises when the custom endpoint returns the JSON. There are no errors from the API-side. Librechat seems to refuse to accept any chunks and throws the error

warn: [OpenAIClient.chatCompletion][stream] Unhandled error type
error: [OpenAIClient.chatCompletion] Unhandled error type request ended without sending any chunks
error: [handleAbortError] AI response error; aborting request: request ended without sending any chunks

The custom endpoint is a simple fastapi/ollama application:
run.zip

I configured librechat.yaml and it seems correct because the frontend is in fact connecting with the backend and sending data but won't receive any.

the API file and librechat.yaml are both in run.zip

I believe this is a bug since i saw the same issue with azureOpenAI and an inactive discussion from 5 days ago.

Is there a way to fix for now?

danny-avila · 2024-09-04T04:03:51Z

danny-avila
Sep 4, 2024
Maintainer

Unhandled error type request ended without sending any chunks

Is your OpenAI-compliant API set up to stream chunks? Does it only return the final generation? If so, you need to set forcePrompt, otherwise it will expect stream chunks:

https://www.librechat.ai/docs/configuration/librechat_yaml/object_structure/custom_endpoint#forceprompt

I believe this is a bug since i saw the same issue with azureOpenAI and an inactive discussion from 5 days ago.

The azure issue happens when users have upgraded the openai SDK outside of the main repo's current package in their forks.

7 replies

schnaker85 Oct 3, 2024

same happens here, librechat does not show any text and no error in the logs.

zachchan105 Feb 17, 2025

Any updates on this running into same issue just empty responses

zachchan105 Feb 17, 2025

Any updates on this running into same issue just empty responses

I used OpenAI python SDK and was able to debug the issue. forcePrompt is irrelevant for me now because I was a able to diagnose the chat completions route using this method. Everything is working fine now.

klannk Feb 17, 2025

Could you share what was the issue on your end?

zachchan105 Feb 18, 2025

Could you share what was the issue on your end?

I restructured my API routes. forcePrompt: true uses OpenAI's v1/completions (legacy), while forcePrompt: false uses v1/chat/completions with streaming enabled. The issue was that I wasn't properly generating results for streaming—I was collecting the full response first and using that....which doesn't work when Librechat is expecting chunks.

To debug, I tested my API route with this Python script:

from openai import OpenAI

# Initialize client and connect to custom endpoint
client = OpenAI(
    api_key="XXXXXXXXXXXXXXXXX",
    base_url="http://mycustomendpoint.com/api/openai/v1"  # Custom endpoint
)

print("Response (streaming):", flush=True)

# Call API with required 'model' parameter and enable streaming
response_stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",  # Any model
    messages=[
        {"role": "system", "content": "You are a Python coder!"},
        {"role": "user", "content": "Simple while loop that prints 'Hello, World!' 10 times."}
    ],
    stream=True,
    max_tokens=150
)

# Process streaming response
for chunk in response_stream:
    token = chunk.choices[0].delta.content
    if token:  # Ensure token is not None before printing
        print(token, end="", flush=True)

Once I confirmed this worked, I knew it would function in LibreChat. The key was ensuring my custom endpoint adhered to OpenAI's API standards by closely following the docs and replicating expected behavior in my custom routes. Definitely not efficient, but it works

iswangaiguo · 2024-09-23T15:21:21Z

iswangaiguo
Sep 23, 2024

same question

0 replies

schnaker85 · 2024-10-03T09:57:11Z

schnaker85
Oct 3, 2024

We currently struggle with a similar case. Streaming works perfectly and as soon as we change it to "legacy" with forcePrompt and the updated non-chunked response it simply does not show anything as response in UI.

4 replies

victorstorchan Dec 5, 2024

hello @schnaker85, I am currently facing the same issue:
when forcePrompt:true, I have a "blank" answer (nothing returned)
however, when forcePrompt:false, I have a [OpenAIClient.chatCompletion] Unhandled error type request ended without sending any chunks error.

May you provide an example of function so that it works?

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():

for now, something like

@app.route('/v1/chat/completions', methods=['POST'])
def chat_completions():
    def generate_stream():
        # First chunk: Metadata and initial response
        yield json.dumps({
            "id": "chatcmpl-unique-id",
            "object": "chat.completion",
            "created": 1733327883,
            "model": "llama-3.2-3b-instruct",
            "choices": [
                {
                    "index": 0,
                    "delta": {"role": "assistant", "content": "Here is the start of the response"}
                }
            ]
        }) + "\n"

        # Simulated delay for intermediate chunks
        import time
        time.sleep(0.1)

        # Second chunk: Continuation of the response
        yield json.dumps({
            "choices": [
                {
                    "index": 0,
                    "delta": {"content": " and here is more content."}
                }
            ]
        }) + "\n"

        # Final chunk: Finish reason
        yield json.dumps({
            "choices": [
                {
                    "index": 0,
                    "finish_reason": "stop"
                }
            ]
        }) + "\n"

    # Return the response as a stream
    return Response(generate_stream(), content_type="application/json; charset=utf-8")

does not work for me

krishnachaitanya7 Jan 10, 2025

Here is what worked for me. First, the custom endpoints support OpenAI REST API standards. Then and then only LibreChat will recognize it. Whenever I sent responses from Ollama, it worked, but when I used to do the same using a Flask server, it failed to recognize the text. Moreover, Librechat expects Stream responses, not the normal JSON responses as output. Keeping these things in mind, below is my solution.

Install vllm and use it as an OpenAI REST API compatible server. This is because you will always send a dummy text to it, and sending it to a real OpenAI server is a waste of bandwidth. The installation and running of the server instruction can be found here.
Use the below code to send the required responses to LibreChat. First, below is my libreachat.yaml configuration (Please play around with aesthetics on how you want your bot to look like):

endpoints:
  custom:
    - name: "Ollama"
      apiKey: "ollama"
      # use 'host.docker.internal' instead of localhost if running LibreChat in a docker container
      baseURL: "http://0.0.0.0:11434/v1/chat/completions" 
      models:
        default: [
          "llama2",
          "mistral",
          "codellama",
          "dolphin-mixtral",
          "mistral-openorca"
          ]
        # fetching list of models is supported but the `name` field must start
        # with `ollama` (case-insensitive), as it does in this example.
        fetch: true
      titleConvo: true
      titleModel: "current_model"
      summarize: false
      summaryModel: "current_model"
      forcePrompt: false
      modelDisplayLabel: "Ollama Bot"

Below is the Python code

import json
from time import sleep

from flask import Flask
from flask import request
from openai import OpenAI
from flask import Response

app = Flask(__name__)

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)


def serialize_chunk(chunk):
    return {
        "id": chunk.id,
        "created": chunk.created,
        "model": chunk.model,
        "object": chunk.object,
        "system_fingerprint": chunk.system_fingerprint,
        "usage": chunk.usage,
        "choices": [
            {
                "finish_reason": chunk.choices[0].finish_reason,
                "function_call": chunk.choices[0].delta.function_call,
                "index": chunk.choices[0].index,
                "content": chunk.choices[0].delta.content,
                "role": chunk.choices[0].delta.role,
                "logprobs": chunk.choices[0].logprobs,
                "refusal": chunk.choices[0].delta.refusal,
                "delta": {
                    "content": chunk.choices[0].delta.content,
                    "role": chunk.choices[0].delta.role,
                },
            }
        ],
    }


@app.route("/v1/chat/completions", methods=["POST"])
def api_server():
    data = request.get_json()
    print(data)

    def generate():
        # Output from the dummy model
        completion = client.chat.completions.create(
            model="mistralai/Mistral-7B-Instruct-v0.2",
            messages=[{"role": "user", "content": "Hello!"}],
            response_format={"type": "json_object"},
            stream=True,
        )
        # Below is the string variable you can get from your own RAG/LLM, etc
        ans_string = "Hello! This is a test reply from the bot."
        for i, chunk in enumerate(completion):
            if i == 0:
                for each_char in ans_string:
                    chunk.choices[0].delta.content = each_char
                    sleep(0.1) # Simulating streaming
                    yield f"data: {json.dumps(serialize_chunk(chunk))}\n\n"
        chunk.choices[0].delta.content = ""
        yield f"data: {json.dumps(serialize_chunk(chunk))}\n\n"

    return Response(generate(), mimetype="text/event-stream")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=11434)

Please keep in mind the individual fields, such as chunk.model and chunk.created that are sent back to LibreChat may not be accurate but could be easily replaced with correct ones. Still, this solution should work if you aim to use LibreChat as a front end for your custom backend model and happy as long as your text appears in LibreChat.

klannk Feb 18, 2025

Thanks found the solution as well. My API was not sending SSEs by using FastAPIs StreamingResponse.

krishnachaitanya7 Feb 18, 2025

@klannk Merci à vous, I am glad it was helpful!

klannk · 2025-02-15T20:08:11Z

klannk
Feb 15, 2025

I have a similar problem. My custom endpoint (FastAPI in Python) has the current OpenAI API specs implemented and returns valid responses with requests via curl or python requests.

Here is an example of a chunk returned via the request:

{
    "id": "9e3fea36-3f36-4952-8a23-05f58f3922fa",
    "choices": [
        {
            "index": 0,
            "delta": {
                "role": "assistant",
                "content": "Hello! "
            },
            "logprobs": null,
            "finish_reason": null
        }
    ],
    "created": 1739649951.749077,
    "model": "meta-llama/Llama-3.2-1B-Instruct",
    "service_tier": "1",
    "system_fingerprint": null,
    "object": "chat.completion",
    "usage": {
        "completion_tokens": null,
        "prompt_tokens": null,
        "total_tokens": null,
        "completion_token_details": {}
    }
}

But whenever I try to call the API from LibreChat i get the same errors with empty responses. I see that my server is accepting the request and processing it. I also see that the response is generated.

This is the error i get:

2025-02-15T19:51:37.379Z warn: [OpenAIClient.chatCompletion][stream] Unhandled error type
2025-02-15T19:51:37.380Z error
2025-02-15T19:51:37.380Z error: [OpenAIClient.chatCompletion] Unhandled error type request ended without sending any chunks
2025-02-15T19:51:37.381Z error: [handleAbortError] AI response error; aborting request: request ended without sending any chunks

Are there any problems with my response format? Also it does not make any difference if i use forceprompts: true.

Here is the code i use to stream the results back:

stream_response = chat_model.stream_output(
            prompt,
            max_tokens=request.max_completion_tokens,
            temperature=request.temperature
            )
        return StreamingResponse(
            stream_response,
            media_type="application/json"
        )

2 replies

danny-avila Feb 17, 2025
Maintainer

Yes, there are issues. As already suggested, you should test with the openai SDK in a JS/TS test script to ensure your API is truly OpenAI-compliant.

klannk Feb 18, 2025

@danny-avila Thank you.

For anyone stumbling upon this. I found the solution by debugging this from the SDK. My API response format was indeed already fully OpenAI-compliant.

I found that the FastAPI StreamingResponse is not sending server side events, which does not matter when you are accessing the stream by python requests or other means. However, the OpenAI SDKs in JS/TS and Python expect SSE responses from the server. To mitigate I used sse-starlette to send SSEs from FastAPI:

# add other imports here
from sse_starlette.sse import EventSourceResponse


# Setup App

async def event_generator(generator):
    for element in generator:
        yield element.model_dump_json()
    yield "[DONE]"
               
@app.post("/api/openai/v1/chat/completions")
async def stream_response(request: ChatCompletionRequest, api_key: str = Depends(verify_api_key)):
    if request.stream:
        stream_response = chat_model.stream_output(
            prompt,
            max_tokens=request.max_completion_tokens,
            temperature=request.temperature,
            )
        return EventSourceResponse(event_generator(stream_response))

Thanks everyone for the suggestions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Custom Endpoint] [Bug] Seems like this problem is becoming more important - Refusal to get response from Custom Endpoint #3913

{{title}}

Replies: 4 comments 13 replies

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

[Custom Endpoint] [Bug] Seems like this problem is becoming more important - Refusal to get response from Custom Endpoint #3913

Replies: 4 comments · 13 replies

danny-avila Sep 4, 2024 Maintainer

danny-avila Feb 17, 2025 Maintainer

Replies: 4 comments 13 replies

danny-avila
Sep 4, 2024
Maintainer

danny-avila Feb 17, 2025
Maintainer