Fix streaming for OpenAI clients #1371
Conversation
        return result

    def _should_gather_generator(self, request: starlette.requests.Request) -> bool:
Previously the code to handle generators was copied twice: once for _predict and once for each of the new OpenAI-compatible endpoints. I opted to consolidate here, but that does open us up to risk by changing the behavior of existing predict calls.
I think it's very unlikely any existing user is relying on this edge case:
- Predict surfaces a generator / async generator
- Their client specifically accepts JSON and sends a user agent that contains OpenAI
- The user expects these results to return a synchronous predict response, rather than a streaming response
Therefore, I think it's better to consolidate logic, since it'll be easier to deprecate across the board once TaT rolls out (the consolidated helper is sketched below).
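For reference, roughly what the consolidated helper looks like, assembled from the snippets in this diff; the surrounding class name and the final accept-header check are assumptions based on the PR description:

import starlette.requests

class ModelWrapper:  # hypothetical surrounding class
    def _should_gather_generator(self, request: starlette.requests.Request) -> bool:
        # The OpenAI SDK sends an accept header for JSON even in a streaming
        # context, but we need to stream results back for client compatibility,
        # e.g. user-agent: "OpenAI/Python 1.61.0".
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False
        # TODO(nikhil): determine if we can safely deprecate this behavior.
        # Assumption: the legacy behavior gathers the generator into a
        # synchronous response whenever the client explicitly accepts JSON.
        return request.headers.get("accept") == "application/json"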
sounds good
    # Despite requesting json, we should still stream results back.
    headers={
        "accept": "application/json",
        "user-agent": "OpenAI/Python 1.61.0",
This is the real user agent I get from using the Python SDK now
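As a usage sketch, the test around this hunk might look something like the following; the app fixture, the route, and the asserted content type are assumptions:

from starlette.testclient import TestClient

def test_openai_sdk_gets_streaming_response(app):
    client = TestClient(app)
    response = client.post(
        "/v1/chat/completions",  # assumed OpenAI-compatible route
        json={"stream": True},
        # Despite requesting json, we should still stream results back.
        headers={
            "accept": "application/json",
            "user-agent": "OpenAI/Python 1.61.0",
        },
    )
    # Assumed: streamed responses come back as server-sent events.
    assert response.headers["content-type"].startswith("text/event-stream")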
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False
        # TODO(nikhil): determine if we can safely deprecate this behavior.
Nikhil & I discussed live & we shoulid be able to deprecate this safely. AFAIK I have not heard of any user actually making use of this. (it's common to have a stream
parameter), and I think we haven't even documented this TBH. Artifact of something that we did when we first introduced streaming.
We can deprecate this once we have TaT (we can then advertise a version where this behavior will have changed).
        # The OpenAI SDK sends an accept header for JSON even in a streaming context,
        # but we need to stream results back for client compatibility.
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
maybe worth leaving an example user agent in the comment here.
Good call!
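With that suggestion applied, the comment could read along these lines (exact wording is an assumption; the example user agent is the one observed above):

        # The OpenAI SDK sends an accept header for JSON even in a streaming context,
        # but we need to stream results back for client compatibility,
        # e.g. user-agent: "OpenAI/Python 1.61.0".
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False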
Force-pushed from 35aabc4 to 1f23f23
🚀 What
When testing the new endpoints with the OpenAI SDK, I found that the client sends an accept: application/json header even in a streaming context, which breaks an assumption we make about how to handle those requests. It's hard to tell whether any existing truss relies on the behavior of converting a generator into a synchronous response based on the presence of this header, so to get around this temporarily we add a check on the user agent. This specifically fixes the case for the OpenAI SDK, but we'll have to test any other compatible clients to see what their behavior is for sending this header in streaming contexts.
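For illustration, this is roughly the client-side call that surfaces the issue; the base URL and model name are placeholders:

import openai

client = openai.OpenAI(base_url="https://example.com/v1", api_key="dummy")
# Even with stream=True, the SDK sends accept: application/json,
# which previously made the server gather the generator instead of streaming.
stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")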
💻 How
🔬 Testing
I deployed a model via engine builder and confirmed both streaming and non-streaming capabilities work with the newest context builder image.
I also added a unit test, though it's more contrived.
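The unit test is roughly of this shape; constructing a bare starlette Request this way and the model_wrapper fixture are assumptions:

import starlette.requests

def make_request(headers: dict) -> starlette.requests.Request:
    # Build a minimal ASGI scope carrying only the headers we care about.
    raw = [(k.encode(), v.encode()) for k, v in headers.items()]
    return starlette.requests.Request(
        {"type": "http", "method": "POST", "path": "/", "headers": raw}
    )

def test_openai_user_agent_disables_gathering(model_wrapper):
    request = make_request(
        {"accept": "application/json", "user-agent": "OpenAI/Python 1.61.0"}
    )
    # An OpenAI client must get a stream even though it accepts JSON.
    assert model_wrapper._should_gather_generator(request) is False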