
Fix streaming for OpenAI clients #1371

Merged 2 commits into main from nikhil/support-openai-streaming on Feb 7, 2025

Conversation

nnarayen (Contributor) commented on Feb 7, 2025

🚀 What

When testing the new endpoints with the OpenAI SDK, I found that the client sends an `accept: application/json` header even in streaming contexts, which breaks an assumption we make about how to handle those requests. It's hard to tell whether any existing truss relies on converting a generator into a synchronous response based on the presence of this header, so as a temporary workaround we also check the user agent.

This specifically fixes the case for the OpenAI SDK, but we'll have to test other compatible clients to see how they send this header in streaming contexts.
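A rough sketch of the workaround described above. This standalone version operates on a plain header mapping rather than a `starlette` request, and the exact accept-header comparison in the real code may differ:

```python
# Illustrative sketch of the user-agent workaround (not the PR's exact code).
def should_gather_generator(headers: dict) -> bool:
    """Return True if a generator result should be gathered into a single
    JSON response instead of being streamed back to the client."""
    user_agent = headers.get("user-agent", "")
    if "openai" in user_agent.lower():
        # The OpenAI SDK sends "accept: application/json" even when it
        # expects a stream, so never gather on its behalf.
        return False
    # Legacy behavior: an explicit JSON accept header gathers the stream.
    return headers.get("accept") == "application/json"
```

With the headers the SDK actually sends (e.g. `{"accept": "application/json", "user-agent": "OpenAI/Python 1.61.0"}`), this returns False and the response is streamed.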

💻 How

🔬 Testing

I deployed a model via engine builder and confirmed that both streaming and non-streaming capabilities work with the newest context builder image.

I also added a unit test, though it's somewhat contrived.


return result

def _should_gather_generator(self, request: starlette.requests.Request) -> bool:
nnarayen (Contributor, author):

Previously, the code to handle generators was duplicated: once for `_predict` and once for the new OpenAI-compatible endpoints. I opted to consolidate it here, though that does carry some risk by changing the behavior of existing predict calls.

I think it's very unlikely any existing user is relying on this edge case:

  • Predict surfaces a generator / async generator
  • Their client specifically accepts JSON and sends a user agent that contains OpenAI
  • The user expects these results to return a synchronous predict response, rather than a streaming response

Therefore, I think it's better to consolidate logic, since it'll be easier to deprecate across the board once TaT rolls out.

Collaborator:

sounds good

# Despite requesting json, we should still stream results back.
headers={
    "accept": "application/json",
    "user-agent": "OpenAI/Python 1.61.0",
nnarayen (Contributor, author):

This is the real user agent I get from using the Python SDK now

user_agent = request.headers.get("user-agent", "")
if "openai" in user_agent.lower():
    return False
# TODO(nikhil): determine if we can safely deprecate this behavior.
Collaborator:

Nikhil and I discussed this live, and we should be able to deprecate this safely. AFAIK no user actually makes use of this (it's common to have a stream parameter), and I don't think we've even documented it, TBH. It's an artifact of something we did when we first introduced streaming.

We can deprecate this once we have TaT (we can then advertise a version where this behavior will have changed).



# The OpenAI SDK sends an accept header for JSON even in a streaming context,
# but we need to stream results back for client compatibility.
user_agent = request.headers.get("user-agent", "")
if "openai" in user_agent.lower():
Collaborator:

maybe worth leaving an example user agent in the comment here.

nnarayen (Contributor, author):

Good call!

@nnarayen force-pushed the nikhil/support-openai-streaming branch from 35aabc4 to 1f23f23 on February 7, 2025 16:33
@nnarayen merged commit d5db001 into main on Feb 7, 2025
5 checks passed
@nnarayen deleted the nikhil/support-openai-streaming branch on February 7, 2025 16:48