Fix streaming for OpenAI clients #1371
Conversation
        return result

    def _should_gather_generator(self, request: starlette.requests.Request) -> bool:
Previously the code to handle generators was copied twice: once for _predict and once for each of the new OpenAI-compatible endpoints. I opted to consolidate here, but that does open us up to risk by changing the behavior of existing predict calls.
I think it's very unlikely any existing user is relying on this edge case:
- Predict surfaces a generator / async generator
- Their client specifically accepts JSON and sends a user agent that contains OpenAI
- The user expects these results to return a synchronous predict response, rather than a streaming response
Therefore, I think it's better to consolidate logic, since it'll be easier to deprecate across the board once TaT rolls out (the consolidated helper is sketched below).
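For reference, roughly what the consolidated helper looks like, assembled from the snippets in this diff; the surrounding class name and the final accept-header check are assumptions based on the PR description:

import starlette.requests

class ModelWrapper:  # hypothetical surrounding class
    def _should_gather_generator(self, request: starlette.requests.Request) -> bool:
        # The OpenAI SDK sends an accept header for JSON even in a streaming
        # context, but we need to stream results back for client compatibility,
        # e.g. user-agent: "OpenAI/Python 1.61.0".
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False
        # TODO(nikhil): determine if we can safely deprecate this behavior.
        # Assumption: the legacy behavior gathers the generator into a
        # synchronous response whenever the client explicitly accepts JSON.
        return request.headers.get("accept") == "application/json"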
sounds good
    # Despite requesting json, we should still stream results back.
    headers={
        "accept": "application/json",
        "user-agent": "OpenAI/Python 1.61.0",
This is the real user agent I get from using the Python SDK now
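As a usage sketch, the test around this hunk might look something like the following; the app fixture, the route, and the asserted content type are assumptions:

from starlette.testclient import TestClient

def test_openai_sdk_gets_streaming_response(app):
    client = TestClient(app)
    response = client.post(
        "/v1/chat/completions",  # assumed OpenAI-compatible route
        json={"stream": True},
        # Despite requesting json, we should still stream results back.
        headers={
            "accept": "application/json",
            "user-agent": "OpenAI/Python 1.61.0",
        },
    )
    # Assumed: streamed responses come back as server-sent events.
    assert response.headers["content-type"].startswith("text/event-stream")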
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False
        # TODO(nikhil): determine if we can safely deprecate this behavior.
Nikhil & I discussed live & we shoulid be able to deprecate this safely. AFAIK I have not heard of any user actually making use of this. (it's common to have a stream
parameter), and I think we haven't even documented this TBH. Artifact of something that we did when we first introduced streaming.
We can deprecate this once we have TaT (we can then advertise a version where this behavior will have changed).
        # The OpenAI SDK sends an accept header for JSON even in a streaming context,
        # but we need to stream results back for client compatibility.
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
maybe worth leaving an example user agent in the comment here.
Good call!
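With that suggestion applied, the comment could read along these lines (exact wording is an assumption; the example user agent is the one observed above):

        # The OpenAI SDK sends an accept header for JSON even in a streaming context,
        # but we need to stream results back for client compatibility,
        # e.g. user-agent: "OpenAI/Python 1.61.0".
        user_agent = request.headers.get("user-agent", "")
        if "openai" in user_agent.lower():
            return False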
Force-pushed from 35aabc4 to 1f23f23
🚀 What
When testing the new endpoints with the OpenAI SDK, I found that the client sends an accept: application/json header even in a streaming context, which breaks an assumption we make about how to handle those requests. It's hard to tell whether any existing truss relies on the behavior of converting a generator into a synchronous response based on the presence of this header, so to get around this temporarily we add a check on the user agent. This specifically fixes the case for the OpenAI SDK, but we'll have to test any other compatible clients to see what their behavior is for sending this header in streaming contexts.
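For illustration, this is roughly the client-side call that surfaces the issue; the base URL and model name are placeholders:

import openai

client = openai.OpenAI(base_url="https://example.com/v1", api_key="dummy")
# Even with stream=True, the SDK sends accept: application/json,
# which previously made the server gather the generator instead of streaming.
stream = client.chat.completions.create(
    model="my-model",  # placeholder
    messages=[{"role": "user", "content": "hi"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")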
💻 How
🔬 Testing
I deployed a model via engine builder and confirmed both streaming and non-streaming capabilities work with the newest context builder image.
I also added a unit test, though it's more contrived.
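The unit test is roughly of this shape; constructing a bare starlette Request this way and the model_wrapper fixture are assumptions:

import starlette.requests

def make_request(headers: dict) -> starlette.requests.Request:
    # Build a minimal ASGI scope carrying only the headers we care about.
    raw = [(k.encode(), v.encode()) for k, v in headers.items()]
    return starlette.requests.Request(
        {"type": "http", "method": "POST", "path": "/", "headers": raw}
    )

def test_openai_user_agent_disables_gathering(model_wrapper):
    request = make_request(
        {"accept": "application/json", "user-agent": "OpenAI/Python 1.61.0"}
    )
    # An OpenAI client must get a stream even though it accepts JSON.
    assert model_wrapper._should_gather_generator(request) is False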