Handle `max_tokens` more gracefully #433
Comments
A note here concerning the new …
Makes sense. Let's merge #391 first.
An alternate solution that I implemented in the vLLM integration example was to initialize every sequence-dependent variable as a defaultdict and pass the sequence id. The solution of copying the FSM is much more elegant, and there is indeed some work planned with the vLLM team to generate FSMs dynamically for each new sequence.
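For illustration, here is a minimal sketch of the defaultdict approach described in this comment. The `CountingFSM` class, its `next_state(state, token_id, seq_id)` signature, and the `-1` final state are hypothetical stand-ins for the idea, not Outlines' or vLLM's actual API:

```python
from collections import defaultdict


class CountingFSM:
    """Toy FSM that keeps one token counter per sequence id.

    Because each sequence updates only its own counter, batched
    generation no longer divides the token budget across sequences.
    """

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        # Hypothetical per-sequence counters, created on first access.
        self.num_tokens: defaultdict[int, int] = defaultdict(int)

    def next_state(self, state: int, token_id: int, seq_id: int) -> int:
        self.num_tokens[seq_id] += 1
        if self.num_tokens[seq_id] >= self.max_tokens:
            return -1  # hypothetical final state: stop this sequence
        return state  # a real FSM would transition on token_id here
```

Copying the FSM per sequence achieves the same isolation without threading a sequence id through the interface, which is what makes it the more elegant option.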
A bug was found in the way `max_tokens` is managed. Since the number of tokens generated is updated every time `FSMState` is called, when we generate tokens in batch the sequence generator returns `max_tokens / batch_size` tokens per sequence. The solution implemented in #391 consists in adding an `idx` argument to the `next_state` method of the FSMs, but this does not respect the a priori interface between the generator and the FSMs. There are two solutions to this problem:
1. Copy the FSM for each sequence in the batch, so that each copy keeps its own token count (the approach described as more elegant in the comment above);
2. Keep track of the number of generated tokens in the `SequenceGenerator`.

Either of these allows us to remove the `idx` argument. #417 suggests that the `stop_at` constraint should be implemented at the `SequenceGenerator` level (since we need to decode the token ids), and the fact that we should let users control the maximum number of tokens generated manually makes me lean towards (2); a sketch of that approach follows.
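To make option (2) concrete, here is a rough sketch of a generator that owns the token budget so the FSM never needs to count tokens. The `model.sample` and `fsm.next_state` signatures, the `0` initial state, and the `-1` final-state sentinel are all assumptions for illustration, not the actual Outlines API:

```python
class SequenceGenerator:
    """Simplified stand-in for a generator that owns the token budget.

    The FSM only maps (state, token_id) -> state; it never counts
    tokens, so no `idx` argument is needed and the generator/FSM
    interface is preserved.
    """

    def __init__(self, model, fsm, max_tokens: int):
        self.model = model
        self.fsm = fsm
        self.max_tokens = max_tokens

    def generate(self, prompts: list[str]) -> list[list[int]]:
        batch_size = len(prompts)
        states = [0] * batch_size  # assumed initial FSM state
        token_ids: list[list[int]] = [[] for _ in range(batch_size)]
        # One budget counter per sequence, owned by the generator:
        # batching no longer splits `max_tokens` across sequences.
        num_tokens = [0] * batch_size

        while True:
            # A sequence is active until its FSM reaches the assumed
            # final state (-1) or it exhausts its own token budget.
            active = [
                i
                for i in range(batch_size)
                if states[i] != -1 and num_tokens[i] < self.max_tokens
            ]
            if not active:
                return token_ids
            for i in active:
                # Assumed model interface: sample one token for sequence i.
                next_token = self.model.sample(prompts[i], token_ids[i])
                token_ids[i].append(next_token)
                num_tokens[i] += 1
                # The FSM call carries no sequence index.
                states[i] = self.fsm.next_state(states[i], next_token)
```

This layering would also give a natural home to the `stop_at` check from #417, since the generator is the layer that decodes token ids.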