Handle max_token more gracefully #433

Closed
rlouf opened this issue Dec 14, 2023 · 3 comments · Fixed by #451
Labels: enhancement, structured generation

Comments

@rlouf (Member) commented Dec 14, 2023

A bug was found in the way max_tokens is managed. Since the FSM tracks the number of generated tokens and updates it on every call to next_state, generating tokens in a batch causes the sequence generator to return only max_tokens / batch_size tokens per sequence. The fix implemented in #391 adds an idx argument to the next_state method of the FSMs, but this does not respect the intended interface between the generator and the FSMs.
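
A minimal sketch of the failure mode, with hypothetical class and attribute names (not the actual outlines code): a single FSM instance is shared by every sequence in the batch, so its token counter is incremented batch_size times per decoding step and the max_tokens budget runs out batch_size times too early.

```python
# Hypothetical illustration of the bug: one shared FSM counts tokens across
# the whole batch, so generation stops after max_tokens / batch_size steps.
class SharedFSM:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.num_tokens = 0

    def next_state(self, state: int, token_id: int) -> int:
        self.num_tokens += 1              # bumped once per sequence, every step
        if self.num_tokens >= self.max_tokens:
            return -1                     # final state: stop generating
        return state

fsm = SharedFSM(max_tokens=8)
batch_size, steps, state = 4, 0, 0
while state != -1:
    for _ in range(batch_size):           # the generator calls the FSM once per sequence
        state = fsm.next_state(state, token_id=0)
    steps += 1
print(steps)  # 2, i.e. max_tokens / batch_size, instead of the expected 8
```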

There are two solutions to this problem:

  1. Create a FSM for each sequence in the batch.
  2. Handle the constraint in SequenceGenerator.

Either of these allows us to remove the idx argument. #417 suggests that the stop_at constraint should be implemented at the SequenceGenerator level (since we need to decode the token ids), and since users should ultimately control the maximum number of tokens themselves, I lean towards (2).
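
For illustration, a minimal sketch of what (2) could look like, assuming hypothetical names (a free-function generator and a stub step_fn standing in for the model; not the actual SequenceGenerator API): the generator counts decoding steps itself and stops after max_tokens, so the FSM never needs to know about the batch.

```python
# Hypothetical sketch of option (2): the max_tokens constraint lives in the
# generation loop, not in the FSM, so no idx argument is needed.
def sequence_generator(step_fn, batch_size, max_tokens):
    """step_fn(token_ids) returns the next token id for each sequence (model stub)."""
    token_ids = [[] for _ in range(batch_size)]
    for _ in range(max_tokens):                    # constraint enforced once per step
        next_tokens = step_fn(token_ids)
        for seq, tok in zip(token_ids, next_tokens):
            seq.append(tok)
        yield next_tokens

# Every sequence in the batch now gets the full 8 tokens, not 8 / 4.
stream = sequence_generator(lambda ids: [0] * len(ids), batch_size=4, max_tokens=8)
assert len(list(stream)) == 8
```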

rlouf added the enhancement and structured generation labels on Dec 14, 2023
@benlipkin (Contributor) commented Dec 14, 2023

A note here concerning the new CFGFSM class outlined in #391. That class also uses the idx argument to track the incremental parser's progress individually for each batch sequence. Other upcoming features like SMC steering will also likely require state to be stored uniquely for each batch sequence (to avoid constant recalculation). This idx workaround is required by the current interface, where the FSM is shared across all sequences in the batch, but I agree that it adds undue code complexity and could be avoided.

As such, I would lend further support to "Solution 1: Create a FSM for each sequence in the batch." This will add some startup overhead when initializing a large batch, but will make each generation step easier. Thinking a bit deeper, since each batch sequence is initialized to the same state, these overhead costs can be mostly amortized by constructing an FSM once and then tiling it, e.g. [FSM(*args)]*batch_size, so we avoid multiplying the overhead of producing the state transition dict, etc.

I'm happy to sketch this out either as part of a new PR or in #391.
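
A hedged sketch of that amortization idea, with illustrative names rather than the actual outlines classes. One caveat worth noting: a literal [FSM(*args)]*batch_size tiles references to a single instance, so per-sequence mutable state (token counts, parser state) would still be shared; building the expensive index once and giving each sequence a shallow copy keeps the amortization while isolating the cheap per-sequence state.

```python
import copy

# Illustrative FSM (hypothetical class, not the outlines implementation):
# one expensive piece built once (the transition table) and one cheap
# per-sequence piece (the token counter).
class BatchableFSM:
    def __init__(self, pattern: str):
        self.states_to_token_maps = self._build_index(pattern)  # expensive, shared
        self.num_tokens_generated = 0                            # per-sequence state

    def _build_index(self, pattern: str) -> dict:
        return {0: {ord(c): 0 for c in pattern}}                 # stand-in for the real index

    def copy(self) -> "BatchableFSM":
        # Shallow copy: the transition table is shared, the counter is per copy.
        return copy.copy(self)

template = BatchableFSM(r"[0-9]+")
fsms = [template.copy() for _ in range(4)]                       # one FSM per sequence

assert all(f.states_to_token_maps is template.states_to_token_maps for f in fsms)
fsms[0].num_tokens_generated += 1
assert fsms[1].num_tokens_generated == 0                         # counters stay independent
```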

@rlouf (Member, Author) commented Dec 14, 2023

Makes sense. Let's merge #391 first.

@rlouf (Member, Author) commented Dec 20, 2023

An alternative solution that I implemented in the vLLM integration example was to store every sequence-dependent variable in a defaultdict and pass the sequence id. The solution of copying the FSM is much more elegant, and indeed there is some work planned with the vLLM team to generate FSMs dynamically for each new sequence.
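
For reference, a minimal sketch of that defaultdict pattern, with hypothetical names (the actual vLLM integration example differs): a single shared object keeps all sequence-dependent state in defaultdicts keyed by the sequence id it is given.

```python
from collections import defaultdict

# Hypothetical sketch: one shared object, with all sequence-dependent state
# keyed by the sequence id passed in by the caller.
class SharedStateFSM:
    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.num_tokens = defaultdict(int)   # seq_id -> tokens generated so far
        self.fsm_state = defaultdict(int)    # seq_id -> current FSM state

    def next_state(self, seq_id: int, token_id: int) -> int:
        self.num_tokens[seq_id] += 1
        if self.num_tokens[seq_id] >= self.max_tokens:
            self.fsm_state[seq_id] = -1      # only this sequence reaches the final state
        return self.fsm_state[seq_id]

fsm = SharedStateFSM(max_tokens=2)
fsm.next_state(seq_id=0, token_id=11)
fsm.next_state(seq_id=1, token_id=42)
assert fsm.num_tokens[0] == 1 and fsm.num_tokens[1] == 1   # budgets tracked per sequence
```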
