Restore the ability to draw multiple samples with Open Source models #416
Comments
I'm looking to implement beam search. I'm wondering whether we could simply use the sequence's tokens as the ID of the sequence? 8b1ff9a#diff-f65ffb5f52b2e358c713ccb8f32a700769426c6c8b655f689e3cdccae07d22ac For 1000-token sequences, I can generate 25,000 keys per second on my machine, so it shouldn't be a substantial bottleneck.
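To make the idea of keying sequences by their tokens concrete, here is a minimal sketch. `sequence_key` and the `beams` dictionary are hypothetical names chosen for illustration and are not taken from the linked commit; a tuple is only one possible key, and a hash digest of the token ids would work as well.

```python
from typing import Dict, List, Tuple


def sequence_key(token_ids: List[int]) -> Tuple[int, ...]:
    """Build a hashable key from a sequence's token ids (hypothetical helper)."""
    return tuple(token_ids)


# Map each sequence key to its cumulative log-probability.
beams: Dict[Tuple[int, ...], float] = {}
beams[sequence_key([1, 5, 9])] = -0.7
beams[sequence_key([1, 5, 3])] = -1.2

# Two sequences with identical tokens share one key, so duplicates collapse.
beams[sequence_key([1, 5, 9])] = -0.7
```

One consequence of this design is that identical sequences cannot be tracked separately, which may or may not be desirable depending on how sampling is done.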
My understanding is that vLLM needs sequence ids because they're doing continuous batching, and we wouldn't need to assign an id to sequences here. I'm still hesitating between using one big tensor of shape … and something like:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Sequence:
        prompt_token_ids: List[int]
        generated_token_ids: List[int] = field(default_factory=list)
        logprob: float = 0.0

        @property
        def token_ids(self) -> List[int]:
            return self.prompt_token_ids + self.generated_token_ids

        def add_token_id(self, token_id: int, token_logprob: float) -> None:
            # Accumulate the sequence log-probability and record the new token.
            self.logprob += token_logprob
            self.generated_token_ids.append(token_id)

    @dataclass
    class Generation:
        sequences: List[Sequence]

For beam search, a … This is the way vLLM does it, and they create new tensors at each step. We would need to measure the overhead of creating new tensors at each step before moving forward. What do you think?
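For discussion, here is a rough sketch of what one beam search step could look like with per-sequence objects similar to the `Sequence` class above. It uses plain Python rather than tensors, and `Beam`, `beam_step` and `toy_model` are hypothetical names; this is not vLLM's implementation, only an illustration of where new objects get created at each step.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Beam:
    """One candidate sequence (hypothetical, modeled on the Sequence sketch)."""
    prompt_token_ids: List[int]
    generated_token_ids: List[int] = field(default_factory=list)
    logprob: float = 0.0

    @property
    def token_ids(self) -> List[int]:
        return self.prompt_token_ids + self.generated_token_ids


def beam_step(
    beams: List[Beam],
    next_logprobs: Callable[[List[int]], Dict[int, float]],
    beam_width: int,
) -> List[Beam]:
    """Expand every beam by every candidate token, keep the best `beam_width`."""
    candidates = []
    for beam in beams:
        for token_id, token_logprob in next_logprobs(beam.token_ids).items():
            # A new object per candidate: this is the per-step allocation
            # whose overhead would need to be measured.
            candidates.append(
                Beam(
                    prompt_token_ids=beam.prompt_token_ids,
                    generated_token_ids=beam.generated_token_ids + [token_id],
                    logprob=beam.logprob + token_logprob,
                )
            )
    candidates.sort(key=lambda b: b.logprob, reverse=True)
    return candidates[:beam_width]


def toy_model(token_ids: List[int]) -> Dict[int, float]:
    """Stand-in for a real model: a fixed distribution over three tokens."""
    return {0: math.log(0.5), 1: math.log(0.3), 2: math.log(0.2)}


beams = [Beam(prompt_token_ids=[7, 8])]
for _ in range(2):
    beams = beam_step(beams, toy_model, beam_width=2)
```

Timing `beam_step` against a tensor-based equivalent on realistic vocabulary sizes would be one way to estimate the overhead mentioned above.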
Would you mind copy/pasting your comment into a discussion? I'll answer there, so that we stay on topic here and your comment is easier to find for future readers.
Here are the changes that need to be implemented in order to restore the ability to generate several samples for each sequence:
We will need to add tests for …
This was removed in #366 to simplify the PR, and should be added again. It will require care with tensor shapes: an extra dimension will need to be added to account for the sample shape. The mechanism implemented there can be reused when implementing Beam Search (#258).
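As a rough illustration of the extra dimension, the sketch below copies a batch of prompts so that each prompt gets several samples. It uses nested lists instead of tensors, `expand_for_samples` and `shape` are hypothetical helpers, and placing the sample dimension first is an assumption, not necessarily what the library does.

```python
from typing import List


def expand_for_samples(
    token_ids: List[List[int]], num_samples: int
) -> List[List[List[int]]]:
    """Copy a (batch, seq_len) nested list into (num_samples, batch, seq_len)."""
    return [[list(row) for row in token_ids] for _ in range(num_samples)]


def shape(nested) -> tuple:
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(dims)


prompts = [[1, 2, 3], [4, 5, 6]]           # shape (2, 3): batch of 2 prompts
expanded = expand_for_samples(prompts, 4)  # shape (4, 2, 3): 4 samples each
```

Every downstream step (sampling, stopping criteria, decoding) then has to index along this additional dimension, which is where the shape handling needs care.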