"Optimistic" is_in generation for openai. #378
Conversation
except IndexError:
    pass

greedy = False  # we try to generate the full response at each iteration
We might as well remove the greedy case altogether
Say you have two tokenized choices [T1, TX] and [T2, TX]. You would unmask the three tokens and request 2 tokens from the API. If the model is very biased towards TX, it may answer [TX, TX], and you have gained nothing; repeating the same call, you would enter a loop. Thus it makes sense to run a greedy step next with just T1 and T2 unmasked, so that you are guaranteed to move one token forward.
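For illustration, a minimal sketch of the scenario above (the token IDs are made up):

```python
# Both choices end in the same token TX, so an optimistic call that unmasks
# every token can return a completion that is a prefix of neither choice.
T1, T2, TX = 101, 102, 999
choices = [[T1, TX], [T2, TX]]

def is_prefix_of_some_choice(tokens, choices):
    """True if `tokens` is a prefix of at least one tokenized choice."""
    return any(tokens == choice[: len(tokens)] for choice in choices)

# Optimistic step: allow {T1, T2, TX} and request 2 tokens. A model biased
# towards TX may answer [TX, TX], which matches no choice: no progress.
assert not is_prefix_of_some_choice([TX, TX], choices)

# Greedy step: unmask only the valid first tokens {T1, T2} and request 1 token.
# Any answer is then a valid prefix, so we are guaranteed to move forward.
assert is_prefix_of_some_choice([T1], choices)
assert is_prefix_of_some_choice([T2], choices)
```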
Ok, this makes sense.
It might take me until Monday to be able to properly review the PR, thank you for contributing!
I pushed changes such that it now computes the prefix matches as sequences of tokens (lists of ints) and not as strings. That makes the algorithm 100% consistent with the output generated by the LLM. The code may require some refactoring for readability/testability, but as for the algorithmic solution, I think it is complete. I won't push anything else.
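To see why working on token IDs rather than strings matters, here is a small hedged example; the tokenization below is entirely hypothetical and only illustrates the mismatch:

```python
# The strings share the prefix "New ", but the (made-up) token IDs share
# nothing, and the API reasons in token IDs, not characters.
string_choices = ["New York", "New Mexico"]
token_choices = [
    [2048],       # "New York"   -> hypothetically a single merged token
    [678, 3411],  # "New Mexico" -> hypothetically two tokens
]

share_string_prefix = string_choices[0][:4] == string_choices[1][:4]  # True ("New ")
share_token_prefix = token_choices[0][:1] == token_choices[1][:1]     # False
print(share_string_prefix, share_token_prefix)  # True False
```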
The current approach is greedy, in the sense that it generates a single token at each step, asking the API to only generate valid next tokens. This means having to pay for the prompt tokens for every token generated. This commit takes a more optimistic approach. It starts by allowing all tokens present in the sequences and limiting the length of the generation to the number of tokens in the longest sequence. If the completion is not satisfactory, it then takes one greedy step before switching back to the optimistic mode. On average this new approach consumes fewer tokens than the current one.
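A rough sketch of the alternating optimistic/greedy loop described above; `call_api` is a hypothetical stand-in for the OpenAI request built with a logit bias over `allowed_tokens`, and the exact code in the PR may differ:

```python
def generate_choice(prompt, tokenized_choices, call_api):
    """Alternate optimistic and greedy steps until one choice is fully generated.

    `call_api(prompt, generated, allowed_tokens, max_tokens)` is hypothetical
    and returns a list of token IDs. The sketch ignores corner cases such as
    one choice being a prefix of another.
    """
    generated = []   # tokens accepted so far
    greedy = False   # whether the next step is the greedy fallback

    while generated not in tokenized_choices:
        # Suffixes of the choices that are still compatible with `generated`.
        suffixes = [
            choice[len(generated):]
            for choice in tokenized_choices
            if choice[: len(generated)] == generated and len(choice) > len(generated)
        ]

        if greedy:
            allowed = {suffix[0] for suffix in suffixes}   # valid next tokens only
            max_tokens = 1
        else:
            allowed = {token for suffix in suffixes for token in suffix}
            max_tokens = max(len(suffix) for suffix in suffixes)

        answer = call_api(prompt, generated, allowed, max_tokens)

        # Keep the longest prefix of the answer that still matches some choice.
        kept = []
        for token in answer:
            candidate = generated + kept + [token]
            if any(choice[: len(candidate)] == candidate for choice in tokenized_choices):
                kept.append(token)
            else:
                break
        generated += kept

        # If an optimistic step made no progress, take one greedy step next,
        # then switch back to optimistic mode.
        greedy = not greedy and not kept

    return generated
```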
This is a very good addition, thank you! I just rebased the commits into a single commit. I will refactor the entire OpenAI integration soon, so I will leave the code as is for now and merge.
Addresses #368.
It is quite verbose: it adds two helper functions, longest_common_prefix and get_choices_with_longest_common_prefix. Maybe you would order the code differently.
To make the solution 100% robust, instead of computing the prefixes string-based, it should be done token-based. That would change just those two functions. Let me know what you think.
The algorithm follows the pseudo-code in the issue.
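One possible shape of the two helpers named above, written over lists of token IDs (the token-based variant discussed in the thread). The actual signatures and semantics in the PR are not shown here; in particular, this sketch assumes get_choices_with_longest_common_prefix selects the choices that best match the partially generated answer:

```python
from typing import List

def longest_common_prefix(sequences: List[List[int]]) -> List[int]:
    """Return the longest prefix of token IDs shared by all sequences."""
    if not sequences:
        return []
    prefix = []
    for position, token in enumerate(sequences[0]):
        if all(len(seq) > position and seq[position] == token for seq in sequences):
            prefix.append(token)
        else:
            break
    return prefix

def get_choices_with_longest_common_prefix(
    generated: List[int], choices: List[List[int]]
) -> List[List[int]]:
    """Return the choices that share the longest prefix with `generated`."""
    def shared_length(choice: List[int]) -> int:
        return len(longest_common_prefix([generated, choice]))

    best = max((shared_length(choice) for choice in choices), default=0)
    return [choice for choice in choices if shared_length(choice) == best]
```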