Customizable aggregate_max_token_number #27
Comments
@ThisIsDemetrio @giulioroggero @TurtleARM @FedericoOldrini @lucascanna

Approach

- Introduced an environment variable `TOKEN_LIMIT` to define the token limit dynamically (e.g. `export TOKEN_LIMIT=4000`).
- Added the option to load the token limit from a `config.json` file (via `with open('config.json') as f:`), making it easier to adjust values across different environments (see the sketch after this list).
- For cases where the token limit needs to be set on a per-request basis, added a parameter to the function call.
- Modified the application to work seamlessly with newer models that support higher token limits (e.g., GPT-4), by passing the updated `max_tokens` parameter in API calls.
- Implemented error handling to avoid API rate limit issues (HTTP 429) when making larger requests with more tokens.
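A minimal sketch of how these options could fit together, assuming a hypothetical `get_token_limit` helper and a `token_limit` key in `config.json` (neither name comes from the service itself):

```python
import json
import os

DEFAULT_TOKEN_LIMIT = 2000  # the currently hard-coded value mentioned in the issue


def get_token_limit(per_request_limit=None):
    """Resolve the token limit: per-request argument > TOKEN_LIMIT env var > config.json > default."""
    # 1. Per-request override passed by the caller.
    if per_request_limit is not None:
        return int(per_request_limit)

    # 2. Environment variable, e.g. `export TOKEN_LIMIT=4000`.
    env_value = os.getenv("TOKEN_LIMIT")
    if env_value is not None:
        return int(env_value)

    # 3. Optional config.json, assumed to look like {"token_limit": 4000}.
    try:
        with open("config.json") as f:
            return int(json.load(f).get("token_limit", DEFAULT_TOKEN_LIMIT))
    except (FileNotFoundError, json.JSONDecodeError):
        return DEFAULT_TOKEN_LIMIT
```

The precedence order shown here is only one possible choice; the resolved value would then be forwarded as `max_tokens` in the API calls.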
Closing this issue because, after a more detailed analysis, the value is actually configurable. A couple of lines have been added to the documentation as well and will be released with v0.5.3 of the service.
Currently, this RAG application allows passing documents extracted from embeddings up to a limit of 2000 tokens, as stated in this variable. This is done to ensure that the context window is never exceeded, and to avoid receiving errors with code 429.
I wish this value were configurable, so that higher values could be used with newer models.
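For context, the cap described above amounts to keeping retrieved chunks only while they fit within a token budget. A minimal sketch of that idea, using a naive whitespace token count (a real service would use the model's tokenizer) and hypothetical names:

```python
AGGREGATE_MAX_TOKEN_NUMBER = 2000  # currently fixed; the request is to make it configurable


def count_tokens(text):
    """Very rough token count; a real implementation would use the model's tokenizer."""
    return len(text.split())


def select_chunks(chunks, max_tokens=AGGREGATE_MAX_TOKEN_NUMBER):
    """Keep retrieved chunks only until the aggregate token budget is reached."""
    selected, used = [], 0
    for chunk in chunks:
        cost = count_tokens(chunk)
        if used + cost > max_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```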