Customizable aggregate_max_token_number #27

Closed
ThisIsDemetrio opened this issue Nov 27, 2024 · 2 comments
Labels: bug, good first issue

Comments

@ThisIsDemetrio
Contributor

Currently, this RAG application passes documents retrieved from embeddings up to a limit of 2000 tokens, as stated in this variable. This is done to ensure the context window is never exceeded and to avoid errors with code 429.

I would like this value to be configurable, so that higher values can be used with newer models.
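For illustration, a minimal sketch of what such an aggregation cap could look like (count_tokens and aggregate_documents are hypothetical names for this sketch, not the service's actual code):

def count_tokens(text):
    # Rough proxy for illustration; a real implementation would use a
    # model-specific tokenizer such as tiktoken.
    return len(text.split())

def aggregate_documents(documents, max_tokens=2000):
    # Keep adding retrieved documents until the token budget is spent.
    selected, used = [], 0
    for doc in documents:
        n = count_tokens(doc)
        if used + n > max_tokens:
            break
        selected.append(doc)
        used += n
    return selected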

ThisIsDemetrio added the bug and good first issue labels on Nov 27, 2024
@Rushikeshmagdum0379

@ThisIsDemetrio @giulioroggero @TurtleARM @FedericoOldrini @lucascanna
Solution
To enhance flexibility and support newer models that require a higher token limit, I propose making the token limit configurable. By allowing this value to be set dynamically, users can adjust the limit based on the model’s capabilities or their specific requirements.

Approach
Configuration via Environment Variable:

Introduce an environment variable TOKEN_LIMIT to define the token limit dynamically.
If the environment variable is not set, it defaults to 2000 tokens.
import os

# Fall back to 2000 tokens when TOKEN_LIMIT is unset.
token_limit = int(os.getenv('TOKEN_LIMIT', '2000'))
Configuration via JSON File:

Add the option to load the token limit from a config.json file, making it easier to adjust values across different environments.
{
    "token_limit": 4000
}
import json

# Load the limit from config.json, defaulting to 2000 tokens.
with open('config.json') as f:
    config = json.load(f)

token_limit = config.get('token_limit', 2000)
Dynamic Token Limit through Parameters:

For cases where the token limit needs to be set on a per-request basis, add a parameter to the function call; a sketch combining all three mechanisms follows the stub below.
def process_document(token_limit=2000):
    # Your document processing logic
    ...
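To tie the three mechanisms together, here is a minimal sketch of a resolution helper (the name resolve_token_limit and the precedence order are assumptions of this sketch, not existing code): an explicit argument wins, then the TOKEN_LIMIT environment variable, then config.json, then the 2000-token default.

import json
import os

def resolve_token_limit(override=None, config_path='config.json'):
    # Precedence (an assumption of this sketch): explicit argument first,
    # then the TOKEN_LIMIT environment variable, then config.json,
    # then the 2000-token default.
    if override is not None:
        return override
    if 'TOKEN_LIMIT' in os.environ:
        return int(os.environ['TOKEN_LIMIT'])
    try:
        with open(config_path) as f:
            return int(json.load(f).get('token_limit', 2000))
    except FileNotFoundError:
        return 2000

A caller could then write process_document(token_limit=resolve_token_limit()).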
Supporting Newer Models:

Modify the application to work seamlessly with newer models that support higher token limits (e.g., GPT-4) by passing the configured value as the max_tokens parameter in API calls.
import openai

# GPT-4 is a chat model, so the (pre-1.0) chat completions endpoint is used;
# max_tokens here caps the length of the generated response.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": document_text}],
    max_tokens=token_limit,
)
Rate Limiting and Error Handling:

Implement error handling to avoid API rate-limit issues (HTTP 429) when making larger requests with more tokens.
Add retry logic with exponential backoff for when the rate limit is hit; a sketch follows below.
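A minimal sketch of that retry wrapper, assuming the pre-1.0 openai Python library where a 429 surfaces as openai.error.RateLimitError:

import time
import openai

def create_with_backoff(max_retries=5, **kwargs):
    # Retry the chat completion call, doubling the wait after each HTTP 429.
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...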
Benefits
Flexibility: Users can configure the token limit based on the model's capability or their project requirements.
Scalability: Easier to upgrade or change models in the future without major code changes.
Performance: Avoids errors and improves the handling of larger documents, especially with models like GPT-4 that support larger token windows.
Example Usage
To set the token limit to 4000 tokens, you can either set the environment variable:

export TOKEN_LIMIT=4000
Or modify the config.json file:

{
    "token_limit": 4000
}
Conclusion
This contribution enhances the RAG application’s flexibility by making the token limit configurable, ensuring it can scale with newer models that support larger context windows, and improving overall performance and error handling.

@ThisIsDemetrio
Contributor Author

Closing this issue because, after a more detailed analysis, the value is actually configurable. A couple of lines have been added to the documentation as well and will be released with v0.5.3 of the service.
