Customizable aggregate_max_token_number #27

Closed
ThisIsDemetrio opened this issue Nov 27, 2024 · 2 comments
Labels: bug, good first issue

Comments

@ThisIsDemetrio
Contributor

Currently, this RAG application passes documents retrieved from embeddings up to a limit of 2000 tokens, as stated in this variable. This is done to ensure the context window is never exceeded and to avoid errors with code 429.

I would like this value to be configurable, so that higher values can be used with newer models.
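For illustration, a minimal sketch of what such an aggregation cap could look like (count_tokens and aggregate_documents are hypothetical names for this sketch, not the service's actual code):

def count_tokens(text):
    # Rough proxy for illustration; a real implementation would use a
    # model-specific tokenizer such as tiktoken.
    return len(text.split())

def aggregate_documents(documents, max_tokens=2000):
    # Keep adding retrieved documents until the token budget is spent.
    selected, used = [], 0
    for doc in documents:
        n = count_tokens(doc)
        if used + n > max_tokens:
            break
        selected.append(doc)
        used += n
    return selected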

ThisIsDemetrio added the bug and good first issue labels on Nov 27, 2024
@Rushikeshmagdum0379

@ThisIsDemetrio @giulioroggero @TurtleARM @FedericoOldrini @lucascanna
Solution
To enhance flexibility and support newer models that require a higher token limit, I propose making the token limit configurable. By allowing this value to be set dynamically, users can adjust the limit based on the model’s capabilities or their specific requirements.

Approach
Configuration via Environment Variable:

Introduce an environment variable TOKEN_LIMIT to define the token limit dynamically.
If the environment variable is not set, it defaults to 2000 tokens.
import os

# Fall back to 2000 tokens when TOKEN_LIMIT is unset.
token_limit = int(os.getenv('TOKEN_LIMIT', '2000'))
Configuration via JSON File:

Add the option to load the token limit from a config.json file, making it easier to adjust values across different environments.
{
    "token_limit": 4000
}
import json

# Load the limit from config.json, defaulting to 2000 tokens.
with open('config.json') as f:
    config = json.load(f)

token_limit = config.get('token_limit', 2000)
Dynamic Token Limit through Parameters:

For cases where the token limit needs to be set on a per-request basis, add a parameter to the function call; a sketch combining all three mechanisms follows the stub below.
def process_document(token_limit=2000):
    # Your document processing logic
    ...
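To tie the three mechanisms together, here is a minimal sketch of a resolution helper (the name resolve_token_limit and the precedence order are assumptions of this sketch, not existing code): an explicit argument wins, then the TOKEN_LIMIT environment variable, then config.json, then the 2000-token default.

import json
import os

def resolve_token_limit(override=None, config_path='config.json'):
    # Precedence (an assumption of this sketch): explicit argument first,
    # then the TOKEN_LIMIT environment variable, then config.json,
    # then the 2000-token default.
    if override is not None:
        return override
    if 'TOKEN_LIMIT' in os.environ:
        return int(os.environ['TOKEN_LIMIT'])
    try:
        with open(config_path) as f:
            return int(json.load(f).get('token_limit', 2000))
    except FileNotFoundError:
        return 2000

A caller could then write process_document(token_limit=resolve_token_limit()).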
Supporting Newer Models:

Modify the application to work seamlessly with newer models that support higher token limits (e.g., GPT-4) by passing the configured value as the max_tokens parameter in API calls.
import openai

# GPT-4 is a chat model, so the (pre-1.0) chat completions endpoint is used;
# max_tokens here caps the length of the generated response.
response = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": document_text}],
    max_tokens=token_limit,
)
Rate Limiting and Error Handling:

Implement error handling to avoid API rate-limit issues (HTTP 429) when making larger requests with more tokens.
Add retry logic with exponential backoff for when the rate limit is hit; a sketch follows below.
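A minimal sketch of that retry wrapper, assuming the pre-1.0 openai Python library where a 429 surfaces as openai.error.RateLimitError:

import time
import openai

def create_with_backoff(max_retries=5, **kwargs):
    # Retry the chat completion call, doubling the wait after each HTTP 429.
    for attempt in range(max_retries):
        try:
            return openai.ChatCompletion.create(**kwargs)
        except openai.error.RateLimitError:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...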
Benefits
Flexibility: Users can configure the token limit based on the model's capability or their project requirements.
Scalability: Easier to upgrade or change models in the future without major code changes.
Performance: Avoids errors and improves the handling of larger documents, especially with models like GPT-4 that support larger token windows.
Example Usage
To set the token limit to 4000 tokens, you can either set the environment variable:

export TOKEN_LIMIT=4000
Or modify the config.json file:

{
    "token_limit": 4000
}
Conclusion
This contribution enhances the RAG application’s flexibility by making the token limit configurable, ensuring it can scale with newer models that support larger context windows, and improving overall performance and error handling.

@ThisIsDemetrio
Contributor Author

Closing this issue because, after a more detailed analysis, the value is actually configurable. A couple of lines have been added to the documentation as well and will be released with v0.5.3 of the service.
