-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: improve the Indexer #245
Conversation
Note(s) for PR Auther:
Note(s) for PR Reviewer(s):
|
* Index with batches to prevent rate limiting * Clean-up table after test run * Add more logs
4986911
to
6408e80
Compare
doc_indexer/src/utils/settings.py
Outdated
@@ -5,6 +5,8 @@ | |||
from decouple import Config, RepositoryEnv, config | |||
from dotenv import find_dotenv, load_dotenv | |||
|
|||
# TODO: re-use the settings parent project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it todo suppose to be completed as part of this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be part of this issue. Do you want me to add the issue link to the comment too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the issue link too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the todo comment from the code then. Having it in the issue is enough.
doc_indexer/src/utils/settings.py
Outdated
DOCS_PATH = config("DOCS_PATH", default=None) | ||
DOCS_PATH = config("DOCS_PATH", default="data/output") | ||
DOCS_TABLE_NAME = config("DOCS_TABLE_NAME", default="kyma_docs") | ||
CHUNKS_BATCH_SIZE = config("CHUNKS_BATCH_SIZE", cast=int, default=100) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 100
the best possible chunk size?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
number of chunks is 474, that means there will 5 batches. I tested with 200 batch size. It was successful multiple times and increased it to 200. We can configure it later if we want to increase more.
9ae17e9
to
263a387
Compare
263a387
to
956a493
Compare
Description
Changes proposed in this pull request:
Related issue(s)
#199