sentence splitting implementation #26
base: master
Thanks for tackling this, but I think it needs some cleanup.
My previous concerns have not been addressed yet.
Sorry for not getting back earlier. Regarding the sentence splitting: what is the maximum chunk size, or rather, where is it configured?
I have made some reasonable adjustments to handle large bodies: sentences that are too long are now split into smaller parts. It is certainly not perfect, but it will not break the code, and the embedding service will complain anyway if the payload is too big. Here is an excerpt of my test output:
⚠ Sentence too long, splitting it into smaller parts
Push split sentences to the front of the queue with array_unshift($sentences, ...$this->splitLongSentence($sentence, $tiktok));
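For illustration, here is a minimal sketch of the queue approach named in that commit message. It is not the PR's actual code. It assumes a whitespace word count as a stand-in for the real tokenizer (`$tiktok` in the PR), and `countTokens`, `splitLongSentence`, `splitIntoChunks` and `$maxTokens` are hypothetical free-function versions written only for this example. Sentences are consumed from the front of a queue. An overlong sentence is split into parts of at most `$maxTokens` words, and those parts are pushed back onto the front with `array_unshift`, so they are packed in their original order.

```php
<?php
declare(strict_types=1);

// Hypothetical token counter: approximates tokens by whitespace-separated words.
// A real implementation would count tokens with the embedding model's tokenizer.
function countTokens(string $text): int
{
    return count(preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY));
}

// Hypothetical helper: splits an overlong sentence into parts of at most
// $maxTokens words each (word boundaries only, no smarter breakpoints).
function splitLongSentence(string $sentence, int $maxTokens): array
{
    $words = preg_split('/\s+/', trim($sentence), -1, PREG_SPLIT_NO_EMPTY);
    $parts = [];
    foreach (array_chunk($words, $maxTokens) as $group) {
        $parts[] = implode(' ', $group);
    }
    return $parts;
}

// Greedily packs sentences into chunks of at most $maxTokens (approximate) tokens.
// Requires $maxTokens >= 1.
function splitIntoChunks(array $sentences, int $maxTokens): array
{
    $chunks = [];
    $current = [];
    $currentTokens = 0;

    while ($sentences !== []) {
        $sentence = array_shift($sentences);
        $tokens = countTokens($sentence);

        if ($tokens > $maxTokens) {
            // Push the smaller parts to the front of the queue, keeping their order.
            array_unshift($sentences, ...splitLongSentence($sentence, $maxTokens));
            continue;
        }

        // Close the current chunk if this sentence would overflow it.
        if ($current !== [] && $currentTokens + $tokens > $maxTokens) {
            $chunks[] = implode(' ', $current);
            $current = [];
            $currentTokens = 0;
        }

        $current[] = $sentence;
        $currentTokens += $tokens;
    }

    if ($current !== []) {
        $chunks[] = implode(' ', $current);
    }
    return $chunks;
}
```

With `$maxTokens = 3`, the input `['one two three', 'a b c d e f g', 'end']` packs into `['one two three', 'a b c', 'd e f', 'g end']`. Re-queuing the parts, instead of emitting them directly, means a split fragment can still share a chunk with the following sentence. Note that a word count usually undercounts BPE tokens, so in practice the limit should be checked with the actual tokenizer.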
278ef80 to 4e0adf2
To implement sentence splitting for sentences that are too long, we need to break them down into smaller parts while ensuring that the resulting chunks do not exceed the specified chunk size. Here's an updated version of the splitIntoChunks method that includes sentence splitting for overly long sentences: