Modified sentence_tokenize to handle tokenization of sentences that end with numerics. #72

varunkatiyar819 · 2024-06-07T07:14:40Z

Previously, sentence_tokenize failed to split a sentence that ends with a numeric character. I guess this was a logic issue in the check that avoids splitting on decimal numbers.
I have changed the logic a bit so that it skips tokenizing only in that condition.
With this change, the sentence "India was declared a nation with its own constitution on 26 January 1950, while India gained independence on 14 August 1947. About 3 years went through the formation of the nation and the complete departure of the British." is tokenized correctly. I also tested a few edge cases.
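
For reviewers, here is a minimal sketch of the distinction described above. It is not the code from this PR: `split_sentences` is a hypothetical helper, and it assumes a period counts as a decimal point only when there is a digit on both sides of it. Abbreviations and ellipses are not handled.

```python
def split_sentences(text):
    """Split text on '.', '?' and '!', keeping decimal numbers intact.

    Hypothetical illustration only; not the library's sentence_tokenize.
    """
    sentences = []
    start = 0
    for i, ch in enumerate(text):
        if ch not in ".?!":
            continue
        prev_is_digit = i > 0 and text[i - 1].isdigit()
        next_is_digit = i + 1 < len(text) and text[i + 1].isdigit()
        # A period between two digits (e.g. "3.14") is a decimal point,
        # so it must not end the sentence.
        if ch == "." and prev_is_digit and next_is_digit:
            continue
        # A period after a digit but before a space or the end of the text
        # (e.g. "1947. About") is a real sentence boundary.
        sentence = text[start:i + 1].strip()
        if sentence:
            sentences.append(sentence)
        start = i + 1
    # Keep any trailing text that has no closing punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

Under these assumptions, the example sentence from this description splits into two sentences at "1947.", while a value such as "3.14" stays in one piece.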

Modified sentence_tokenize to handle tokeniztion of sentence which en…

e0596ef

…ds with numerics.

varunkatiyar819 marked this pull request as draft June 7, 2024 17:28

varunkatiyar819 marked this pull request as ready for review June 7, 2024 17:28
