Better splits; don't count everything as a word #9

Open
camilstaps opened this issue Nov 13, 2016 · 0 comments

At the moment, we only split the text on spaces, but we should also split on ־ (maqaf) and ׃ (sof pasuq), and possibly on other signs.

Furthermore, some things should not be considered a word:

  • פ and ס at the end of a verse
  • ׀ (still unclear to me what this actually means)
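A minimal sketch of how the splitting described so far could look, in Python. Nothing here is taken from the project's code: `split_words`, `SPLIT_RE` and `END_MARKERS` are hypothetical names, and treating ׀ (U+05C0) as a separator, so that it never becomes a word of its own, is an assumption rather than a settled decision.

```python
import re
from typing import List

MAQAF = "\u05BE"      # ־ joins two words
PASEQ = "\u05C0"      # ׀ assumed here to be a separator, never a word
SOF_PASUQ = "\u05C3"  # ׃ end-of-verse sign

# Split on runs of whitespace, maqaf, paseq and sof pasuq.
SPLIT_RE = re.compile("[\\s" + MAQAF + PASEQ + SOF_PASUQ + "]+")

# A standalone pe or samekh closing a verse is not a word.
END_MARKERS = {"\u05E4", "\u05E1"}  # פ and ס


def split_words(verse: str) -> List[str]:
    """Split a verse into words, dropping signs that are not words."""
    tokens = [t for t in SPLIT_RE.split(verse) if t]
    # Only the final token is checked, matching "at the end of a verse".
    if tokens and tokens[-1] in END_MARKERS:
        tokens.pop()
    return tokens
```

Only the last token is compared against פ and ס, so a word that merely contains one of these letters is unaffected.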

And then ketiv-qere should be handled better. For example, in Ps. 119:161, [וּמִדְּבָרֶיךָ כ] (וּ֝מִדְּבָרְךָ֗ ק) is split into four words, and ] and ) end up as part of them. In this context, the characters [, ], (, ), כ and ק should not be considered part of a word.
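One possible way to handle the ketiv-qere markup, sketched in Python under the assumption that the ketiv is always written as `[word כ]` and the qere as `(word ק)`. The function name `strip_ketiv_qere_markup` is hypothetical, and whether the ketiv, the qere, or both should then be counted is left open here.

```python
import re

KAF = "\u05DB"  # כ marks the ketiv
QOF = "\u05E7"  # ק marks the qere

# "[<text> כ]" and "(<text> ק)"; only <text> is captured.
KETIV_RE = re.compile(r"\[([^\]]*?)\s+" + KAF + r"\]")
QERE_RE = re.compile(r"\(([^)]*?)\s+" + QOF + r"\)")


def strip_ketiv_qere_markup(text: str) -> str:
    """Remove brackets and the marker letter, keeping the word itself."""
    text = KETIV_RE.sub(r"\1", text)
    return QERE_RE.sub(r"\1", text)
```

Parentheses that do not end in a space followed by ק, such as a verse reference, are left untouched, so this step can run before the text is split into words.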
