Improving the Diagonal Calculation #21
Not sure if I am talking rubbish, but you may want to consider the single row version.
Hey @RobSchoenaker - thanks for the link, however Quickenshtein already uses that optimization and a bunch more.
This particular issue is referring to a diagonal calculation approach which allows us to take advantage of SIMD (watch my video or that last blog post if you want to know a bit about SIMD). Basically, calculating diagonally removes the dependency issue when calculating multiple cells at once. We do require 2x the memory that the traditional single-row version needs, but for long strings, we absolutely crush it in speed. I do plan to have a few more blog posts about Levenshtein Distance in the future too, as there are definitely some cool things I'd like to check out with it.
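For anyone skimming this thread, here's a minimal scalar sketch of that anti-diagonal idea (my own illustration with made-up names, not Quickenshtein's actual SIMD implementation):

```csharp
using System;

static class DiagonalSketch
{
    // Walks the DP matrix by anti-diagonals (all cells where i + j == d).
    // Every cell on a diagonal depends only on the two previous diagonals,
    // never on a neighbour in the same diagonal, so the inner loop has no
    // cross-lane dependency and is SIMD-friendly.
    public static int Levenshtein(string source, string target)
    {
        int m = source.Length, n = target.Length;
        if (m + n == 0) return 0;

        // Three rolling diagonal buffers (vs one row for the classic
        // single-row method - hence the roughly 2x memory cost).
        var prev2 = new int[m + 1]; // diagonal d - 2
        var prev1 = new int[m + 1]; // diagonal d - 1
        var curr = new int[m + 1];  // diagonal d

        prev2[0] = 0;             // D[0,0]
        if (n >= 1) prev1[0] = 1; // D[0,1]
        if (m >= 1) prev1[1] = 1; // D[1,0]
        if (m + n == 1) return 1;

        for (int d = 2; d <= m + n; d++)
        {
            int iMin = Math.Max(0, d - n), iMax = Math.Min(m, d);
            for (int i = iMin; i <= iMax; i++) // independent cells: vectorizable
            {
                int j = d - i;
                if (i == 0) { curr[i] = j; continue; } // first row: D[0,j] = j
                if (j == 0) { curr[i] = i; continue; } // first column: D[i,0] = i
                int cost = source[i - 1] == target[j - 1] ? 0 : 1;
                curr[i] = Math.Min(
                    prev2[i - 1] + cost,                   // substitution
                    Math.Min(prev1[i - 1], prev1[i]) + 1); // deletion/insertion
            }
            var tmp = prev2; prev2 = prev1; prev1 = curr; curr = tmp;
        }
        return prev1[m]; // D[m, n] sits at index m of the final diagonal
    }
}
```

The key property is that `curr[i]` never reads another `curr` entry, only `prev1` and `prev2`, which is exactly what removes the dependency issue mentioned above.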
@Turnerj thanks for the quick response. I see my comment did not make any sense :-)
No problem mate! Hopefully you find some of the links I've given interesting. |
A few points:
Preliminary performance numbers for the single-threaded improvements (take these with a grain of salt; none of these prototypes calculates correctly yet - the sketch at the end of this comment shows what the lookups and shuffles refer to):
- No target lookups and no shuffles/permutations, aka "Process Column" (at best we would still need one of these calls per vector, but we would avoid a lot of duplicates)
- No source lookups, aka "Process Row" (at best we would still need one of these calls per vector, but we would avoid a lot of duplicates)
- No source or target lookups (INVALID)
For comparison, the regular diagonal calculation currently takes about 26.8 ms. This means at best, single-threaded could see a 12% improvement (26.8 ms down to roughly 23.6 ms), though it is likely to be closer to 8% (roughly 24.7 ms) for the "correct" solution.
No idea about how the threading code would perform though.
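For context on those per-vector lookups and shuffles, here's a rough sketch (again my own illustration, not Quickenshtein's code; every name here is made up) of where that work shows up when building substitution costs along a diagonal with x86 intrinsics. Source characters can be loaded directly, but target characters run backwards along the diagonal, so they need a reversing shuffle:

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class DiagonalCostSketch
{
    // Byte shuffle mask that reverses the order of the eight 16-bit lanes.
    private static readonly Vector128<byte> ReverseUInt16Mask = Vector128.Create(
        (byte)14, (byte)15, (byte)12, (byte)13, (byte)10, (byte)11, (byte)8, (byte)9,
        (byte)6, (byte)7, (byte)4, (byte)5, (byte)2, (byte)3, (byte)0, (byte)1);

    // Substitution costs (0 or 1) for 8 diagonal cells starting at source
    // index i on diagonal d, i.e. the pairs (source[i + k], target[d - (i + k) - 1]).
    // Assumes the interior case: i + 8 <= source.Length and d - i - 8 >= 0.
    public static unsafe Vector128<ushort> SubstitutionCosts(string source, string target, int d, int i)
    {
        fixed (char* s = source)
        fixed (char* t = target)
        {
            // Source lanes are contiguous and ascending: a plain load suffices.
            Vector128<ushort> sourceChars = Sse2.LoadVector128((ushort*)(s + i));

            // Target lanes are contiguous but descending, so load the block
            // ending at target[d - i - 1] and reverse it - this shuffle is the
            // per-vector cost the "Process Row"/"Process Column" ideas attack.
            Vector128<ushort> targetChars = Ssse3.Shuffle(
                Sse2.LoadVector128((ushort*)(t + (d - i - 8))).AsByte(),
                ReverseUInt16Mask).AsUInt16();

            // All-ones lanes where the characters match, zero where they differ;
            // AndNot flips that into a 0-or-1 substitution cost per lane.
            Vector128<ushort> equal = Sse2.CompareEqual(sourceChars, targetChars);
            return Sse2.AndNot(equal, Vector128.Create((ushort)1));
        }
    }
}
```

Real code would also guard with `Ssse3.IsSupported` and handle the diagonal edges where fewer than 8 cells remain; this only shows the interior case.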