perf: reduce string comparisons in Levenshtein distance calculation #266

ramsyana · 2024-12-19T13:29:32Z

Optimize Levenshtein distance algorithm performance by:

1. Reducing the total number of string comparisons by half
2. Making string character access more efficient
3. Moving some calculations to compile time

Key changes:

- Changed the string comparison loop to avoid duplicate checks
- Added character caching for faster access
- Used compile-time initialization for the first row

perf: optimize Levenshtein distance implementation

- Move first row initialization to compile time using inline while
- Cache characters to reduce array access
- Use @intFromBool for cost calculation
- Eliminate redundant string comparisons by starting j at i+1
- Use maxInt(usize) instead of -1 for min_distance
- Improve type safety by using usize consistently

Performance: Reduces number of comparisons from n*(n-1) to n*(n-1)/2
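For reference, here is a minimal Zig sketch (assuming Zig 0.11 or later) of what several of the listed techniques look like together: a two-row Levenshtein table, the current character of `a` cached once per row, `@intFromBool` for the substitution cost, an inner pair loop starting at `i + 1`, and `min_distance` initialized to `std.math.maxInt(usize)`. The names `levenshtein`, `minPairwiseDistance`, and `Result`, and the sample words, are hypothetical; this is not the code from the PR, and it leaves out the compile-time first-row initialization.

```zig
const std = @import("std");

/// Hypothetical sketch: Levenshtein distance using two rolling DP rows.
fn levenshtein(allocator: std.mem.Allocator, a: []const u8, b: []const u8) !usize {
    // One allocation holds both rows, each of length b.len + 1.
    const buf = try allocator.alloc(usize, 2 * (b.len + 1));
    defer allocator.free(buf);
    var prev = buf[0 .. b.len + 1];
    var curr = buf[b.len + 1 ..];

    // First row: distance from the empty prefix of `a` to each prefix of `b`.
    for (prev, 0..) |*cell, j| cell.* = j;

    // `ca` is the cached character of `a` for this row.
    for (a, 1..) |ca, i| {
        curr[0] = i;
        for (b, 1..) |cb, j| {
            // @intFromBool turns the mismatch test into a 0/1 substitution cost.
            const cost: usize = @intFromBool(ca != cb);
            const deletion = prev[j] + 1;
            const insertion = curr[j - 1] + 1;
            const substitution = prev[j - 1] + cost;
            curr[j] = @min(deletion, @min(insertion, substitution));
        }
        // The row just filled becomes the previous row for the next iteration.
        std.mem.swap([]usize, &prev, &curr);
    }
    return prev[b.len];
}

const Result = struct { times: usize, min_distance: usize };

/// Hypothetical driver: compares each unordered pair once (j starts at i + 1),
/// which is only valid because d(a, b) == d(b, a).
fn minPairwiseDistance(allocator: std.mem.Allocator, words: []const []const u8) !Result {
    var times: usize = 0;
    // maxInt(usize) means "no pair seen yet"; it stays that way for < 2 words.
    var min_distance: usize = std.math.maxInt(usize);
    for (words, 0..) |w1, i| {
        for (words[i + 1 ..]) |w2| {
            const d = try levenshtein(allocator, w1, w2);
            times += 1;
            min_distance = @min(min_distance, d);
        }
    }
    return .{ .times = times, .min_distance = min_distance };
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const words = [_][]const u8{ "kitten", "sitting", "mitten" };
    const r = try minPairwiseDistance(gpa.allocator(), &words);
    std.debug.print("times: {d}\nmin_distance: {d}\n", .{ r.times, r.min_distance });
}
```

With three words the half loop makes 3 comparisons instead of the 6 a full `i != j` loop would make, and reports the same minimum because the distance is symmetric.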

PEZ · 2024-12-19T15:03:25Z

Hi! I don't think we are allowed to do 1 and 3 (halving the comparisons and moving work to compile time).

The output of this version is

❯ ./zig/code `cat input.txt`                                   
times: 1953
min_distance: 7

That is only half of the work the reference implementations do.

The benchmark tries to measure how fast a language does some work, and it's probably also cheating to do the work beforehand. (But let @bddicken be the judge!)

Making string access more efficient should be perfectly fine, though. Do you know how much performance that unlocks?

Optimize:

- Increase buffer size from 256 to 1024 bytes for string comparison arrays
- Eliminate redundant comparisons by starting inner loop from i+1

The changes reduce the number of comparisons needed while allowing for longer string inputs to be processed.

ramsyana · 2024-12-20T06:53:42Z

Hi @PEZ, thanks. I've updated the code to increase the buffer size and to optimize string comparisons by eliminating redundant pair checks. This should improve performance while handling longer strings.

# original zig code
times: 3906
min_distance: 7

# optimized
times: 1953
min_distance: 7
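These numbers match the n*(n-1) to n*(n-1)/2 change described in the commit message: 3906 = 63 * 62 and 1953 = 3906 / 2, which suggests an input of 63 strings (an inference from the arithmetic, not something stated in the thread). The minimal Zig sketch below (assuming Zig 0.11 or later; `orderedPairs` and `unorderedPairs` are hypothetical names) counts the comparisons made by the two loop shapes without computing any distances.

```zig
const std = @import("std");

/// Comparisons when every ordered pair (i, j) with i != j is checked,
/// as in the reference implementations: n * (n - 1).
fn orderedPairs(n: usize) usize {
    var count: usize = 0;
    for (0..n) |i| {
        for (0..n) |j| {
            if (i != j) count += 1;
        }
    }
    return count;
}

/// Comparisons when the inner loop starts at j = i + 1: n * (n - 1) / 2.
fn unorderedPairs(n: usize) usize {
    var count: usize = 0;
    for (0..n) |i| {
        for (i + 1..n) |_| count += 1;
    }
    return count;
}

pub fn main() void {
    // Hypothetical input size, inferred from 3906 = 63 * 62.
    const n: usize = 63;
    std.debug.print("ordered: {d}, unordered: {d}\n", .{ orderedPairs(n), unorderedPairs(n) });
}
```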

PEZ · 2024-12-20T08:03:58Z

Thanks. The thing is that the reference implementation does not skip the redundant checks. Maybe it should. @Gkodkod, was there any reason you skipped that? CC: @bddicken

Ichoran · 2024-12-22T20:53:40Z

Part of the art of simple benchmarks is to find redundant busywork that you know is pointless but which the compiler can't discern is pointless. So it makes sense to me to leave the redundant calculations in. The benchmark already runs too fast for the results to be meaningful for languages that have a runtime engine with non-negligible startup time (e.g. JVM).

Gkodkod · 2024-12-25T21:53:19Z

Hey @PEZ, not sure what you mean: in my original code I had them, and I saw you experimented with taking them out. Not sure why. Both @ramsyana's and @Ichoran's points are valid. I agree with you that @bddicken should make the call. Keep warm and safe, and have fun with your family during the holidays!

Pez.Zig.-.Languages.mp4

Update code.zig

b6c502f
