Optimize string repetition and string copying #2634

advikkabra · 2024-03-28T17:15:57Z

The existing string repetition algorithm was inefficient because it copied one byte at a time. This PR uses memset for single-character strings and binary exponentiation for multi-character strings, both of which are faster.
_lfortran_strcpy was also calling strlen in every loop iteration, which slowed the program down significantly.

After this change, a 1e6 repetition as shown in the issue takes 0.04s, which is comparable to CPython.
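
A minimal C sketch of the idea described above, for readers who want to see the two paths side by side. It is not the code from this PR: `repeat_sketch` is a hypothetical name, and the multi-character path is written as repeated doubling of the already-filled prefix, which is one common way to realize the "binary exponentiation" idea.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the runtime's actual function): returns a newly
 * allocated, NUL-terminated string holding `s` repeated `n` times.
 * The caller frees the result. Returns NULL if allocation fails.
 * Overflow of s_len * n is not checked, to keep the sketch short. */
char *repeat_sketch(const char *s, size_t n) {
    assert(s != NULL);
    size_t s_len = strlen(s);   /* measured once, never inside a loop */
    size_t total = s_len * n;
    char *dest = malloc(total + 1);
    if (dest == NULL) {
        return NULL;
    }
    if (total == 0) {
        /* empty source or zero repetitions: nothing to copy */
    } else if (s_len == 1) {
        /* single character: memset fills the whole buffer in one call */
        memset(dest, s[0], n);
    } else {
        /* seed one copy, then keep copying the filled prefix onto the tail;
         * the filled length doubles each round, so about log2(n) memcpy calls */
        memcpy(dest, s, s_len);
        size_t filled = s_len;
        while (filled < total) {
            size_t remaining = total - filled;
            size_t chunk = filled < remaining ? filled : remaining;
            memcpy(dest + filled, dest, chunk);  /* chunk <= filled: no overlap */
            filled += chunk;
        }
    }
    dest[total] = '\0';
    return dest;
}
```

With this sketch, `repeat_sketch("Xyz", 3)` produces `"XyzXyzXyz"`. A negative repeat count would have to be clamped to zero by the caller, since `n` is unsigned here.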

kmr-srbh · 2024-03-28T17:36:35Z

Advik, I recognize and appreciate the effort you have put in to find and resolve this issue. I fear we have increased the time instead.

N = 10⁶

string1: str = "a"
string2: str = string1 * 10**6

(base) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py
^C
real    1m43.633s
user    1m43.501s
sys     0m0.048s

The operation took around 30s before. I request you to look into it.

kmr-srbh · 2024-03-28T17:43:07Z

I think you might have mistakenly tested the above using CPython.

advikkabra · 2024-03-28T17:47:44Z

It works fine on my machine. After profiling, more than 99% of the time was spent in the strlen calls in the strcpy function.

advik@advik-Inspiron-7560:~/dev/oss/lpython$ time src/bin/lpython ../test.py
1000000

real    0m0.082s
user    0m0.058s
sys     0m0.020s
advik@advik-Inspiron-7560:~/dev/oss/lpython$ cat ../test.py
string1: str = "a";
string2: str = string1 * 10**6
print(len(string2))

kmr-srbh · 2024-03-28T18:02:52Z

Great work Advik! It works properly on my machine now too! That's an outright 300x improvement! 👏

(lp) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py

real    0m0.110s
user    0m0.071s
sys     0m0.039s

It did not work the previous time because I was on an older branch. The new changes from LFortran have done an amazing job of increasing the speed.

Shaikh-Ubaid · 2024-03-29T01:22:53Z

src/libasr/runtime/lfortran_intrinsics.c

+    if (s_len == 1) {
+        memset(dest_char, *(*s), n);
+    } else {


I wonder why we need to handle the case for s_len == 1 separately. The else portion seems general, so I think that should be enough to handle the case of s_len == 1.

Does the else portion fail when s_len == 1?

advikkabra · 2024-03-29

It does not fail when s_len == 1, but memset is highly optimized for this use case. Setting a large block of data to a single value is what memset excels at, and it is optimized for every architecture. These are the times I measured while testing:

100000000 Binary exponentiation: 0.086963 memset: 0.006341

400000000 Binary exponentiation: 0.211737 memset: 0.018612

This is after compiling with -O3.

Shaikh-Ubaid · 2024-03-29T02:21:51Z

src/libasr/runtime/lfortran_intrinsics.c

+        x[0][i] = y[i];
+    }
+    for (; i < x_len; i++) {
+        x[0][i] = ' ';


This is a great improvement. Thanks!

Shaikh-Ubaid

I think we should add a test for this with a large n (so that we are sure that the time taken does not regress in the future). We can do it in a separate PR.

It looks good to me. Thanks for this. Great work!

Shaikh-Ubaid · 2024-04-03T06:58:14Z

> It does not fail when s_len == 1, but memset is very optimized for this usecase.

Did you try checking whether the else branch works for s_len == 1? It is fine to have a separate case for s_len == 1, but I just want to be sure that the else branch works correctly (for example, that it works for odd numbers like 1).

Shaikh-Ubaid · 2024-04-03T06:58:48Z

Let's merge this after we are sure that the else case works for odd numbers like 1.
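
One way to check this concern directly is to run the general path on single-character input and compare it against memset for every small count, odd and even. The sketch below assumes a doubling-style general path; `fill_by_doubling` and `doubling_matches_memset` are hypothetical names used only for illustration, not functions from the runtime.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* General path only: fill dest[0 .. s_len * n) with n copies of s by
 * repeatedly copying the already-filled prefix onto the tail.
 * No terminator is written; dest must hold at least s_len * n bytes. */
void fill_by_doubling(char *dest, const char *s, size_t s_len, size_t n) {
    assert(dest != NULL && s != NULL);
    size_t total = s_len * n;
    if (total == 0) {
        return;
    }
    memcpy(dest, s, s_len);
    size_t filled = s_len;
    while (filled < total) {
        size_t remaining = total - filled;
        size_t chunk = filled < remaining ? filled : remaining;
        memcpy(dest + filled, dest, chunk);
        filled += chunk;
    }
}

/* Returns 1 if the general path and memset produce identical bytes for
 * every repeat count n in [0, max_n], and 0 otherwise (including when
 * allocation fails). */
int doubling_matches_memset(char c, size_t max_n) {
    char *a = malloc(max_n + 1);
    char *b = malloc(max_n + 1);
    int ok = (a != NULL && b != NULL);
    for (size_t n = 0; ok && n <= max_n; n++) {
        fill_by_doubling(a, &c, 1, n);
        memset(b, c, n);
        ok = (memcmp(a, b, n) == 0);
    }
    free(a);
    free(b);
    return ok;
}
```

Because the final round copies only the remaining bytes rather than a full doubling, counts that are not powers of two (including n = 1, where the loop body never runs) are handled by the same code.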

advikkabra · 2024-04-03T07:45:04Z

This test is already there in test_str_01.py in integration_tests.

def test_str_repeat():
    a: str
    a = "Xyz"
    assert a*3 == "XyzXyzXyz"
    assert a*2*3 == "XyzXyzXyzXyzXyzXyz"
    assert 3*a*3 == "XyzXyzXyzXyzXyzXyzXyzXyzXyz"
    assert a*-1 == ""

Shaikh-Ubaid · 2024-04-03T08:31:01Z

> This test is already there in test_str_01.py in integration_tests.

Do we have a test with a large n like (10^6 or so)? If so, then it is good.

> I think we should add a test for this with a large `n` (so that we are sure that the time taken does not regress in the future). We can do it in a separate PR.
>
> Originally posted by @Shaikh-Ubaid in #2634 (review)

advikkabra · 2024-04-03T13:55:46Z

How would you suggest I add this as a test? Is there a way to check if something is taking too long?

Shaikh-Ubaid · 2024-04-03T14:04:47Z

> How would you suggest I add this as a test? Is there a way to check if something is taking too long?

I think not yet in LPython (for LFortran we have cpu_time). For now, by adding a test we are just trying to ensure that the test is not taking too long.

Previously (as of the main branch), the time complexity of _lfortran_strcpy() was O(n^2). That means that for n = 10^6 the work would be on the order of 10^12 operations, which would not complete in any practical time on the CI.

If we add a test for n = 10^6 and the test completes in a reasonable time (I am assuming it will take < 1s; if it takes more than 1s, it is not very helpful and we should reduce n), then it verifies that the time complexity of _lfortran_strcpy() has improved.
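
To make the complexity argument concrete, here is a small C sketch of the two patterns. It is not the runtime's `_lfortran_strcpy`: both function names are hypothetical, and the blank padding is modeled on the diff fragment quoted earlier in the review. In the first version strlen is part of the loop condition, so each iteration may rescan the source (a compiler can only hoist the call if it can prove the writes do not touch the source), giving O(n^2) work overall. The second version measures the length once.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Quadratic pattern: strlen(src) sits in the loop condition and may be
 * re-evaluated on every iteration. dest is a fixed-width buffer of
 * dest_len bytes; the unused tail is blank-padded and no terminator
 * is written. */
void copy_pad_quadratic(char *dest, size_t dest_len, const char *src) {
    assert(dest != NULL && src != NULL);
    size_t i;
    for (i = 0; i < dest_len && i < strlen(src); i++) {
        dest[i] = src[i];
    }
    for (; i < dest_len; i++) {
        dest[i] = ' ';
    }
}

/* Linear pattern: the length is computed once, then the copy and the
 * padding are each a single pass over the buffer. */
void copy_pad_linear(char *dest, size_t dest_len, const char *src) {
    assert(dest != NULL && src != NULL);
    size_t src_len = strlen(src);
    size_t n = src_len < dest_len ? src_len : dest_len;
    memcpy(dest, src, n);
    memset(dest + n, ' ', dest_len - n);
}
```

Both functions produce the same bytes; only the amount of work differs. For a source of 10^6 characters, the first version can scan roughly 10^6 characters on each of 10^6 iterations, which is where the 10^12 figure comes from.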

Shaikh-Ubaid · 2024-04-03T14:07:01Z

For a robust test, we should support something like time.time(). But it should not be part of this PR and can be done later.

Once we have support for time.time(), we can later come back to the above n = 10^6 test case and update it to add a check for the time taken.

advikkabra · 2024-04-03T19:35:31Z

I have added a test for a 10**6 repetition. This would take more than 1 minute with the previous code, so a regression would be easy to catch. It now completes in 20ms on my machine, so there is a comfortable margin with the new implementation. I think the PR is ready now.

Shaikh-Ubaid

Perfect! It looks good to me. Thanks for this. Great work!

* Faster string repeat algorithm
* Optimize `_lfortran_strcpy`
* Faster string repeat algorithm
* Optimize `_lfortran_strcpy`
* Added stress test for repeat

advikkabra added 2 commits March 28, 2024 20:32

Faster string repeat algorithm

81b4810

Optimize _lfortran_strcpy

d76d2ea

Shaikh-Ubaid reviewed Mar 29, 2024

Shaikh-Ubaid approved these changes Apr 3, 2024

advikkabra added 2 commits April 3, 2024 12:25

Faster string repeat algorithm

7d2080c

Optimize _lfortran_strcpy

d6645f5

Shaikh-Ubaid force-pushed the optimize-repeat branch from d76d2ea to d6645f5 Compare April 3, 2024 06:55

Shaikh-Ubaid enabled auto-merge April 3, 2024 06:55

Shaikh-Ubaid disabled auto-merge April 3, 2024 06:56

advikkabra added 2 commits April 4, 2024 01:01

Added stress test for repeat

130ca48

Merge branch 'optimize-repeat' of https://github.com/advikkabra/lpython…

12622b1

… into optimize-repeat

Shaikh-Ubaid approved these changes Apr 4, 2024

Shaikh-Ubaid merged commit 4ceac47 into lcompilers:main Apr 4, 2024
13 checks passed

advikkabra deleted the optimize-repeat branch July 21, 2024 13:22
