Optimize string repetition and string copying #2634
Advik, I recognize and appreciate the effort you have put in to find and resolve this issue. I fear we have increased the time instead. N = 10^6
string1: str = "a"
string2: str = string1 * 10**6
(base) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py
^C
real 1m43.633s
user 1m43.501s
sys 0m0.048s
The operation took around 30s before. I request you to look into it. |
I think you might have mistakenly tested the above using CPython. |
It works fine on my machine, and after profiling more than 99% of the time used was in the
advik@advik-Inspiron-7560:~/dev/oss/lpython$ time src/bin/lpython ../test.py
1000000
real 0m0.082s
user 0m0.058s
sys 0m0.020s
advik@advik-Inspiron-7560:~/dev/oss/lpython$ cat ../test.py
string1: str = "a";
string2: str = string1 * 10**6
print(len(string2)) |
Great work Advik! It works properly on my machine now too! That's an outright 300x improvement! 👏
(lp) saurabh-kumar@Awadh:~/Projects/System/lpython$ time ./src/bin/lpython ./examples/example.py
real 0m0.110s
user 0m0.071s
sys 0m0.039s
The reason it did not work the previous time was that I was on an older branch. The new changes from LFortran have done an amazing job of increasing the speed. |
if (s_len == 1) {
    memset(dest_char, *(*s), n);
} else {
I wonder why we need to handle the case for `s_len == 1` separately. The else portion seems general, so I think that should be enough to handle the case of `s_len == 1`.
Does the else portion fail when `s_len == 1`?
It does not fail when `s_len == 1`, but `memset` is very optimized for this use case. Setting a large block of data to a single value is what `memset` excels at, and it is optimized for every architecture. While testing, these were the times compared:
100000000
Binary exponentiation: 0.086963
memset: 0.006341
400000000
Binary exponentiation: 0.211737
memset: 0.018612
This is after compiling with `-O3`.
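For readers following the thread, the two paths being compared can be sketched roughly as below. This is only an illustration, not the code in this PR: `repeat_string` is a hypothetical helper, and the doubling `memcpy` loop is one common way to realize the "binary exponentiation" idea (the already-filled prefix is copied onto itself, so the number of copy calls grows with log n rather than n). Overflow of `s_len * n` is not checked here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the LPython runtime function): returns a newly
 * allocated, NUL-terminated string holding `s` repeated `n` times, or NULL
 * if allocation fails. The caller frees the result. */
char *repeat_string(const char *s, size_t n) {
    size_t s_len = strlen(s);
    size_t total = s_len * n;            /* assumption: no overflow check */
    char *dest = malloc(total + 1);
    if (dest == NULL) {
        return NULL;
    }

    if (total > 0) {
        if (s_len == 1) {
            /* Single character: one memset call fills the whole buffer. */
            memset(dest, (unsigned char)s[0], n);
        } else {
            /* General case: seed one copy, then repeatedly copy the filled
             * prefix onto the unfilled tail, doubling the filled length. */
            memcpy(dest, s, s_len);
            size_t filled = s_len;
            while (filled < total) {
                size_t remaining = total - filled;
                size_t chunk = filled < remaining ? filled : remaining;
                /* chunk <= filled, so source and destination never overlap */
                memcpy(dest + filled, dest, chunk);
                filled += chunk;
            }
        }
    }
    dest[total] = '\0';
    return dest;
}
```

The general branch also produces correct output for a one-character string, which matches the answer above: the separate `memset` branch is purely a speed choice.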
    x[0][i] = y[i];
}
for (; i < x_len; i++) {
    x[0][i] = ' ';
This is a great improvement. Thanks!
I think we should add a test for this with a large `n` (so that we are sure that the time taken does not regress in the future). We can do it in a separate PR.
It looks good to me. Thanks for this. Great work!
d76d2ea to d6645f5
Did you try checking this if the |
Let's merge this after we are sure that the |
This test is already there in
def test_str_repeat():
    a: str
    a = "Xyz"
    assert a*3 == "XyzXyzXyz"
    assert a*2*3 == "XyzXyzXyzXyzXyzXyz"
    assert 3*a*3 == "XyzXyzXyzXyzXyzXyzXyzXyzXyz"
    assert a*-1 == "" |
Do we have a test with a large
Originally posted by @Shaikh-Ubaid in #2634 (review) |
How would you suggest I add this as a test? Is there a way to check if something is taking too long? |
I think not yet in LPython (for LFortran we have Previously (as of the If we add a test for |
For a robust test, we should support something like Once we have support for |
I have added a test for a 10**6 repetition. This would take more than 1 minute with the previous code, so it would be easy to catch if it goes wrong. Right now it completes in 20ms on my machine, so it is comfortable with the new implementation. I think the PR is ready now. |
Perfect! It looks good to me. Thanks for this. Great work!
* Faster string repeat algorithm
* Optimize `_lfortran_strcpy`
* Added stress test for repeat
Fixes #2616
* `memset` could be used for single char strings, and binary exponentiation could be used for multi char strings. This is faster than copying one byte at a time.
* `_lfortran_strcpy` was calling `strlen` in every loop iteration, which was slowing down the program significantly.

After this change, a 1e6 repetition as shown in the issue is 0.04s, which is comparable to CPython.
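The `strlen` point can be illustrated with a small sketch. Everything here is an assumption made for illustration: `copy_padded` is a hypothetical stand-in, not the real `_lfortran_strcpy`, and it only mirrors the copy-then-pad-with-spaces shape visible in the diff. The relevant detail is that `strlen` walks the whole source string, so evaluating it in the loop condition can make the copy quadratic in the string length whenever the compiler cannot prove that the string is unchanged between iterations. Computing the length once avoids that.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a fixed-width string copy: copies `src` into
 * `dest`, truncating or padding with spaces to exactly `dest_len`
 * characters, then NUL-terminates. Assumption: `dest` has room for
 * dest_len + 1 bytes. */
void copy_padded(char *dest, size_t dest_len, const char *src) {
    /* Length computed once, outside the loop. Writing the condition as
     * `i < strlen(src)` instead would rescan `src` on every iteration. */
    size_t src_len = strlen(src);
    size_t n = src_len < dest_len ? src_len : dest_len;
    size_t i;

    for (i = 0; i < n; i++) {
        dest[i] = src[i];
    }
    for (; i < dest_len; i++) {
        dest[i] = ' ';               /* pad the remainder with spaces */
    }
    dest[dest_len] = '\0';
}
```

With a 10^6-character source, the hoisted version scans the source once for its length and once for the copy, instead of once per copied character.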