Speed up EnsoInputStream again #10515
Conversation
What is the change that sped it up?
First of all, I wanted to run the benchmarks to find out how bad we are. The results are coming. Now there is also a fix 0accf83 - with that one we are around 90%-110% of the original speed (before the regression).
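For context, a minimal JMH-style sketch of the kind of read-throughput benchmark being discussed could look roughly like the following. This is not the benchmark added by this PR; the class name, file name, and buffer size are illustrative assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class ReadLargeCsvBench {
    private Path csv;

    @Setup
    public void setup() {
        // Assumed to point at a large (tens of MB) CSV prepared beforehand.
        csv = Path.of("large_bench.csv");
    }

    @Benchmark
    public void readWholeFile(Blackhole bh) throws IOException {
        byte[] buffer = new byte[64 * 1024];
        try (InputStream in = Files.newInputStream(csv)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                bh.consume(read); // keep the read loop from being optimized away
            }
        }
    }
}
```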
Tested on a project with a large CSV (77 MB) and it loaded almost instantaneously.
I still disagree with committing a test file to our repo, even if an exact copy already exists somewhere in it (it shouldn't be there in the first place, but that's another story). For this benchmark to be a bit more reliable, I would prefer either a much larger file fetched from the internet or a separate repo for such large test files.
I'm not going to block this PR over that, because it fixes a rather critical problem.
engine/runtime/src/main/java/org/enso/interpreter/runtime/data/EnsoFile.java
Just a question on a couple of methods.
Given that here we are just testing loading, could we just generate such a file? We can create a big CSV of random texts, numbers, etc. very easily.
We could certainly generate a file like that.
I am not adding anything to the size of our Git repository. Git stores files in a content-addressed way, so an exact copy of an already-committed file does not add any new data.
Downloading only adds flakiness. A separate repo is overkill. Why should we generate this file, when the existing file isn't generated either? Is there any consistency in such a suggestion?
IMHO we should proceed as is and 'fix' this later. As you said, it's an exact copy, so it does not introduce additional overhead. Overall we should probably be generating both files.
I guess the difference is that we can randomly generate this file because we don't care about its exact contents, only its structure. For the other one we are checking aggregation computation results, so we need to know the exact values. But we could probably also generate it with a deterministic seed, so yeah, you are right.
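If we did go the generation route, a small seeded generator along these lines would keep the contents deterministic across runs. This is only a sketch; the file name, row count, and column layout are arbitrary assumptions and not part of this PR.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class GenerateBenchCsv {
    public static void main(String[] args) throws IOException {
        Random rnd = new Random(42); // fixed seed => identical file on every run
        Path out = Path.of("large_bench.csv");
        try (BufferedWriter w = Files.newBufferedWriter(out)) {
            w.write("id,name,value\n");
            for (int i = 0; i < 1_000_000; i++) {
                w.write(i + ",item_" + rnd.nextInt(10_000) + "," + rnd.nextDouble() + "\n");
            }
        }
    }
}
```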
Shouldn't we revert the change that disabled the use of the default encoding?
I assumed that the performance problem occurred when using Encoding.Default, so to check whether this fix really addresses the issue in full, we should probably also run the benchmarks with the patch:
diff --git a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Encoding.enso b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Encoding.enso
--- a/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Encoding.enso (revision 8da06309e99f24d7062df5638f49e000fb292eff)
+++ b/distribution/lib/Standard/Base/0.0.0-dev/src/Data/Text/Encoding.enso (revision 71a6e2162e41653b7b239a756adb8bf14db2e8f4)
@@ -71,8 +71,7 @@
default -> Encoding =
# This factory method is used to publicly expose the `Default` constructor.
# The constructor itself has to be private, because we want to make `Value` constructor private, but all constructors must have the same privacy.
- # ToDo: This is a workaround for performance issue.
- Encoding.utf_8
+ Encoding.Default
## PRIVATE
A default encoding that will try to guess the encoding based on some heuristics.
Why?
No. The first bad merge is fe2cf49 coming from
I don't understand why we should be doing that, or how. I don't want to change
@jdunkerley mentioned he will re-enable it in a separate PR. But it would be good to compare, using this benchmark, how it affects the performance with and without it.
Exactly. @jdunkerley's hotfix of disabling the default encoding was said to improve performance already, so it also affected the performance of the CSV loading that is measured here. To measure the target performance properly, we should run the benchmarks in the target state - with this hotfix reverted.
It's not unrelated. It was done in #10493 with the comment "Disable Encoding.default for now as big performance impact on CSVs.". So it seems crucial to measure with benchmarks whether, after we re-enable this encoding, the performance will be back to normal.
OK, let's get this PR in and then you can run the measurements with and without the hotfix.
Once this benchmark run is over, we will have the initial results for the new benchmark.
Pull Request Description
Fix #10503 by creating a benchmark and then speeding it up by making sure usage of InteropLibrary remains in partially evaluated code and isn't hidden behind @TruffleBoundary. (A rough sketch contrasting the two approaches follows the checklist below.)
Checklist
Please ensure that the following checklist has been satisfied before submitting the PR:
Java,
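To illustrate the idea from the description above, the sketch below contrasts an interop access that stays inside partially evaluated code (via @CachedLibrary) with the same access hidden behind @TruffleBoundary. It is not the actual EnsoInputStream/EnsoFile code; the node name and the "read" member are hypothetical.

```java
import com.oracle.truffle.api.CompilerDirectives;
import com.oracle.truffle.api.CompilerDirectives.TruffleBoundary;
import com.oracle.truffle.api.dsl.Specialization;
import com.oracle.truffle.api.interop.InteropException;
import com.oracle.truffle.api.interop.InteropLibrary;
import com.oracle.truffle.api.library.CachedLibrary;
import com.oracle.truffle.api.nodes.Node;

public abstract class ReadByteNode extends Node {
    public abstract int execute(Object stream);

    // Fast path: the interop message send is visible to the partial evaluator,
    // so the compiler can inline and specialize it per call site.
    @Specialization(limit = "3")
    int doRead(Object stream, @CachedLibrary("stream") InteropLibrary interop) {
        try {
            return interop.asInt(interop.invokeMember(stream, "read"));
        } catch (InteropException e) {
            throw CompilerDirectives.shouldNotReachHere(e);
        }
    }

    // Slow alternative: the same access behind a @TruffleBoundary is opaque to
    // partial evaluation, so every byte read pays a full runtime call.
    @TruffleBoundary
    static int readBehindBoundary(Object stream) {
        try {
            InteropLibrary interop = InteropLibrary.getUncached();
            return interop.asInt(interop.invokeMember(stream, "read"));
        } catch (InteropException e) {
            throw new IllegalStateException(e);
        }
    }
}
```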