[IO-670] refine IOUtils.contentEqualsIgnoreEOL(Reader, Reader) #118
Do I need to write a performance test case to show the performance gain?
I also want to change the other contentEquals functions in IOUtils in a similar way, but I'd like to hear your advice first.
OK
I only have experience using JProfiler, and I don't know how to embed a performance test in JUnit. Also, do I need to create more tests to check whether this function is correct?
@garydgregory |
@garydgregory BTW, how can I show the generated performance results?
@garydgregory |
OK, according to |
Just FYI, your expectations for 72 hour consensus might be slightly off, it usually applies to certain kinds of votes, not parts of PRs. |
@garydgregory |
125d490
to
66ffa50
Compare
src/test/java/org/apache/commons/io/performance/IOUtilsContentEqualsPerformanceTest.java
The code needs comments. In the case of contentEquals(InputStream...) and contentEquals(Reader ...) it would be enough to comment one and refer the reader to the other for details as they are relatively short and use exactly the same logic AFAICT. I don't suppose there is any way to share the code? The other method is now enormous, and really ought to be refactored if possible. Also what does -1 mean? I assume it is the EOF marker, in which case why not use the EOF constant? |
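To illustrate the point about the magic number, here is a minimal sketch of a character-by-character comparison that names the end-of-stream value. Commons IO exposes such a constant as IOUtils.EOF; this sketch defines its own local EOF so it stays self-contained. The class name EofConstantSketch is hypothetical and this is not the code under review.

```java
import java.io.IOException;
import java.io.Reader;

final class EofConstantSketch {
    /** Value returned by Reader.read() at end of stream (local stand-in for a library constant). */
    static final int EOF = -1;

    /** Compares two readers one character at a time. */
    static boolean contentEquals(Reader a, Reader b) throws IOException {
        if (a == b) {
            return true;
        }
        int ch = a.read();
        while (ch != EOF) {
            // b.read() yields EOF here if b is shorter, which also fails the comparison.
            if (ch != b.read()) {
                return false;
            }
            ch = a.read();
        }
        // a is exhausted; the contents match only if b is exhausted too.
        return b.read() == EOF;
    }
}
```

Naming the constant makes the two loop exits (mismatch versus end of stream) easier to tell apart when reading the code.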
Hi.
Since FileReader is itself slow enough that it could mask the running time of contentEquals, I mainly use the StringReader results to show the performance gain.
It occurs to me that the code is effectively buffering the output from a BufferedReader (or BufferedInputStream). Rather than implement the buffering directly in the compare methods, it might be better to implement a generic, non-threadsafe buffered reader/stream that can be used in situations such as these. [No need to wrap the original input in a buffered version first]. It would be a non-threadsafe version of BufferedReader/InputStream. The original code should then work without any change other than to add the filter. |
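A rough sketch of what such a non-threadsafe buffered reader could look like is given here. The class name UnsyncBufferedReaderSketch and its constructor are hypothetical, and mark/reset support is left out; it only shows that the buffering can live in its own class with no synchronization in the read path.

```java
import java.io.IOException;
import java.io.Reader;

final class UnsyncBufferedReaderSketch extends Reader {
    private final Reader in;
    private final char[] buf;
    private int pos;   // next index to hand out
    private int limit; // number of valid chars in buf

    UnsyncBufferedReaderSketch(Reader in, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.in = in;
        this.buf = new char[size];
    }

    /** Refills the buffer; returns false at end of stream. */
    private boolean fill() throws IOException {
        int n;
        do {
            n = in.read(buf, 0, buf.length);
        } while (n == 0); // tolerate sources that return zero chars
        pos = 0;
        limit = Math.max(n, 0);
        return n > 0;
    }

    @Override
    public int read() throws IOException {
        if (pos >= limit && !fill()) {
            return -1;
        }
        return buf[pos++];
    }

    @Override
    public int read(char[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos >= limit && !fill()) {
            return -1;
        }
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

With a wrapper like this, a comparison method can keep its simple read() loop and only wrap its inputs. Whether that is as fast as embedding the buffers in the comparison itself is exactly what the benchmarks discussed in this thread would need to show.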
Yes, this is the main trick.
I used the filter idea you mentioned and rewrote the implementation as contentEqualsIgnoreEOLNew2. I worry that if we really split it out into a class, it may become even slower. But I do agree with your idea: a non-thread-safe but faster BufferedReader would be valuable in many cases.
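For readers following along, EOL-insensitive comparison can be sketched with a plain line-by-line loop over BufferedReader.readLine(). This is only an illustration of the behaviour being optimized, under the assumption that "ignoring EOL" means treating \n, \r and \r\n alike; it is not contentEqualsIgnoreEOLNew2, and the class name IgnoreEolSketch is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Objects;

final class IgnoreEolSketch {
    /** Compares two readers line by line, so line terminators never take part in the comparison. */
    static boolean contentEqualsIgnoreEOL(Reader a, Reader b) throws IOException {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        BufferedReader ra = new BufferedReader(a);
        BufferedReader rb = new BufferedReader(b);
        String lineA = ra.readLine();
        String lineB = rb.readLine();
        // readLine() strips the terminator and returns null at end of stream.
        while (lineA != null && lineA.equals(lineB)) {
            lineA = ra.readLine();
            lineB = rb.readLine();
        }
        // Equal only if both readers ran out of lines at the same time.
        return Objects.equals(lineA, lineB);
    }
}
```

Each readLine() call allocates a String, which is one plausible reason a buffer-based rewrite can be faster.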
I will re-run the performance test and see what it gets after a 10s warm-up.
It looks to me like contentEqualsIgnoreEOLNew2 is using InputReader - I cannot see any buffering? |
It is embedded.
I see. It would be clearer to use a separate class to implement this. However the method inputOnlyHaveCRLForEOF uses read() on an unbuffered input class. Note that IO has a CircularBufferInputStream which may be suitable for at least one of the cases. Should be easy enough to implement the Reader equivalent.
Hi. Actually I will redo it anyway, as I think it is better to split it out into a class rather than repeat it twice in the function.
Hi.
I'm amazed that ContentEqualsIgnoreEOLNew2 runs even faster after the simplification. @sebbASF
Do you have any proof that using UnsyncBufferedReader etc in contentEquals will result in a slow-down unless further changes are made to contentEquals? If so, what further changes are needed to contentEquals, and what is the speed improvement? As to any renaming, that can be done later. |
Phone keyboard! Right, today we have:
|
Yes, I ran a rough performance test on the same PC, and the resulting time was 9e8 (the current version of this PR shows 4.6e8, and master 2.8e9).
That trick is only possible when we are comparing two (or several) InputReaders.
Hi. |
I think would like to/will soon-ish bring in the underlying |
@garydgregory got it. Thanks. |
@garydgregory Weekend now. Any news?:) |
ping? :) |
I will look over all Commons PRs starting tomorrow over the next while I have some time off from work... please be patient. |
The Java folks make it sound like these are superfluous in https://bugs.openjdk.java.net/browse/JDK-4097272, so I think we need to see a performance test that shows there is a clear performance benefit to adding those as valuable on their own. Perhaps our addition of |
@garydgregory |
Hello @XenoAmess But still, let's continue this thread. Starting with the lowest-level bits: we need to justify the addition of the misnamed We need performance tests that show the differences, if any, between the JRE's classes and our proposed I think you should create a new PR for just these two new classes and their tests. This will make the work simpler for everyone when reviewing and testing. TY. |
@garydgregory Hi. I did the performance test at #184.
a247cfc
to
9540a1f
Compare
@garydgregory unsynced buffered classes deleted. |
This is based on the PR #118 by XenoAmess but only for this one method.
Hi @XenoAmess
Please see git master and rebase. I updated with your changes for InputStream and Reader but not for the EOL APIs. If you still want those in, please rebase and follow the pattern I established with the *Benchmark classes.
TY! -)
PS: I gave you credit in the commit message and changes.xml 👍
620bc26
to
1f7750a
Compare
rebased. benchmark redone. |
1f7750a
to
eb25f9a
Compare
eb25f9a
to
ccdce2a
Compare
ccdce2a
to
b18e0ee
Compare
@garydgregory Rebased. It now passes the new checkstyle.
Refine IOUtils.contentEquals(Reader, Reader) from simply wrapping the inputs in BufferedReader to something better.
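One way to read "something better" is to compare whole blocks instead of single characters. The following is a minimal sketch of that idea only, not the merged implementation: the class name BlockCompareSketch, the helper readFully and the 8192-char buffer size are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;

final class BlockCompareSketch {
    private static final int BUFFER_SIZE = 8192; // assumed size, not taken from the PR

    /** Fills buf as far as possible; returns fewer than buf.length chars only at end of stream. */
    private static int readFully(Reader r, char[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = r.read(buf, total, buf.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        return total;
    }

    /** Compares two readers block by block. */
    static boolean contentEquals(Reader a, Reader b) throws IOException {
        if (a == b) {
            return true;
        }
        if (a == null || b == null) {
            return false;
        }
        char[] bufA = new char[BUFFER_SIZE];
        char[] bufB = new char[BUFFER_SIZE];
        while (true) {
            int nA = readFully(a, bufA);
            int nB = readFully(b, bufB);
            if (nA != nB) {
                return false; // one input ended before the other
            }
            for (int i = 0; i < nA; i++) {
                if (bufA[i] != bufB[i]) {
                    return false;
                }
            }
            if (nA < BUFFER_SIZE) {
                return true; // both inputs ended inside this block
            }
        }
    }
}
```

Filling each buffer completely before comparing keeps the two inputs aligned even when the underlying readers return short reads.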