[JOHNZON-404] don't rely on jdk internals for BoundedOutputStreamWriter #128
Conversation
A few questions:
- why don't you need to handle overflow? (there can be a reason but it is surprising upfront)
- the while loop could be done in one go, saving the loop time I think, and I'd need to check but I think the encoding as well
- doesn't the encoder already have exception management built in?
I guess we'll need a ton of tests, or fall back on OutputStreamWriter otherwise :(
The idea is basically: while there are chars left, try to encode them into the buffer. If the encoding overflows (it is indeed handled), flush the buffer to the underlying OutputStream. I ignored underflows because imo underflowing (no input data left) is just the desired state we want to end up in. To me the javadocs don't sound like CharsetEncoder throws when encoding, hence I added the throwException() call. Agree we probably need more tests, this is really just a first naive approach to move away from sun's StreamEncoder |
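The loop described above can be sketched as follows. This is a hypothetical illustration, not the PR's actual code: the class and method names (`EncodeLoopSketch`, `encodeAll`, `drain`, `demo`) are invented for the example, and a real implementation would also need to pass `endOfInput=true` and call `encoder.flush()` for stateful charsets.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncodeLoopSketch {

    // Encode until the input underflows; on overflow, flush the byte
    // buffer to the underlying stream and keep encoding.
    static void encodeAll(CharBuffer in, CharsetEncoder encoder,
                          ByteBuffer buffer, OutputStream out) throws IOException {
        while (true) {
            CoderResult result = encoder.encode(in, buffer, false);
            if (result.isUnderflow()) {
                break; // no input left - the desired end state
            }
            if (result.isOverflow()) {
                drain(buffer, out); // byte buffer full: flush it
            } else {
                result.throwException(); // malformed or unmappable input
            }
        }
    }

    // Write the buffer's pending bytes to the stream and reset it.
    static void drain(ByteBuffer buffer, OutputStream out) throws IOException {
        buffer.flip();
        out.write(buffer.array(), buffer.position(), buffer.remaining());
        buffer.clear();
    }

    // Round-trips a string through the loop with a deliberately tiny buffer.
    static String demo(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(8); // tiny, to force overflows
        encodeAll(CharBuffer.wrap(text), StandardCharsets.UTF_8.newEncoder(), buffer, out);
        drain(buffer, out); // write whatever remains after the final underflow
        return out.toString(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("Hello world, hello again!"));
    }
}
```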
@jungm I understand that part, but we are far from the implementation on Java < 22, so I'm pretty sure we'll find cases that are not backward compatible. What can be worth it is to bench OutputStreamWriter + BufferedWriter on Java 22 vs our impl on Java 21; if there is no diff we just extend it for backward compat? |
Is BufferedWriter + OutputStreamWriter even an option? Adding a buffer before delegating to OutputStreamWriter will not eliminate the internal buffer in OutputStreamWriter anyway. Some quick benchmarking shows that this change slows things down quite a bit :/ (ran on JDK 11)

Current BoundedOutputStreamWriter:

New BoundedOutputStreamWriter:
For reference, this is the JMH benchmark:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class Main {
    private Writer writer;
    private char[] data;

    @Setup
    public void setUp() {
        writer = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        data = "Hello world!".toCharArray();
    }

    @Benchmark
    public void bench() throws IOException {
        writer.write(data);
    }
}
```
|
What we don't want is an "exploding" buffer (the internal buffer of …). Alternatively we can do an impl close to the old one of the JDK to just stay ~the same, wdyt? |
Now you got me confused 😄 The OpenJDK 21 StreamEncoder is actually not that big, but I'm afraid we can't really use it in the ASF as it's GPLv2-licensed: https://www.apache.org/legal/resolved.html#category-x Maybe we can really just drop BoundedOutputStreamWriter altogether? |
Did some more benchmarking (all ran on JDK 21 this time):
```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {
    private Writer johnzon;
    private Writer buffered;
    private Writer raw;
    private char[] dataSmall;
    private char[] data;
    private char[] dataHuge;

    @Setup
    public void setUp() {
        johnzon = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        buffered = new BufferedWriter(new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8), 100);
        raw = new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8);
        dataSmall = "Hello World".toCharArray();
        data = "Hello world".repeat(100).toCharArray();
        dataHuge = "Hello world".repeat(10000).toCharArray();
    }

    @Benchmark
    public void johnzonSmall() throws IOException {
        johnzon.write(dataSmall);
    }

    @Benchmark
    public void johnzon() throws IOException {
        johnzon.write(data);
    }

    @Benchmark
    public void johnzonHuge() throws IOException {
        johnzon.write(dataHuge);
    }

    @Benchmark
    public void bufferedSmall() throws IOException {
        buffered.write(dataSmall);
    }

    @Benchmark
    public void buffered() throws IOException {
        buffered.write(data);
    }

    @Benchmark
    public void bufferedHuge() throws IOException {
        buffered.write(dataHuge);
    }

    @Benchmark
    public void rawSmall() throws IOException {
        raw.write(dataSmall);
    }

    @Benchmark
    public void raw() throws IOException {
        raw.write(data);
    }

    @Benchmark
    public void rawHuge() throws IOException {
        raw.write(dataHuge);
    }
}
```

BufferedWriter + OutputStreamWriter is almost always faster than BoundedOutputStreamWriter |
Interesting. You are totally right on the use case (basically whatever we do, the tests should still pass without modifying them). High level, the use case is when you open a writer - typically from a server, http/ssh/... - and want to ensure flushing happens often enough to avoid hanging if messages are small. So thanks to your bench I think that mixing BufferedWriter + OutputStreamWriter with an autoflush (counting chars I guess?) would be more than sufficient. |
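A minimal sketch of that idea, with a hypothetical class name (this is not the actual Johnzon code): an OutputStreamWriter wrapped in a BufferedWriter, plus a char counter that triggers a flush once the configured limit would be crossed.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical sketch: BufferedWriter + OutputStreamWriter with an
// auto-flush driven by a simple char counter.
public class AutoFlushingWriter extends Writer {
    private final Writer delegate;
    private final int limit;
    private int written;

    public AutoFlushingWriter(OutputStream out, Charset charset, int limit) {
        this.delegate = new BufferedWriter(
                new OutputStreamWriter(out, charset), Math.max(limit, 16));
        this.limit = limit;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // flush *before* writing, so the data of the last write may stay
        // buffered - matching the pre-existing behavior discussed below
        if (written + len > limit) {
            flush();
        }
        delegate.write(cbuf, off, len);
        written += len;
    }

    @Override
    public void flush() throws IOException {
        delegate.flush();
        written = 0;
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

Counting chars rather than encoded bytes keeps the counter cheap; the trade-off is that the flush threshold is approximate for multi-byte charsets.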
For good measure, another benchmark with the new auto-flushing BufferedWriter:
No performance lost and it does what we want 😉 |
@jungm I'd keep the old name - if you want to promote the new one, please keep the code like that, just keep the old class and make it inherit the new one, 100% for backward compat (ok if the class is no longer used from the core code). Just a small question - but overall it is a very satisfying fix from my window: why is autoFlush() called before writing and counting rather than after? ( |
Really the only reason I flush before writing is to not break/alter the behavior we had in BoundedOutputStreamWriter before: BoundedOutputStreamWriterTest expects the data that didn't fit in the buffer in the last write operation to stay in the buffer and not be flushed immediately |
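The backward-compatibility arrangement suggested above, combined with the flush-before-write ordering, could look roughly like this. Both class names here are illustrative: `AutoFlushingStreamWriter` is an invented name for the promoted implementation, and the kept `BoundedOutputStreamWriter` is reduced to a thin subclass preserving the old constructor signature.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;

// Hypothetical "new" implementation under its new name. It flushes
// *before* a write that would cross the limit, so the chars of the most
// recent write stay buffered until the next write or an explicit flush.
class AutoFlushingStreamWriter extends OutputStreamWriter {
    private final int limit;
    private int written;

    AutoFlushingStreamWriter(OutputStream out, Charset charset, int limit) {
        super(out, charset);
        this.limit = limit;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        if (written + len > limit) {
            flush();       // flush before writing, as discussed above
            written = 0;
        }
        super.write(cbuf, off, len);
        written += len;
    }
}

// Old class kept purely for backward compatibility: same name, same
// constructor signature, all behavior inherited from the new class.
public class BoundedOutputStreamWriter extends AutoFlushingStreamWriter {
    public BoundedOutputStreamWriter(OutputStream out, Charset charset, int limit) {
        super(out, charset, limit);
    }
}
```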
…and keep its signature