
[JOHNZON-404] don't rely on jdk internals for BoundedOutputStreamWriter #128

Merged · 3 commits · Jun 27, 2024

Conversation

@jungm (Member) commented Jun 24, 2024

No description provided.

@rmannibucau (Contributor) left a comment

A few questions:

  1. Why don't you need to handle overflow? (There may be a reason, but it is surprising up front.)
  2. I think the while loop can be done in a single pass, saving the loop overhead, and (I'd need to check) possibly the encoding as well.
  3. Doesn't the encoder already have exception-throwing handling?

I guess we'll need a ton of tests, or otherwise fall back on OutputStreamWriter :(

@jungm (Member, Author) commented Jun 24, 2024

The idea is basically: while there are chars left, try to encode them into the buffer. If the encoding overflows (overflow is indeed handled), flush the buffer to the underlying OutputStream. I ignored underflow because, in my opinion, underflow (no input data left) is just the desired state we want to end up in.

To me the javadocs don't sound like CharsetEncoder throws while encoding, hence I added the throwException() call.

Agreed, we probably need more tests; this is really just a first naive approach to move away from sun's StreamEncoder.
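The encode-and-flush loop described above could look roughly like this (a standalone sketch using CharsetEncoder, not the actual PR code; the class name, buffer handling, and error handling are simplified, and encoder.flush() for stateful charsets is omitted):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncodeLoopSketch {
    // Encode all chars, flushing the byte buffer to `out` whenever the encoder overflows.
    // Assumes bufferSize is large enough to hold at least one encoded char.
    static void encode(char[] chars, OutputStream out, int bufferSize) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(chars);
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);

        while (in.hasRemaining()) {
            CoderResult result = encoder.encode(in, buffer, true);
            if (result.isOverflow()) {
                // buffer full: flush to the underlying stream and keep encoding
                out.write(buffer.array(), 0, buffer.position());
                buffer.clear();
            } else if (result.isError()) {
                result.throwException();
            }
            // underflow means all input was consumed, so the loop exits
        }
        out.write(buffer.array(), 0, buffer.position());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode("Hello world!".toCharArray(), out, 4); // tiny buffer forces overflows
        System.out.println(out.toString(StandardCharsets.UTF_8));
    }
}
```

With a 4-byte buffer the encoder overflows twice before the final flush, which is the flush-on-overflow behavior the comment describes.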

@rmannibucau (Contributor)

@jungm I understand that part, but we are far from the implementation on Java < 22, so I'm pretty sure we'll find backward-incompatible cases. What could be worth it is to bench OutputStreamWriter + BufferedWriter on Java 22 vs. our impl on Java 21; if there's no diff, we just extend it for backward compat?

@jungm (Member, Author) commented Jun 25, 2024

Is BufferedWriter + OutputStreamWriter even an option? Adding a buffer before delegating to OutputStreamWriter will not eliminate the internal buffer in OutputStreamWriter.

Anyway, some quick benchmarking shows that this change slows things down quite a bit :/ (ran on JDK 11)

Current BoundedOutputStreamWriter:

Benchmark    Mode  Cnt   Score   Error   Units
Main.bench  thrpt    3  35.772 ± 1.922  ops/us

New BoundedOutputStreamWriter:

Benchmark    Mode  Cnt   Score   Error   Units
Main.bench  thrpt    3  22.928 ± 2.732  ops/us

For reference, this is the jmh benchmark:

import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
// plus an import for Johnzon's BoundedOutputStreamWriter

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class Main {
    private Writer writer;
    private char[] data;

    @Setup
    public void setUp() {
        writer = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        data = "Hello world!".toCharArray();
    }

    @Benchmark
    public void bench() throws IOException {
        writer.write(data);
    }
}

@rmannibucau (Contributor)

What we don't want is an "exploding" buffer; the internal buffer of OutputStreamWriter is acceptable (even if I agree it's undesired), but the facading buffer kind of saves things.

Alternatively, we can do an impl close to the old one in the JDK to stay ~the same, wdyt?

@jungm (Member, Author) commented Jun 26, 2024

Now you got me confused 😄
Isn't the whole point of BoundedOutputStreamWriter to have an OutputStreamWriter where we can control the internal buffer size, to kind of force it to flush more often? I can't really see right now how the buffer, even in the JDK 22 StreamEncoder, could explode: it starts at 512 bytes and grows to 8 KB max (https://github.com/openjdk/jdk22u/blob/master/src/java.base/share/classes/sun/nio/cs/StreamEncoder.java#L44-L45). The only thing I could find troublesome is the buffer being reallocated every time the StreamEncoder encounters a chunk of data it can't handle yet.
Is that what you mean by exploding? Reallocating the buffer until it reaches 8 KB?

The OpenJDK 21 StreamEncoder is actually not that big, but I'm afraid we can't really use it in the ASF as it's GPLv2-licensed: https://www.apache.org/legal/resolved.html#category-x
So realistically we can't just take it; we'd need to rewrite it and test it well, which is basically the situation we're already in right now.

Maybe we can really just drop BoundedOutputStreamWriter altogether?

@jungm (Member, Author) commented Jun 26, 2024

Did some more benchmarking (all ran on JDK 21 this time):

Benchmark                   Mode  Cnt   Score    Error   Units
MyBenchmark.buffered       thrpt   25  11.728 ±  0.042  ops/us
MyBenchmark.bufferedHuge   thrpt   25   0.277 ±  0.004  ops/us
MyBenchmark.bufferedSmall  thrpt   25  54.672 ±  0.328  ops/us
MyBenchmark.johnzon        thrpt   25   2.669 ±  0.022  ops/us
MyBenchmark.johnzonHuge    thrpt   25   0.029 ±  0.001  ops/us
MyBenchmark.johnzonSmall   thrpt   25  55.897 ±  0.188  ops/us
MyBenchmark.raw            thrpt   25  20.570 ±  0.431  ops/us
MyBenchmark.rawHuge        thrpt   25   0.278 ±  0.005  ops/us
MyBenchmark.rawSmall       thrpt   25  73.361 ±  0.273  ops/us

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
// plus an import for Johnzon's BoundedOutputStreamWriter

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {
    private Writer johnzon;
    private Writer buffered;
    private Writer raw;

    private char[] dataSmall;
    private char[] data;
    private char[] dataHuge;

    @Setup
    public void setUp() {
        johnzon = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        buffered = new BufferedWriter(new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8), 100);
        raw = new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8);

        dataSmall = "Hello World".toCharArray();
        data = "Hello world".repeat(100).toCharArray();
        dataHuge = "Hello world".repeat(10000).toCharArray();
    }

    @Benchmark
    public void johnzonSmall() throws IOException {
        johnzon.write(dataSmall);
    }

    @Benchmark
    public void johnzon() throws IOException {
        johnzon.write(data);
    }

    @Benchmark
    public void johnzonHuge() throws IOException {
        johnzon.write(dataHuge);
    }

    @Benchmark
    public void bufferedSmall() throws IOException {
        buffered.write(dataSmall);
    }

    @Benchmark
    public void buffered() throws IOException {
        buffered.write(data);
    }

    @Benchmark
    public void bufferedHuge() throws IOException {
        buffered.write(dataHuge);
    }

    @Benchmark
    public void rawSmall() throws IOException {
        raw.write(dataSmall);
    }

    @Benchmark
    public void raw() throws IOException {
        raw.write(data);
    }

    @Benchmark
    public void rawHuge() throws IOException {
        raw.write(dataHuge);
    }
}

BufferedWriter + OutputStreamWriter is almost always faster than BoundedOutputStreamWriter.

@rmannibucau (Contributor) commented Jun 26, 2024

Interesting.

You are totally right on the use case (basically, whatever we do, the tests should still pass without modifying them).

At a high level, the use case is when you open a writer, typically from a server (http/ssh/...), and want to ensure flushing happens often enough to avoid hanging when messages are small.

So thanks to your bench, I think that mixing BufferedWriter + OutputStreamWriter with an autoflush (counting chars, I guess?) would be more than sufficient.
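The char-counting autoflush idea could be sketched like this (a hypothetical AutoFlushingWriter, not the merged implementation; the class name and which write overloads to cover are assumptions). It flushes before writing rather than after, matching the behavior discussed later in this thread:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch: a BufferedWriter that flushes once roughly maxSize chars
// have accumulated. A real implementation would cover the other write overloads too.
public class AutoFlushingWriter extends BufferedWriter {
    private final int maxSize;
    private int written;

    public AutoFlushingWriter(Writer delegate, int maxSize) {
        super(delegate, maxSize);
        this.maxSize = maxSize;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // flush before writing, so the last chunk written stays buffered
        autoFlush();
        super.write(cbuf, off, len);
        written += len;
    }

    private void autoFlush() throws IOException {
        if (written >= maxSize) {
            flush();
            written = 0;
        }
    }
}
```

Counting chars in the writer keeps the cheap BufferedWriter fast path while still bounding how long small messages can sit unflushed.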

@jungm (Member, Author) commented Jun 27, 2024

For good measure, another benchmark with the new auto flushing BufferedWriter:

Benchmark                  Mode  Cnt   Score   Error   Units
MyBenchmark.johnzon       thrpt   25  12.111 ± 0.052  ops/us
MyBenchmark.johnzonHuge   thrpt   25   0.275 ± 0.005  ops/us
MyBenchmark.johnzonSmall  thrpt   25  54.425 ± 0.066  ops/us

No performance lost, and it does what we want 😉

@jungm jungm requested a review from rmannibucau June 27, 2024 06:23
@rmannibucau (Contributor)

@jungm I'd keep the old name. If you want to promote the new one, keep the code as is, just keep the old class and make it inherit the new one 100% for backward compat (it's OK if the class is no longer used from the core code).

Just a small question, but overall it is a very satisfying fix from my window: why is autoFlush() called before writing and counting rather than after (write(); autoFlush();)?

@jungm (Member, Author) commented Jun 27, 2024

Really, the only reason I flush before writing is to not break/alter the behavior we had in BoundedOutputStreamWriter before: BoundedOutputStreamWriterTest expects the data that didn't fit in the buffer in the last write operation to stay in the buffer and not be flushed immediately.

@jungm jungm merged commit 89492c9 into apache:master Jun 27, 2024
4 checks passed
@jungm jungm deleted the johnzon-404 branch June 27, 2024 12:50
jungm added a commit that referenced this pull request Aug 4, 2024