
[JOHNZON-404] don't rely on jdk internals for BoundedOutputStreamWriter #128

Merged · 3 commits · Jun 27, 2024

Conversation

@jungm (Member) commented Jun 24, 2024

No description provided.

@rmannibucau (Contributor) left a comment

A few questions:

  1. Why don't you need to handle overflow? (There may be a reason, but it is surprising up front.)
  2. I think the while loop can be done in a single pass, saving the loop overhead, and (I'd need to check) possibly the encoding as well.
  3. Doesn't the encoder already have exception-throwing handling?

I guess we'll need a ton of tests, or otherwise fall back on OutputStreamWriter :(

@jungm (Member, Author) commented Jun 24, 2024

The idea is basically: while there are chars left, try to encode them into the buffer. If the encoding overflows (overflow is indeed handled), flush the buffer to the underlying OutputStream. I ignored underflow because, in my opinion, underflow (no input data left) is just the desired state we want to end up in.

To me the javadocs don't sound like CharsetEncoder throws while encoding, hence I added the throwException() call.

Agreed, we probably need more tests; this is really just a first naive approach to move away from sun's StreamEncoder.
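The encode-and-flush loop described above could look roughly like this (a standalone sketch using CharsetEncoder, not the actual PR code; the class name, buffer handling, and error handling are simplified, and encoder.flush() for stateful charsets is omitted):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncodeLoopSketch {
    // Encode all chars, flushing the byte buffer to `out` whenever the encoder overflows.
    // Assumes bufferSize is large enough to hold at least one encoded char.
    static void encode(char[] chars, OutputStream out, int bufferSize) throws IOException {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        CharBuffer in = CharBuffer.wrap(chars);
        ByteBuffer buffer = ByteBuffer.allocate(bufferSize);

        while (in.hasRemaining()) {
            CoderResult result = encoder.encode(in, buffer, true);
            if (result.isOverflow()) {
                // buffer full: flush to the underlying stream and keep encoding
                out.write(buffer.array(), 0, buffer.position());
                buffer.clear();
            } else if (result.isError()) {
                result.throwException();
            }
            // underflow means all input was consumed, so the loop exits
        }
        out.write(buffer.array(), 0, buffer.position());
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        encode("Hello world!".toCharArray(), out, 4); // tiny buffer forces overflows
        System.out.println(out.toString(StandardCharsets.UTF_8));
    }
}
```

With a 4-byte buffer the encoder overflows twice before the final flush, which is the flush-on-overflow behavior the comment describes.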

@rmannibucau (Contributor)

@jungm I understand that part, but we are far from the implementation on Java < 22, so I'm pretty sure we'll find backward-incompatible cases. What could be worth it is to bench OutputStreamWriter + BufferedWriter on Java 22 vs. our impl on Java 21; if there's no diff, we just extend it for backward compat?

@jungm (Member, Author) commented Jun 25, 2024

Is BufferedWriter + OutputStreamWriter even an option? Adding a buffer before delegating to OutputStreamWriter will not eliminate the internal buffer in OutputStreamWriter.

Anyway, some quick benchmarking shows that this change slows things down quite a bit :/ (ran on JDK 11)

Current BoundedOutputStreamWriter:

Benchmark    Mode  Cnt   Score   Error   Units
Main.bench  thrpt    3  35.772 ± 1.922  ops/us

New BoundedOutputStreamWriter:

Benchmark    Mode  Cnt   Score   Error   Units
Main.bench  thrpt    3  22.928 ± 2.732  ops/us

For reference, this is the jmh benchmark:

import java.io.IOException;
import java.io.OutputStream;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
// plus an import for Johnzon's BoundedOutputStreamWriter

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class Main {
    private Writer writer;
    private char[] data;

    @Setup
    public void setUp() {
        writer = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        data = "Hello world!".toCharArray();
    }

    @Benchmark
    public void bench() throws IOException {
        writer.write(data);
    }
}

@rmannibucau (Contributor)

What we don't want is an "exploding" buffer; the internal buffer of OutputStreamWriter is acceptable (even if I agree it's undesired), but the facading buffer kind of saves things.

Alternatively, we can do an impl close to the old one in the JDK to stay ~the same, wdyt?

@jungm (Member, Author) commented Jun 26, 2024

Now you got me confused 😄
Isn't the whole point of BoundedOutputStreamWriter to have an OutputStreamWriter where we can control the internal buffer size, to kind of force it to flush more often? I can't really see right now how the buffer, even in the JDK 22 StreamEncoder, could explode: it starts at 512 bytes and grows to 8 KB max (https://github.com/openjdk/jdk22u/blob/master/src/java.base/share/classes/sun/nio/cs/StreamEncoder.java#L44-L45). The only thing I could find troublesome is the buffer being reallocated every time the StreamEncoder encounters a chunk of data it can't handle yet.
Is that what you mean by exploding? Reallocating the buffer until it reaches 8 KB?

The OpenJDK 21 StreamEncoder is actually not that big, but I'm afraid we can't really use it in the ASF as it's GPLv2-licensed: https://www.apache.org/legal/resolved.html#category-x
So realistically we can't just take it; we'd need to rewrite it and test it well, which is basically the situation we're already in right now.

Maybe we can really just drop BoundedOutputStreamWriter altogether?

@jungm (Member, Author) commented Jun 26, 2024

Did some more benchmarking (all ran on JDK 21 this time):

Benchmark                   Mode  Cnt   Score    Error   Units
MyBenchmark.buffered       thrpt   25  11.728 ±  0.042  ops/us
MyBenchmark.bufferedHuge   thrpt   25   0.277 ±  0.004  ops/us
MyBenchmark.bufferedSmall  thrpt   25  54.672 ±  0.328  ops/us
MyBenchmark.johnzon        thrpt   25   2.669 ±  0.022  ops/us
MyBenchmark.johnzonHuge    thrpt   25   0.029 ±  0.001  ops/us
MyBenchmark.johnzonSmall   thrpt   25  55.897 ±  0.188  ops/us
MyBenchmark.raw            thrpt   25  20.570 ±  0.431  ops/us
MyBenchmark.rawHuge        thrpt   25   0.278 ±  0.005  ops/us
MyBenchmark.rawSmall       thrpt   25  73.361 ±  0.273  ops/us

import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.*;
// plus an import for Johnzon's BoundedOutputStreamWriter

@BenchmarkMode(Mode.Throughput)
@Timeout(time = 10)
@Warmup(iterations = 3)
@State(Scope.Benchmark)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
public class MyBenchmark {
    private Writer johnzon;
    private Writer buffered;
    private Writer raw;

    private char[] dataSmall;
    private char[] data;
    private char[] dataHuge;

    @Setup
    public void setUp() {
        johnzon = new BoundedOutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8, 100);
        buffered = new BufferedWriter(new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8), 100);
        raw = new OutputStreamWriter(OutputStream.nullOutputStream(), StandardCharsets.UTF_8);

        dataSmall = "Hello World".toCharArray();
        data = "Hello world".repeat(100).toCharArray();
        dataHuge = "Hello world".repeat(10000).toCharArray();
    }

    @Benchmark
    public void johnzonSmall() throws IOException {
        johnzon.write(dataSmall);
    }

    @Benchmark
    public void johnzon() throws IOException {
        johnzon.write(data);
    }

    @Benchmark
    public void johnzonHuge() throws IOException {
        johnzon.write(dataHuge);
    }

    @Benchmark
    public void bufferedSmall() throws IOException {
        buffered.write(dataSmall);
    }

    @Benchmark
    public void buffered() throws IOException {
        buffered.write(data);
    }

    @Benchmark
    public void bufferedHuge() throws IOException {
        buffered.write(dataHuge);
    }

    @Benchmark
    public void rawSmall() throws IOException {
        raw.write(dataSmall);
    }

    @Benchmark
    public void raw() throws IOException {
        raw.write(data);
    }

    @Benchmark
    public void rawHuge() throws IOException {
        raw.write(dataHuge);
    }
}

BufferedWriter + OutputStreamWriter is almost always faster than BoundedOutputStreamWriter.

@rmannibucau (Contributor) commented Jun 26, 2024

Interesting.

You are totally right on the use case (basically, whatever we do, the tests should still pass without modifying them).

At a high level, the use case is when you open a writer, typically from a server (http/ssh/...), and want to ensure flushing happens often enough to avoid hanging when messages are small.

So thanks to your bench, I think that mixing BufferedWriter + OutputStreamWriter with an autoflush (counting chars, I guess?) would be more than sufficient.
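The char-counting autoflush idea could be sketched like this (a hypothetical AutoFlushingWriter, not the merged implementation; the class name and which write overloads to cover are assumptions). It flushes before writing rather than after, matching the behavior discussed later in this thread:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Writer;

// Hypothetical sketch: a BufferedWriter that flushes once roughly maxSize chars
// have accumulated. A real implementation would cover the other write overloads too.
public class AutoFlushingWriter extends BufferedWriter {
    private final int maxSize;
    private int written;

    public AutoFlushingWriter(Writer delegate, int maxSize) {
        super(delegate, maxSize);
        this.maxSize = maxSize;
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        // flush before writing, so the last chunk written stays buffered
        autoFlush();
        super.write(cbuf, off, len);
        written += len;
    }

    private void autoFlush() throws IOException {
        if (written >= maxSize) {
            flush();
            written = 0;
        }
    }
}
```

Counting chars in the writer keeps the cheap BufferedWriter fast path while still bounding how long small messages can sit unflushed.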

@jungm (Member, Author) commented Jun 27, 2024

For good measure, another benchmark with the new auto flushing BufferedWriter:

Benchmark                  Mode  Cnt   Score   Error   Units
MyBenchmark.johnzon       thrpt   25  12.111 ± 0.052  ops/us
MyBenchmark.johnzonHuge   thrpt   25   0.275 ± 0.005  ops/us
MyBenchmark.johnzonSmall  thrpt   25  54.425 ± 0.066  ops/us

No performance lost, and it does what we want 😉

@jungm jungm requested a review from rmannibucau June 27, 2024 06:23
@rmannibucau (Contributor)

@jungm I'd keep the old name. If you want to promote the new one, keep the code as is, just keep the old class and make it inherit the new one 100% for backward compat (it's OK if the class is no longer used from the core code).

Just a small question, but overall it is a very satisfying fix from my window: why is autoFlush() called before writing and counting rather than after (write(); autoFlush();)?

@jungm (Member, Author) commented Jun 27, 2024

Really, the only reason I flush before writing is to not break/alter the behavior we had in BoundedOutputStreamWriter before: BoundedOutputStreamWriterTest expects the data that didn't fit in the buffer in the last write operation to stay in the buffer and not be flushed immediately.

@jungm jungm merged commit 89492c9 into apache:master Jun 27, 2024
4 checks passed
@jungm jungm deleted the johnzon-404 branch June 27, 2024 12:50
jungm added a commit that referenced this pull request Aug 4, 2024