[FLINK-36979][rpc] Reverting pekko version bump in Flink 1.20 #25866

XComp · 2024-12-29T13:51:08Z

This reverts commit 4776c96.

What is the purpose of the change

Reverts the pekko version bump that includes an upgrade to netty 4.x. Corresponding discussion happened in FLINK-36510.

Brief change log

Plain revert

Verifying this change

no additional verification done aside from CI

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): yes
The public API, i.e., is any changed class annotated with @Public(Evolving): no
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: yes
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? no
If yes, how is the feature documented? not applicable

flinkbot · 2024-12-29T13:55:38Z

CI report:

e7d2bb2 Azure: SUCCESS

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run azure re-run the last Azure build

He-Pin · 2024-12-31T03:09:27Z

@ferenc-csaky @XComp Hi, I think I can confirm that there's not a leak on the Pekko side.

He-Pin · 2024-12-31T08:03:03Z

I would like to suggest:

Rerun the tests with -Dio.netty.tryReflectionSetAccessible=true and -Dio.netty.leakDetection.level=PARANOID to see why and where it leaks, or get a heap dump.

We have some applications (not Flink applications) that still need -Dio.netty.tryReflectionSetAccessible=true to avoid OOM in production.

Or you could set Netty's ByteBuf allocator to unpooled by default with -Dio.netty.allocator.type=unpooled
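
For whoever picks this up, here is a minimal Java sketch for confirming that the three flags actually reached a forked test JVM. `NettyFlagReport` is a hypothetical helper, not part of Flink, Pekko, or Netty. It only reads the system properties named above and makes no claim about what Netty does when a property is unset.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: reports the Netty-related system properties of the running JVM. */
final class NettyFlagReport {

    // Property names taken from the -D flags suggested in this thread.
    static final String[] PROPERTIES = {
        "io.netty.tryReflectionSetAccessible",
        "io.netty.leakDetection.level",
        "io.netty.allocator.type"
    };

    /** Returns each property with its value, or "<not set>" when the flag was not passed. */
    static Map<String, String> effectiveSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        for (String name : PROPERTIES) {
            settings.put(name, System.getProperty(name, "<not set>"));
        }
        return settings;
    }

    public static void main(String[] args) {
        effectiveSettings().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Calling `effectiveSettings()` from inside the test JVM (or running the class with the same -D arguments) shows the values that JVM sees, which helps rule out the flags being dropped somewhere between the build and the forked process.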

XComp · 2025-01-03T11:30:28Z

@He-Pin are you sure? We see the following stacktrace in this e2e test failure:

Jan 02 06:20:02 org.apache.flink.runtime.io.network.netty.exception.LocalTransportException: Cannot reserve 4194304 bytes of direct buffer memory (allocated: 140396831, limit: 141557760) (connection to 'localhost/127.0.0.1:42031 [localhost:45071-b0167d]')
Jan 02 06:20:02 	at org.apache.flink.runtime.io.network.netty.CreditBasedPartitionRequestClientHandler.exceptionCaught(CreditBasedPartitionRequestClientHandler.java:175) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]
Jan 02 06:20:02 	at org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:346) ~[flink-dist-1.20-SNAPSHOT.jar:1.20-SNAPSHOT]

Anyway, I might have another look into it on the Flink side as well.
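
The numbers in the exception message can be checked by hand. Below is a minimal sketch of the arithmetic, assuming the reservation is a plain "the request must fit under the limit" check, which simplifies the JVM's real accounting. `DirectMemoryHeadroom` is an illustrative name, not an existing class.

```java
/** Illustrative arithmetic for the direct-memory numbers in the stack trace above. */
final class DirectMemoryHeadroom {

    /** Bytes still available before the limit is reached. */
    static long headroom(long allocated, long limit) {
        return limit - allocated;
    }

    /** Simplified check: the request only succeeds if it fits into the headroom. */
    static boolean canReserve(long requested, long allocated, long limit) {
        return requested <= headroom(allocated, limit);
    }

    public static void main(String[] args) {
        long requested = 4_194_304L;   // 4 MiB, from the exception message
        long allocated = 140_396_831L; // "allocated" in the exception message
        long limit = 141_557_760L;     // "limit" in the exception message, i.e. 135 MiB

        System.out.println("headroom = " + headroom(allocated, limit) + " bytes");
        System.out.println("fits     = " + canReserve(requested, allocated, limit));
    }
}
```

With the values from the trace the headroom is 1160929 bytes (about 1.1 MiB) against a 4 MiB request, so the reservation fails although roughly 134 MiB of direct memory is already in use at that point.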

He-Pin · 2025-01-03T11:58:48Z

@XComp So it is better to turn on -Dio.netty.leakDetection.level=PARANOID to see who holds the buffers.
and turn on -Dio.netty.allocator.type=unpooled too, then all ByteBuf lives in memory.

@normanmaurer is there any suggestion, thanks.

He-Pin · 2025-01-03T12:57:09Z

another optimization in apache/pekko#1667

He-Pin · 2025-01-06T06:06:58Z

@XComp Is there any update you can share? Thanks.

XComp · 2025-01-06T12:01:02Z

> @XComp Is there any update you can share? Thanks.

Currently, I have a lot of other stuff on my plate. I wouldn't mind if somebody else could help pick this up.

He-Pin · 2025-01-06T12:06:23Z

It would be nice if anyone could add :
-Dio.netty.tryReflectionSetAccessible=true
-Dio.netty.leakDetection.level=PARANOID
-Dio.netty.allocator.type=unpooled

to run tests.

ferenc-csaky · 2025-01-06T12:19:20Z

Hi! I'm back from my holiday, so I can take it and try some runs, will update the Jira ticket with any progress.

davidradl · 2025-01-09T15:25:05Z

@ferenc-csaky @He-Pin @XComp It looks like this PR is all about reverting the pekko version until we understand the cause of the OOM. All the comments seem to relate to resolving the OOM. Can I suggest we merge this revert and investigate the OOM separately, as suggested in the Jira?

He-Pin · 2025-01-09T16:06:39Z

This is not a leak but a user-side error. This is how Netty works. Will you set it to 7MB in production, @davidradl?
The best Pekko can do is :

add an option to support using UnpooledAllocator to make the 7MB memory tests pass, is that the right way to go?

If you think that's really needed, please send a PR to pekko, which can setup the allocator type of channels.

ferenc-csaky · 2025-01-09T17:33:35Z

@davidradl My understanding is that Netty4 does not leak memory; it simply does not work the same way as Netty3 by default and reserves a bit more memory.

But with -Dio.netty.tryReflectionSetAccessible=true the default memory footprint will be smaller, and -Dio.netty.allocator.type=unpooled can also help with that, but my understanding is that performance wise it is not really advised to use unpooled allocation.

@He-Pin 7MB is not realistic in any kind of production use-case, for the failing test it is only set that way, because that test validates how much memory is used by Netty, that's why it sets num-arenas to 1 and mem to 7MB.

My suggestion would be to fix this test instead of revert. Either by giving it more memory, or providing the necessary Netty configs to be able to function with that much memory.

For the sake of completeness, on master commit 338d024 already increased the memory to 90MB, so backporting that commit to 1.20 and 1.19 could fix that test case mentioned here earlier. Although I'm not sure with those changes that test will be meaningful, because num-arenas=1 is also removed in that commit.

My original idea was to set -Dio.netty.tryReflectionSetAccessible=true for this test case, because that way, it fits into 7MB and test execution succeeded multiple times on my local tests for both JDK11 and JDK17.
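
To illustrate why a 7MB limit leaves so little room for a pooled allocator, here is a rough sketch. It assumes that the 4194304-byte request in the earlier stack trace is one pooled chunk, that the chunk size is derived as `pageSize << maxOrder` (my understanding of how Netty's pooled allocator computes it), and that "7MB" means 7 MiB. The values 8192 and 9 are chosen only because they reproduce 4194304; they are not confirmed to be what Flink or Pekko configures. `PooledChunkBudget` is a made-up name.

```java
/** Illustrative arithmetic: how many pooled chunks fit into a small direct-memory budget. */
final class PooledChunkBudget {

    /** Chunk size as pageSize shifted left by maxOrder (assumed derivation). */
    static long chunkSize(int pageSize, int maxOrder) {
        return (long) pageSize << maxOrder;
    }

    /** Number of whole chunks that fit into the budget. */
    static long wholeChunks(long budgetBytes, long chunkBytes) {
        return budgetBytes / chunkBytes;
    }

    public static void main(String[] args) {
        long chunk = chunkSize(8192, 9);     // assumed values, giving 4 MiB
        long budget = 7L * 1024 * 1024;      // "7MB" read as 7 MiB (assumption)

        System.out.println("chunk size   = " + chunk);
        System.out.println("whole chunks = " + wholeChunks(budget, chunk));
        System.out.println("left over    = " + (budget - wholeChunks(budget, chunk) * chunk));
    }
}
```

Under these assumptions exactly one chunk fits and 3145728 bytes remain, which is less than a second chunk. Anything that triggers a second chunk reservation would then hit the limit, regardless of how little of the first chunk is actually in use.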

He-Pin · 2025-01-09T17:41:19Z

@ferenc-csaky Our high-throughput Java applications running on Java 11/21 have used -Dio.netty.tryReflectionSetAccessible=true since we upgraded from Java 8. It is really needed to avoid OOM and reduce GC pressure. The old setup did not need it because it was unpooled, which causes many more GCs and hurts performance.

I vote for adding this by default or adding the -Dio.netty.tryReflectionSetAccessible=true to the release notes, people who want the old behavior can enable it with -Dio.netty.allocator.type=unpooled if they like more gc.

He-Pin · 2025-01-09T17:47:03Z

I implemented the Netty 4-based remoting transport once when I was working for a game company, but the Akka team did not accept that PR for some reason, so we could only use it internally. Years later, the Pekko fork happened and we now have control of the code, so we can do the right thing now. I'm using Netty at $Work too.

We should not simply blindly revert, let's do it in the right way, the CVEs are really annoying.

davidradl

It sounds like the title of the PR is not in line with what we want to do according to the PR comments. It sounds like there is appetite to fix this properly at the higher pekko version. So I am removing my approval, as it refers to the reversion code which is currently in the PR.

He-Pin · 2025-01-10T11:02:19Z

@davidradl Is there any investigation result update from your side, thanks.

ferenc-csaky · 2025-01-10T15:01:37Z

Opened #25955 which I believe should supersede this current PR.

davidradl · 2025-01-10T16:07:37Z

@ferenc-csaky sounds good - can we close this PR?

ferenc-csaky · 2025-01-10T16:11:45Z

IMO yes, I will close both this one and the 1.19 equivalent on Monday if no objections until then.

XComp · 2025-01-13T08:29:50Z

Sorry for not getting back earlier. Thanks for looking into the issue, @ferenc-csaky . But on a more general note and with the concerns @zentol shared in FLINK-36510:

Shouldn't we merge the revert at least for 1.19 as the stable release (i.e. sticking to the older netty version)?
I see your point with upgrading to netty 4.x for 1.20 as this seems to be the LTS version for Flink 2.x. One other option we have here is doing the upgrade for 1.20.2 and not including it in 1.20.1 (which @afedulov is preparing right now). That would give us a few more CI release cycles. WDYT?

ferenc-csaky · 2025-01-13T09:54:54Z

@XComp I do not see any problem with your suggested approach for 1.19, it makes sense.

I can also accept to release it with 1.20.2 to the 1.20 line, if everybody else agrees, but personally am a bit more reluctant about that. I guess that release will happen in a couple months after 1.20.1 best case scenario, and surely that will give us more time to see if the CI runs are more stable or not regarding this aspect. But even if not, based on my current investigation probably it will be some other necessary configuration to make, not some serious memory bug. And on the other hand, the pesky Netty3 CVEs won't go anywhere. @afedulov WDYT, any requirements from your side?

afedulov · 2025-01-13T17:45:28Z

I believe that since we are not dealing with a memory leak but rather with different memory allocation (thanks @ferenc-csaky for confirming this!), we should aim to include the upgrade in both the 1.19 and 1.20 releases. While there is some risk of exposing users to OOM kills by pushing some workloads over hard memory limits, we have to weigh them against the risks posed by leaving numerous critical CVEs exposed on the network stack.

Netty 3.10.6 is the last 3.x release and therefore officially reached EOL more than 8 years ago. I also read reports of it suffering from multiple GC and memory management issues, so it is not like we are transitioning away from something that is rock solid and works perfectly to an experimental release.

The approach proposed by @He-Pin in the above comment sounds reasonable to me. If we adopt this approach, I think we should actually add -Dio.netty.tryReflectionSetAccessible=true parameter as default to the startup scripts, not merely to the documentation. An open question remains: should we retain the 3.x-style memory management by using unpooled allocations? My current understanding is that using both parameters should approximate the existing behavior, but it does not seem like unpooled allocation is strictly required for our case. @He-Pin, what’s your perspective on this?

As for fixing it in 1.20.2 or 1.20.1 - I am not convinced that having CI running more times will provide us required confidence. The primary concern lies in breaching memory limits in existing deployments rather than stability. If we agree this change is necessary for a patch release eventually, it would be logical to apply it now for both 1.19.2 and 1.20.1. I will make sure this potential concern is explicitly mentioned in the release notes.

@ferenc-csaky Do we have a rough understanding of how much more memory consumption does this new version induce? Does it look like some fixed amount or something that scales with the number of connections?

He-Pin · 2025-01-13T18:19:34Z

@afedulov Thanks for the ping.

For the Netty4 migration: we can't ship a library with many CVEs, so it took me nearly 5 weekends to migrate it from Netty 3 to Netty 4, and I had Flink in mind too. The multi-jvm-plugin was migrated from Netty 3 to Netty 4 as well, and we upstreamed that to Akka, where it was still Netty 3, and it got merged: "!test Migrate akka-multi-node-testkit to Netty4" akka/akka#32005
We did encounter this in production (a Spring Boot application with Netty-based RPC): after we migrated from Java 8 to Java 11 we encountered OOMs, all of which were solved simply with -Dio.netty.tryReflectionSetAccessible=true.
Netty 4 works differently from Netty 3, mainly for performance and less GC, and that's how it works. Yes, it may be my fault for using the PooledByteBufAllocator in the first place, and I may need to keep the old behavior with an UnpooledHeapAllocator. But I think that's not the best practice, so I did not put that behind a config option either.
You can take a look at the current implementation; I think I'm following best practice. It did take me a day to figure out whether it really leaks on the Pekko side, and after that I can confirm it does NOT.
Since Pekko 1.1.0 shipped, we have received no leak reports.

So I think keeping the Netty 3 version seems a little smoother brain, especially with @ferenc-csaky done detailed investigation, Keep it in Netty 3 will expose all downstream the supply chain with CVES, that's not actually right.

But I do suggest we do some long-running stress testing of this (e.g. 1 or 2 days). I looked at some issues inside Flink, e.g. parallelism serialization, which I think can be done in the current classical transport too.

In short: Stress testing to make sure it works smoothly and ships the Netty 4 version is my +1.

He-Pin · 2025-01-13T18:32:17Z

BTW, if Flink is supporting JDK 17+ too, please add below when running on JDK 17 or higher

--add-opens java.base/java.nio=ALL-UNNAMED
--add-opens java.base/jdk.internal.misc=ALL-UNNAMED

too.
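
A quick way to see whether those `--add-opens` options took effect in a given JVM is to ask the module system directly. The sketch below uses only the standard `java.lang.Module` API (Java 9+). `AddOpensCheck` is a hypothetical name, and the two package names are the ones from the flags above. It reports whether a package is open to the module the helper itself lives in, which is the unnamed module when run from the classpath.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: checks whether a JDK package is open for deep reflection to this code. */
final class AddOpensCheck {

    /**
     * Returns true if the module containing {@code anchor} has opened {@code packageName}
     * to the module this helper belongs to (the unnamed module when run from the classpath).
     */
    static boolean isOpenToUnnamed(Class<?> anchor, String packageName) {
        Module target = anchor.getModule();
        Module caller = AddOpensCheck.class.getModule();
        return target.isOpen(packageName, caller);
    }

    public static void main(String[] args) {
        // ByteBuffer and Object both live in java.base, the module named in the flags.
        System.out.println("java.nio open:          "
                + isOpenToUnnamed(ByteBuffer.class, "java.nio"));
        System.out.println("jdk.internal.misc open: "
                + isOpenToUnnamed(Object.class, "jdk.internal.misc"));
    }
}
```

The result depends on the JDK version and on the options the JVM was started with, so no expected output is given here. The check is meant to be run once with and once without the two `--add-opens` options on the JDK in question.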

He-Pin · 2025-01-13T18:39:39Z

@davidradl I think making the old behavior(unpooled) configurable can be done, but that will need @pjfanning to confirm backporting.

And the change is how the PooledByteBufAllocator works, which will cache some arenas, but TBH, 7MB is very small for any workload.

He-Pin · 2025-01-13T20:35:12Z

@XComp @ferenc-csaky @davidradl @afedulov I just prepared a PR apache/pekko#1707 for this, not sure if @pjfanning agrees with backporting this to the 1.1.4 release.

tomncooper · 2025-01-14T09:32:29Z

+1 for keeping the newer Pekko and Netty 4 in 1.20.1 (and not merging this PR). This is the LTS and my 2c is that having a more secure base and fixing any issues that arise is the better path.

He-Pin · 2025-01-14T10:47:37Z

A backport is pending apache/pekko#1709 as @afedulov requested.

pjfanning · 2025-01-14T12:21:25Z

apache/pekko#1707 is merged and tonight's snapshot release should include it. I would prefer if the Flink team test with the snapshot before we get involved in backporting changes and doing Pekko releases. There is no release planned so it is inconvenient for us to be pushing through speculative changes and doing releases with them.

ferenc-csaky · 2025-01-14T15:52:09Z

I am not sure I understand what apache/pekko#1709 adds. I mean, I see that it adds control over the ByteBufAllocator, but if we specify it directly via io.netty.allocator.type, doesn't that do the same? I do not think we should wait for another Pekko release for these patch releases, since the Netty version is carved in stone anyway, as it's coming through flink-shaded, and bumping that requires a whole different discussion AFAIK.

Personally, I'd rather not complicate the defaults on the Flink side too much, so I do not think we should override io.netty.allocator.type by default. Maybe unpooled can be better in some cases, but I believe the default is picked on the Netty side to cover most use-cases the best, so sticking with that sounds reasonable to me. I did not spend time learning and analyzing how much memory these reserve and how they work exactly, because it seemed unnecessary.

On the other hand, enabling reflection via io.netty.tryReflectionSetAccessible=true can be reasonable and I do not see risk in that, so setting that for both JM and TM can be an option, but I am not sure how many workloads this can actually affect, so that's why I had the idea to document it instead.

He-Pin · 2025-01-14T18:33:29Z

@ferenc-csaky Yes and No, changing the io.netty.allocator.type will affect Flink, too. But the current one I added to Pekko will only affect Pekko. That way, you can keep using the unpooled heap buffer behavior, just like the old Netty 3-based remoting.

He-Pin · 2025-01-14T18:37:07Z

More memory usage is based on the threads, which means how many of Netty's FastThreadLocal threads are out there requesting bytebufs @afedulov . TBH, flink should just add io.netty.tryReflectionSetAccessible=true and go.

XComp · 2025-01-16T08:03:30Z

Thanks @afedulov @He-Pin and @ferenc-csaky for your valuable input.

My concern with including the pekko bump in 1.19.2 is that there is a proven change in functionality (users would have to add a flag to deal with more extensive memory usage as far as I understand). This goes against the contract for patch releases.

For 1.20.1 it can be reasoned that we deal with it differently because of 1.20 being a LTS version.

But for both, we might want to move the discussion into the ML rather than in a PR. It might be better to get a broader consensus from the community if we really want to put this into 1.19.2. Probably that's also a chance to discuss how the community should deal with 1.20 as the LTS version.

normanmaurer · 2025-01-16T10:39:21Z

I just read up on this thread and have a few suggestions. Just to give some background, I am the Netty Project Lead (if you don't know this already).

You definitely should get rid of Netty 3.x. This has been EOL for years which means there are for sure many security related issues and there is also no guarantee that it will work correctly on new Java versions. I know that it might be tempting to just keep using it if things seem to work for now but you will be in big problems if things break for whatever reason or if a critical security bug is found that affects you. So please upgrade asap...

Now after all of this is said let's focus on the things you see here and what you can do about it.

I really don't understand why -Dio.netty.tryReflectionSetAccessible=true should make a huge difference in terms of memory usage. It is mostly just enabling some optimizations that would reduce GC pressure. I mean it does not hurt to enable it but it would be more for performance than for memory usage.
For best performance you should use the PooledByteBufAllocator... That said it is definitely not the right thing to do if you are memory constrained. Like it will never work with the default config if you really only have 7MB. In this case you can switch back to the `UnpooledByteBufAllocator`, but will need to pay the extra overhead of frequently allocating / deallocating direct memory. This is expensive compared to heap allocations, so be aware of this.

If you want to try to match the memory usage of Netty 3.x you want to use the unpooled allocator.

ferenc-csaky · 2025-01-16T11:02:33Z

> @ferenc-csaky Yes and No, changing the io.netty.allocator.type will affect Flink, too. But the current one I added to Pekko will only affect Pekko. That way, you can keep using the unpooled heap buffer behavior, just like the old Netty 3-based remoting.

@He-Pin This is a valid point, although other than Pekko itself Netty is only used in Flink by its shuffle service, which makes it possible to redistribute data between operators in the job pipeline. Having a Pekko-specific option would indeed be useful, but IMO it would not be worth the trouble for you guys to backport and release it only because of this specific situation, because according to my understanding it should not cause any problems in real-life scenarios, where memory is not limited to the bare minimum.

@normanmaurer Thank you for the Netty side technical details! Enabling reflection definitely does not make any huge difference, but the CI fails sometimes because of a test case where the memory usable by Netty is limited to the bare minimum (7MB). That causes flakiness, because sometimes that's not even enough to spin up the necessary JVM processes, which fails the test and then the CI. Based on my local test runs, setting -Dio.netty.tryReflectionSetAccessible=true seemed to stabilize that specific test. Now I am thinking about adding -Dio.netty.allocator.type=unpooled to that specific test as well to make sure, as the CI is running in a different environment.

Personally I believe leaving the Netty4 default settings as is should not cause too much trouble, except maybe in some cases where the memory that Netty can use was kept to the bare minimum for the Netty3 default settings, which may not be enough for the Netty4 defaults.

Anyways, I am okay with moving this discussion to the ML to make sure it reaches a wider audience.

normanmaurer · 2025-01-16T11:12:47Z

Feel free to add me to the cc there as well: norman at apache dot org

XComp · 2025-01-23T17:48:26Z

Closing this PR as it is superseded by FLINK-37100

Revert "[FLINK-36510][rpc] Bump Pekko to 1.1.2, remove Netty 3"

e7d2bb2

This reverts commit 4776c96.

XComp changed the title ~~[FLINK-36979][rpc] Reverting pekko version bump~~ [FLINK-36979][rpc] Reverting pekko version bump in Flink 1.20 Dec 29, 2024

ferenc-csaky approved these changes Dec 29, 2024

afedulov mentioned this pull request Jan 6, 2025

[FLINK-36979][rpc] Reverting pekko version bump in Flink 1.19 #25867

Closed

davidradl approved these changes Jan 9, 2025

He-Pin mentioned this pull request Jan 10, 2025

perf: optmize NettyChannelHandlerAdapter with explict extends. (#1667) apache/pekko#1698

Closed

davidradl approved these changes Jan 10, 2025

davidradl suggested changes Jan 10, 2025

He-Pin mentioned this pull request Jan 13, 2025

chore: Add support for controlling the NettyTransport's byteBuf allocator type. apache/pekko#1707

Merged

XComp closed this Jan 23, 2025
