KAFKA-18569: New consumer close may wait on unneeded FindCoordinator #18590
Thanks for the PR @frankvicky! I was curious if you had a chance to look into using the Thanks! |
Hi @kirktrue |
Thanks for the refresh on the PR @frankvicky! This looks much more succinct.
I'm still unsure what the behavior is for this sequence of events:
1. The coordinator is marked as unknown
2. CoordinatorRequestManager.poll() is called and creates a new FindCoordinatorRequest
3. The NetworkClientDelegate sends the request to the broker
4. Consumer.close() is called with a timeout of 30 seconds
5. ConsumerNetworkThread.sendUnsentRequests() is called
In step 5, won't it continue to loop for ~30 seconds because the find request created in step 2 (and sent in step 3) is still inflight when ConsumerNetworkThread.sendUnsentRequests() is called?
    do {
        networkClientDelegate.poll(timer.remainingMs(), timer.currentTimeMs());
        timer.update();
    } while (timer.notExpired() && networkClientDelegate.hasAnyPendingRequests());
NetworkClientDelegate.hasAnyPendingRequests() will return true while there are any in-flight requests.
Any thoughts?
Thanks!
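To make the concern about step 5 concrete, here is a minimal simulation of that drain loop. It is only a sketch: `SimTimer`, `drain`, and the poll step are hypothetical stand-ins, not Kafka classes, and the clock is simulated so the result is deterministic. It assumes a request counts as pending until its response arrives.

```java
// Hypothetical simulation of the close-time drain loop; not Kafka code.
final class CloseLoopSketch {

    /** Simulated timer, standing in for the timer used by the real loop. */
    static final class SimTimer {
        private final long deadlineMs;
        private long nowMs;

        SimTimer(long timeoutMs) { this.deadlineMs = timeoutMs; }

        void advance(long ms) { nowMs += ms; }
        long remainingMs() { return Math.max(0, deadlineMs - nowMs); }
        boolean notExpired() { return nowMs < deadlineMs; }
        long elapsedMs() { return nowMs; }
    }

    /**
     * Runs the do/while loop and returns the simulated time spent in it.
     * responseArrivesAtMs < 0 means the in-flight request never completes.
     */
    static long drain(long timeoutMs, long pollStepMs, long responseArrivesAtMs) {
        SimTimer timer = new SimTimer(timeoutMs);
        boolean pending = true; // one request is already in flight
        do {
            // Each poll blocks for at most the remaining time.
            timer.advance(Math.min(timer.remainingMs(), pollStepMs));
            if (responseArrivesAtMs >= 0 && timer.elapsedMs() >= responseArrivesAtMs) {
                pending = false;
            }
        } while (timer.notExpired() && pending);
        return timer.elapsedMs();
    }

    public static void main(String[] args) {
        System.out.println(drain(30_000, 100, 300)); // response arrives: 300
        System.out.println(drain(30_000, 100, -1));  // no response: 30000
    }
}
```

Under these assumptions, an in-flight request that never gets a response keeps the loop running for the full close timeout, which is the ~30 seconds described above.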
Hi @kirktrue, Thanks for the review. |
Currently,
It seems that the behavior described in the comment is not followed: kafka/core/src/test/scala/integration/kafka/api/ConsumerBounceTest.scala Lines 300 to 304 in 3276759
Updated: kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/AsyncKafkaConsumer.java Line 1496 in bdc92fd
Now the |
Hey here, I don't quite get how the The To find a solution, let's look at the classic consumer first, this is my understanding:
Correct me there, but if that's the behaviour, could it be achieved in the new consumer by allowing the |
Hi @lianetm @kirktrue, In the classic consumer, the timeout respects kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ClassicKafkaConsumer.java Line 1140 in 8c0a0e0
kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ClassicKafkaConsumer.java Lines 1130 to 1134 in 8c0a0e0
However, in the async consumer, this logic is either missing or only applies to individual requests. kafka/clients/src/main/java/org/apache/kafka/clients/consumer/internals/ConsumerCoordinator.java Lines 976 to 989 in 8c0a0e0
Should we align the behavior between async and classic consumers? |
Hey @frankvicky, good finding. Agree that the behaviour is not aligned in the close timeout handling, so in practice the classic consumer.close will never wait for more than the request timeout if there is a call to close with a larger timeout (and that's indeed missing on the async close timeout).
Actually, the behaviour is explicitly called out in one of the tests:
So I do agree that we need to align this. But just for my understanding, this is something else we need here to unblock these tests (the If my understanding is right then I think we should file a separate jira for the close timeout considering the request timeout, and if you can validate locally that it's the only fix required to enable the |
Hi @lianetm |
I agree that we should align the behavior with how it has functioned for a long time (f72203e). Additionally, we should document this behavior for both |
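The classic-consumer behaviour described in this thread (close never waits longer than the request timeout, even when called with a larger timeout) can be sketched as a simple cap. This is an illustration only: `effectiveCloseWait` is a hypothetical helper, not a Kafka method, and the 30-second request timeout is an assumed value.

```java
import java.time.Duration;

// Hypothetical illustration of capping the close wait by the request timeout.
final class CloseTimeoutSketch {

    /** Returns the smaller of the caller's close timeout and the request timeout. */
    static Duration effectiveCloseWait(Duration closeTimeout, Duration requestTimeout) {
        return closeTimeout.compareTo(requestTimeout) <= 0 ? closeTimeout : requestTimeout;
    }

    public static void main(String[] args) {
        Duration requestTimeout = Duration.ofSeconds(30); // assumed request.timeout.ms value

        // close() called with a larger timeout: the wait is capped.
        System.out.println(effectiveCloseWait(Duration.ofMinutes(5), requestTimeout)); // PT30S

        // close() called with a smaller timeout: the caller's value wins.
        System.out.println(effectiveCloseWait(Duration.ofSeconds(5), requestTimeout)); // PT5S
    }
}
```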
…assic consumer
JIRA: KAFKA-18645
See discussion: apache#18590 (comment)
In the classic consumer, the timeout respects request.timeout.ms. However, in the async consumer, this logic is either missing or only applies to individual requests. Unlike the classic consumer, where request.timeout.ms works for the entire coordinator closing behavior, the async implementation handles timeouts differently. We should align the close timeout handling to enable ConsumerBounceTest#testClose.
The old/new approach to include a specialized event makes sense. Thanks for the suggestion @lianetm! |
Thanks @frankvicky! Just one nit left. Also, please merge the latest trunk changes to pick up the latest test fixes, and I will check the build again. Thanks!
 * limitations under the License.
 */
package org.apache.kafka.clients.consumer.internals.events;
public class StopFindCoordinatorOnCloseEvent extends ApplicationEvent {
Should we add a java doc here? Mainly to describe that the purpose of this event is to ensure that the CoordinatorRequestManager does not generate FindCoordinator requests when the consumer is closing and has already completed the operations that require a coordinator.
Sure, I have just written some description for it. PTAL 😺
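For readers following along, the purpose described in the review comment can be sketched as a flag on the request manager that suppresses new FindCoordinator requests once the close-time event has been handled. Everything here is a hypothetical stand-in (`RequestManager`, `onStopFindCoordinatorOnClose`, the string request), not the actual implementation in the PR.

```java
import java.util.Optional;

// Hypothetical sketch of a request manager that stops generating
// FindCoordinator requests once the consumer signals it is closing.
final class StopFindCoordinatorSketch {

    static final class RequestManager {
        private boolean coordinatorKnown;
        private boolean closing;

        /** Called when the (hypothetical) stop-on-close event is processed. */
        void onStopFindCoordinatorOnClose() { closing = true; }

        /** Called when a coordinator has been discovered. */
        void onCoordinatorFound() { coordinatorKnown = true; }

        /** Returns a request to send, or empty if none is needed. */
        Optional<String> poll() {
            if (coordinatorKnown || closing) {
                return Optional.empty(); // nothing to look up, or closing
            }
            return Optional.of("FindCoordinator");
        }
    }

    public static void main(String[] args) {
        RequestManager manager = new RequestManager();
        System.out.println(manager.poll().isPresent()); // true: coordinator unknown

        manager.onStopFindCoordinatorOnClose();
        System.out.println(manager.poll().isPresent()); // false: closing
    }
}
```

With no new request generated after the event, a drain loop that waits on pending requests has nothing extra to wait for during the final stage of close.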
JIRA: KAFKA-18569 Please refer to ticket for further details
Co-authored-by: Lianet Magrans <[email protected]>
Failed test is handled by #18735 |
…18590) Reviewers: Lianet Magrans <[email protected]>, Kirk True <[email protected]>, Chia-Ping Tsai <[email protected]>
Merged to trunk and cherry-picked to 4.0 |
JIRA: KAFKA-18569
Please refer to ticket for further details.
In short, the new consumer's close may currently wait for an unsent FindCoordinator request to go out, even after the commit/leaveGroup stages of close are done.