Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pull] trunk from apache:trunk #1292

Open
wants to merge 3,959 commits into
base: trunk
Choose a base branch
from
Open

[pull] trunk from apache:trunk #1292

wants to merge 3,959 commits into from

Conversation

pull[bot]
Copy link

@pull pull bot commented Jun 11, 2023

See Commits and Changes for more details.


Created by pull[bot]

Can you help keep this open source service alive? 💖 Please sponsor : )

@pull pull bot added the ⤵️ pull label Jun 11, 2023
aliehsaeedii and others added 29 commits February 3, 2025 10:43
Implements streams sticky assignor on the broker-side.

Reviewers: Bill Bejeck <[email protected]>, Lucas Brutschy <[email protected]>
CoordinatorRecordSerde does not validate the version of the value to check whether the version is supported by the current version of the software. This is problematic if a future and unsupported version of the record is read by an older version of the software because it would misinterpret the bytes. Hence CoordinatorRecordSerde must throw an error if the version is unknown. This is also consistent with the handling in the old coordinator.

Reviewers: Jeff Kim <[email protected]>
Update `benchmark_test.py` to use KRaft.

```
> TC_PATHS="tests/kafkatest/benchmarks/core/benchmark_test.py" /bin/bash tests/docker/run_tests.sh

================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-02-03--001
run time:         96 minutes 48.900 seconds
tests run:        120
passed:           120
flaky:            0
failed:           0
ignored:          0
================================================================================
```

Reviewers: David Jacot <[email protected]>
… task (#18717)

During testing we discovered that the empty group count is not updated in group conversion, but when the new group is transition to other state, the empty group count is decremented. This could result in negative empty group count.

We can have a new consumer group count implementation that follows the pattern we did for the classic group count. The timeout task periodically refreshes the metrics based on the current groups soft state.

Reviewers: Jeff Kim <[email protected]>
A class to build a new target assignment based on the provided parameters. As a result, it yields the records that must be persisted to the log and the new member assignments as a map.

Compared to the feature branch, I extended the unit tests (testing also standby and warm-up task logic) and adopted simplifications due to the TasksTuple class.

Reviewers: Bruno Cadonna <[email protected]>, Bill Bejeck <[email protected]>
The statement gets logged in the INFO level and gets printed for every message produced to the __remote_log_metadata topic. Removed the log statement as it is needed only during debug session. And, we have another log at DEBUG level to capture this information.

Reviewers: Luke Chen <[email protected]>, Christo Lolov <[email protected]>
…o early return (#18720)

https://issues.apache.org/jira/browse/KAFKA-18575 solved a critical race condition by returning with CONCURRENT_TRANSACTIONS early when the transaction was still completing.
In testing, it was discovered that this early return could cause performance regressions.

Prior to KIP-890 the addpartitions call was a separate call from the producer. There was a previous change https://issues.apache.org/jira/browse/KAFKA-5477 that decreased the retry backoff to 20ms. With KIP-890 and making the call through the produce path, we go back to the default retry backoff which takes longer. Prior to 18575 we introduce a slight delay when sending to the coordinator, so prior to 18575, we are less likely to return quickly and get stuck in this backoff. However, based on results from produce benchmarks, we can still run into the default backoff in some scenarios.

This PR reverts KAFKA-18575, and doesn't return early and wait until the coordinator for checking if a transaction is ongoing. Instead, it will fix the handling with the verification guard so we don't hit the edge condition.

Also cleans up some of the verification text that was unclear.

Reviewers: Jeff Kim <[email protected]>, Artem Livshits <[email protected]>
Introduced via KIP-1106.

Reviewers: Lucas Brutschy <[email protected]>
We need to re-enable the unclean shutdown detection when in ELR mode, which was inadvertently removed during the development process.

Reviewers: David Mao <[email protected]>,  Jun Rao <[email protected]>
This patch fixes the Benchmark system tests. We misconfigured the quorum in bc7b870.

```
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-02-04--001
run time:         57 minutes 27.169 seconds
tests run:        62
passed:           62
flaky:            0
failed:           0
ignored:          0
================================================================================
```

Reviewers: PoAn Yang <[email protected]>, Christo Lolov <[email protected]>
This patch fixes the PerformanceService system test which was still using ZK.

```
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-02-04--003
run time:         1 minute 42.629 seconds
tests run:        4
passed:           4                                                                                                                                                                         flaky:            0
failed:           0                                                                                                                                                                         ignored:          0
================================================================================
```

Reviewers: Chia-Ping Tsai <[email protected]>
```
================================================================================
SESSION REPORT (ALL TESTS)
ducktape version: 0.12.0
session_id:       2025-02-04--005
run time:         4 minutes 0.023 seconds
tests run:        4
passed:           4
flaky:            0
failed:           0
ignored:          0
================================================================================
```

Reviewers: Lianet Magrans <[email protected]>
Increase the maximum number of flaky tests we tolerate for the main test suite from 3 to 10. This will result in fewer failed builds.

Reviewers: Chia-Ping Tsai <[email protected]>
The Streams membership manager is used client-side in the
background thread of the async consumer. For each member
/consumer, it is responsible for:
* keeping the member state,
* keeping assignments for the member,
* reconciling the assignments of the member -- for example
when tasks need to be revoked before other tasks are assigned
* requesting invocations of assignment and revocation callbacks
by the stream thread.

The Streams membership manager is called by the background thread of
the async consumer, directly in its event loop and from the
 Streams group heartbeat request manager. The Streams membership
manager uses the Streams rebalance events processor to request
assignment/revocation callback in the stream thread.

Reviewers: Lucas Brutschy <[email protected]>, Bill Bejeck <[email protected]>
It appears this test was failing because the transaction was never aborting and the concurrent transactions errors would not go away.

ccab9eb introduced the test failure because it requires the transaction to complete, but I suspect the lack of completion was happening before the change.

The timeout for the write is based on the transactional timeout, and 100ms seemed too small -- thus the requests to update the state would often repeatedly time out.

Also removed the loop since it was not necessary.

Reviewers: Jeff Kim <[email protected]>, Calvin Liu <[email protected]>
gongxuanzhang and others added 30 commits February 27, 2025 10:10
…#19016)

KIP-966 adds strict min ISR rule, so this PR improves the docs of min.insync.replicas to include that change.

Reviewers: Ismael Juma <[email protected]>, Chia-Ping Tsai <[email protected]>
Given that now we support Java 17 on our brokers, this PR replace the
use of :

Collections.singletonList() and Collections.emptyList() with List.of()
Collections.singletonMap() and Collections.emptyMap() with Map.of()
Collections.singleton() and Collections.emptySet() with Set.of()

Affected modules: server and coordinator-common

Reviewers: Chia-Ping Tsai <[email protected]>
- Added a unit test to validate the exception hierarchy for all KIP-1050
transaction related exceptions.
- RetriableException is correctly extended by all child classes
- Included test for RetriableException exception with verification that
all exceptions extending `RetriableException` do not inadvertently
extend `RefreshRetriableException, preserving the intended behavior.

Reviewers: Kirk True <[email protected]>, TaiJuWu <[email protected]>, TengYao Chi <[email protected]>,  Ken Huang <[email protected]>, Justine Olshan <[email protected]>
…roker (#18997)

In #16848, we added `kraft.version`
to finalized features and got finalized features outside controller
event handling thread. This may make finalized features stale when
processing `registerBroker` event. Also, some cases like
`QuorumControllerTest.testBalancePartitionLeaders` become flaky cause of
outdated MV. This PR moves finalized features back to controller event
handling thread to avoid the error.

Reviewers: Ismael Juma <[email protected]>, Jun Rao <[email protected]>,
Colin P. McCabe <[email protected]>, Chia-Ping Tsai
<[email protected]>
…e configuration data in logs (#18349)

Reviewers: Chia-Ping Tsai <[email protected]>
…9044)

`testActivationMessageForEmptyLogAtMv3_6WithTransactionAndElr` is tested
with 4.0IV1, so the name should be corrected.

Reviewers: Jun Rao <[email protected]>
There isn't any flaky test for SharePartitionManager in last 7 days, removing flaky annotation.

Reviewers: Andrew Schofield <[email protected]>
The PR implements the SharePartitionMetrics as defined in KIP-1103, with
one change. The metric `FetchLockRatio` is defined as `Meter` in KIP but
is implemented as `HIstogram`. There was a discussion about same on
KIP-1103 discussion where we thought that `FetchLockRatio` is
pre-aggregated but while implemeting the rate from `Meter` can go above
100 as `Meter` defines rate per time period. Hence it makes more sense
to implement metric `FetchLockRatio` as `Histogram`.

Reviewers: Andrew Schofield <[email protected]>
…9036)

The purpose of this PR is to remove the `@InterfaceStability.Evolving` from classes that were created over a year ago.

Reviewers: Jun Rao <[email protected]>
…plica.alter.log.dirs.threads (#19038)

Update the documentation for the `num.replica.alter.log.dirs.threads` configuration to clarify that its default value is determined by the number of log directories specified in the `log.dirs` configuration.

Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
Given that now we support Java 17 on our brokers, this PR replace the
use of :
- `Collections.singletonList()` and `Collections.emptyList()` with
`List.of()`
- `Collections.singletonMap()` and `Collections.emptyMap()` with
`Map.of()`
- `Collections.singleton()` and `Collections.emptySet()` with `Set.of()`

Reviewers: Chia-Ping Tsai <[email protected]>
…eException (#19047)

Remove kafka.cluster.Broker and BrokerEndPointNotAvailableException as they were used by zk path.

Reviewers: TengYao Chi <[email protected]>, Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
…than Short.MAX_VALUE (#19067)

JIRA: KAFKA-18908
`ZooKeeper` mode allows large configuration values to be created through
an append operation. `KRaft` mode restricts configuration values to a
maximum size of `Short.MAX_VALUE`.
We should document this behavioral change.

Reviewers: Ken Huang <[email protected]>, Chia-Ping Tsai <[email protected]>
… larger (#19070)

In ZooKeeper mode, users can append configurations to create values larger than Short.MAX_VALUE. However, this behavior is disallowed in KRaft mode. Additionally, a server error is returned to users. Creating a value this large is rare, so we don't plan to fix it for KRaft. This PR aims to tweak the error message.

Reviewers: Ken Huang <[email protected]>, TengYao Chi <[email protected]>, Chia-Ping Tsai <[email protected]>
… and closing (#18752)

To address the issue of not creating a checkpoint file during the
restoring and closing process, called the
GlobalStateUpdateTask.flushState() method in
GlobalStateUpdateTask.initialize() and GlobalStateUpdateTask.close()
methods. This will flush the state and create a checkpoint file thereby,
avoiding the need to completely restore the entire state.

Reviewers: Alieh Saeedi <[email protected]>, Matthias J. Sax <[email protected]>
There are 3 issues (at least) about the multithreaded issue on ConsumerRecords. Hence, it would be better to document it completely. 

Reviewers: Kirk True <[email protected]>, TengYao Chi <[email protected]>, Ken Huang <[email protected]>, Xuan-Zhang Gong <[email protected]>, Chia-Ping Tsai <[email protected]>
…eat (#18981)

Implements auto-topic creation when handling the streams group
heartbeat.

Inside KafkaApis, the handler for streamsGroupHeartbeat uses the result
of the streams group heartbeat inside the group coordinator to attempt
to create all missing internal topics using AutoTopicCreationManager.
CREATE TOPIC ACLs are checked. The unit tests class
AutoTopicCreationManagerTest is brought back (it was recently deleted
during a ZK removal PR), but testing only the kraft-related
functionality.

Reviewers: Bruno Cadonna <[email protected]>

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation 
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
…19043)

The AbstractHeartbeatRequestManager and the
StreamsGroupHeartbeatRequestManager, both use the
HeartbeatRequestState to track the state of the heartbeat requests. Both
heartbeat request managers have an implementation of
HeartbeatRequestState as inner class.
To deduplicate code this commit extracts the HeartbeatRequestState so
that the same code can be used by both heartbeat request manager.

Reviewers: Kirk True <[email protected]>, Chia-Ping Tsai
<[email protected]>, Lucas Brutschy <[email protected]>
… check topic describe (#19055)

1、Client support for TopicAuthException in DescribeShareGroup and HB
path
2、ShareConsumerImpl#sendAcknowledgementsAndLeaveGroup swallow
TopicAuthorizationException and GroupAuthorizationException

Reviewers: ShivsundarR <[email protected]>, Andrew Schofield <[email protected]>
…nder EOS (#17931)" (#19078)

This reverts commit e883746.

The commit that is reverted prevents Kafka Streams from re-initializing
its transactional producer. If an exception that fences the
transactional producer occurs, the producer is not re-initialized during
the handling of the exception. That causes an infinite loop of
ProducerFencedExceptions with corresponding rebalances.

Reviewers: Lucas Brutschy <[email protected]>, David Jacot
<[email protected]>
…s (KIP-1103) (#19059)

The PR implements the ShareSessionCache and DelayedShareFetchMetrics as
defined in KIP-1103.

Reviewers: Andrew Schofield <[email protected]>
The new group coordinator prints the following line at fixed interval
even if no groups were deleted:

```
Generated 0 tombstone records while cleaning up group metadata in 0 milliseconds. (org.apache.kafka.coordinator.group.GroupCoordinatorShard)
```

The time component has some value in its own but it may be better to not
print if when there are not records in order to reduce the spam in the
logs.

Reviewers: Chia-Ping Tsai <[email protected]>
…n BrokerLifecycleManager (#19061)

This PR reduces `maxInterval` for `resendExponentialBackoff` in
`BrokerLifecycleManager` class from `broker.session.timeout.ms` to half
of its value. Setting `maxInterval` to `broker.session.timeout.ms`
caused brokers to be fenced if a resend attempt occurred near the
timeout threshold, leading to unnecessary broker fencing.

Reviewers: Colin P. McCabe <[email protected]>
The default checkout behavior for GitHub Actions is to use a special
merge ref which is equivalent to the base branch with the PR merged into
it. While this is crucial for checking compilation issues against trunk,
it significantly diminishes our ability to use any build caching.

This patch changes the JUnit test jobs to checkout the HEAD commit of the PR 
when building. The "Compile and Check" step still checks out the merge commit
so we can keep that level of validation.

Reviewers: Ismael Juma <[email protected]>, Chia-Ping Tsai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.