Integrate with Kafka Connect and Debezium MySQL/Postgres/MongoDb plugin for CDC #3236
Conversation
…ctors Signed-off-by: Haidong <[email protected]>
Signed-off-by: Haidong <[email protected]>
@hshardeesi @kkondaka as well to help with review.
@@ -143,13 +143,13 @@ subprojects {
    }
    implementation('org.eclipse.jetty:jetty-http') {
        version {
            require '11.0.15'
            require '9.4.48.v20220622'
Why are we going to an old revision?
Kafka Connect doesn't work with 11.0.15; the latest version it works with is 9.4.48.v20220622.
I mentioned this as the first point in the high-level design, and it has been approved by Rajs and Dinu.
This could be a bit of a problem.
Also, why does this even need Jetty? Perhaps we can disable the admin server?
Kafka Connect needs it.
The newer 9.4.x release is 9.4.51.v20230217.
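If we stay on the 9.4 line, the pin could point at that release instead; a sketch of the same constraint with the newer version, assuming Kafka Connect is compatible with any 9.4.x patch:

implementation('org.eclipse.jetty:jetty-http') {
    version {
        // Stay on the 9.4.x line required by Kafka Connect, but take the
        // newer patch release mentioned above (compatibility assumed).
        require '9.4.51.v20230217'
    }
}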
* Unique with single instance for each pipeline.
*/
public class KafkaConnect {
    public static final long CONNECTOR_TIMEOUT_MS = 30000L; // 30 seconds
We should make this and the other statics below configurable.
Sounds like a good idea; I can make these two configurable.
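For instance, a minimal sketch of moving the timeout into the source configuration (the property and field names here are hypothetical, not part of this PR):

import com.fasterxml.jackson.annotation.JsonProperty;

public class KafkaConnectSourceConfig {
    // Hypothetical property name; defaults to the current hard-coded 30 seconds.
    @JsonProperty("connector_timeout_ms")
    private long connectorTimeoutMs = 30_000L;

    public long getConnectorTimeoutMs() {
        return connectorTimeoutMs;
    }
}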
@wanghd89 I have a general question: how do we support codecs (like Avro, JSON, Protobuf, etc.) with Kafka Connect?
Kafka Connect supports those using key.converter/value.converter; it has its own converter classes. If we start bringing in more Kafka connectors that read data in other codecs, Kafka Connect has configuration support for those.
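For example, a worker configuration selects the codec through converter settings (standard Kafka Connect properties; the Avro lines assume Confluent's converter and a schema registry are available):

# Sketch of a Kafka Connect worker configuration choosing codecs via converters.
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
# Avro instead of JSON (assumes Confluent's Avro converter and a schema registry):
# value.converter=io.confluent.connect.avro.AvroConverter
# value.converter.schema.registry.url=http://localhost:8081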
@wanghd89 Please fix the code verification failure.
implementation 'javax.validation:validation-api:2.0.1.Final'
implementation libs.reflections.core
implementation 'io.micrometer:micrometer-core'
implementation 'org.eclipse.jetty:jetty-server:9.4.48.v20220622'
Can we exclude Jetty?
We cannot; Kafka Connect has to run a REST API.
@@ -85,9 +86,11 @@ public DataPrepperConfiguration(
final Duration sinkShutdownTimeout,
@JsonProperty("circuit_breakers") final CircuitBreakerConfig circuitBreakerConfig,
@JsonProperty("source_coordination") final SourceCoordinationConfig sourceCoordinationConfig,
@JsonProperty("kafka_cluster_config") final KafkaClusterConfig kafkaClusterConfig,
This is a perfect place to use Data Prepper extensions.
See my comment on your issue: #3233 (comment)
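As a rough sketch of that direction (the extension interface names and signatures are assumed from data-prepper-api and may differ), an extension could own the Kafka cluster configuration instead of threading it through DataPrepperConfiguration:

import org.opensearch.dataprepper.model.plugin.ExtensionPlugin;
import org.opensearch.dataprepper.model.plugin.ExtensionPoints;

// Sketch only: an extension that would expose the shared KafkaClusterConfig
// to the Kafka Connect plugins instead of wiring it into DataPrepperConfiguration.
public class KafkaClusterConfigExtension implements ExtensionPlugin {
    @Override
    public void apply(final ExtensionPoints extensionPoints) {
        // Register a provider here that hands the shared KafkaClusterConfig
        // to any plugin that requests it.
    }
}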
*/

@SuppressWarnings("deprecation")
@DataPrepperPlugin(name = "kafka_connect", pluginType = Source.class, pluginConfigurationType = KafkaConnectSourceConfig.class)
Please see this comment:
As noted there, we should have different sources for each of the databases: PostgreSQL, MySQL, MongoDB.
A good way to solve this would be to add a Kafka Connect extension. See the aws-plugin-api and aws-plugin projects for examples.
You could have a kafka-connect-api project which provides a basic API. Then, the mysql-source project would depend on it (similarly for postgres-source and mongodb-source).
The implementation can be an extension in a project such as kafka-connect-extension. A sketch of the shared API follows below.
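To illustrate the split, the shared kafka-connect-api could expose something like the following, which mysql-source, postgres-source, and mongodb-source would each implement (the names below are illustrative, not from this PR):

import java.util.Map;

// Hypothetical kafka-connect-api abstraction: each database-specific source
// supplies its own Debezium connector class and connector configuration.
public interface ConnectorProvider {
    String connectorClass();

    Map<String, String> connectorConfiguration();
}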
@@ -36,6 +36,8 @@ dependencyResolutionManagement {
library('spring-context', 'org.springframework', 'spring-context').versionRef('spring')
version('guava', '32.0.1-jre')
library('guava-core', 'com.google.guava', 'guava').versionRef('guava')
version('reflections', '0.9.12')
I'm a bit concerned about downgrading such a core component.
@@ -3,20 +3,42 @@
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.dataprepper.plugins.kafka.configuration;
package org.opensearch.dataprepper.model.plugin.kafka;
Let's keep these out of data-prepper-api. We should reserve this project for the most common APIs, not specific technologies.
You can make use of extensions, or perhaps add a kafka-common project if you want to share with the kafka source/sink. A sketch follows below.
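A sketch of what that could look like in the Kafka Connect plugin's build file (the project path is illustrative, assuming a new kafka-common project):

dependencies {
    // Shared Kafka configuration classes come from a common project rather
    // than from data-prepper-api (sketch; the project path is hypothetical).
    implementation project(':data-prepper-plugins:kafka-common')
}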
    this.encryptionConfig = encryptionConfig;
}

// @Override
Please remove all commented-out code.
@@ -0,0 +1,30 @@
package org.opensearch.dataprepper.plugins.sink;
All new files need a header. Use the following if the code is exclusively new code:
/*
* Copyright OpenSearch Contributors
* SPDX-License-Identifier: Apache-2.0
*/
There is another if you are pulling from another open-source project. Let us know if you need it.
There is a major change in this PR due to the introduction of extension plugins.
Description
Integrate with Kafka Connect and Debezium MySQL/Postgres/MongoDb plugin for CDC
Issues Resolved
Resolves #3233 #3234 #3235
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.