From 123a31e813c12d991b6ed6fe323ff22699c56409 Mon Sep 17 00:00:00 2001 From: Marcos Marx Date: Fri, 21 Apr 2023 12:34:55 -0300 Subject: [PATCH] =?UTF-8?q?=F0=9F=8E=89=20New=20Destination:=20Starburst?= =?UTF-8?q?=20Galaxy=20(#25399)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Mayank Vadariya <48036907+mayankvadariya@users.noreply.github.com> --- .../main/resources/icons/starburst-galaxy.svg | 1 + .../seed/destination_definitions.yaml | 7 + .../resources/seed/destination_specs.yaml | 152 +++++ .../src/main/resources/seed/oss_catalog.json | 141 ++++ .../io/airbyte/db/factory/DatabaseDriver.java | 1 + .../.dockerignore | 3 + .../destination-starburst-galaxy/BOOTSTRAP.md | 8 + .../destination-starburst-galaxy/Dockerfile | 18 + .../destination-starburst-galaxy/README.md | 65 ++ .../destination-starburst-galaxy/build.gradle | 44 ++ .../integration_tests/configured_catalog.json | 24 + .../sample_secrets/config.json | 18 + .../starburst_galaxy/ColumnMetadata.java | 9 + .../HadoopCatalogIcebergS3ParquetWriter.java | 179 +++++ .../StarburstGalaxyBaseDestination.java | 93 +++ .../StarburstGalaxyConstants.java | 25 + .../StarburstGalaxyDestination.java | 30 + .../StarburstGalaxyDestinationConfig.java | 49 ++ .../StarburstGalaxyDestinationResolver.java | 26 + .../StarburstGalaxyNameTransformer.java | 39 ++ .../StarburstGalaxyS3Destination.java | 26 + ...StarburstGalaxyS3StagingStorageConfig.java | 40 ++ .../StarburstGalaxyS3StreamCopier.java | 174 +++++ .../StarburstGalaxyS3StreamCopierFactory.java | 44 ++ .../StarburstGalaxySqlOperations.java | 60 ++ .../StarburstGalaxyStagingStorageConfig.java | 33 + .../StarburstGalaxyStagingStorageType.java | 9 + .../StarburstGalaxyStreamCopier.java | 187 ++++++ .../StarburstGalaxyStreamCopierFactory.java | 12 + .../starburst_galaxy/TableSchema.java | 29 + .../src/main/resources/spec.json | 130 ++++ ...rburstGalaxyDestinationAcceptanceTest.java | 247 +++++++ 
...urstGalaxyS3DestinationAcceptanceTest.java | 66 ++ .../resources/testdata/data-append.json | 7 + .../resources/testdata/data-overwrite.json | 6 + .../resources/testdata/dataV0.json | 18 + .../resources/testdata/dataV1.json | 12 + .../resources/testdata/datatypeV0.json | 62 ++ .../resources/testdata/datatypeV1.json | 35 + .../resources/testdata/expected-dataV0.json | 15 + .../resources/testdata/expected-dataV1.json | 12 + .../testdata/expected-datatypeV0.json | 18 + .../testdata/expected-datatypeV1.json | 15 + .../testdata/expected-schema-append.json | 8 + .../testdata/expected-schema-overwrite.json | 7 + .../resources/testdata/schema-append.json | 23 + .../resources/testdata/schema-overwrite.json | 20 + .../StarburstGalaxyDestinationConfigTest.java | 71 ++ ...tarburstGalaxyDestinationResolverTest.java | 56 ++ ...arburstGalaxyStagingStorageConfigTest.java | 36 + .../starburst_galaxy/TypeConversionTest.java | 68 ++ .../src/test/resources/config.json | 18 + .../type_conversion_test_cases_v0.json | 629 ++++++++++++++++++ .../type_conversion_test_cases_v1.json | 613 +++++++++++++++++ connectors.md | 1 + deps.toml | 2 +- .../destinations/starburst-galaxy.md | 101 +++ 57 files changed, 3841 insertions(+), 1 deletion(-) create mode 100644 airbyte-config-oss/init-oss/src/main/resources/icons/starburst-galaxy.svg create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/.dockerignore create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/BOOTSTRAP.md create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/Dockerfile create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/README.md create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/build.gradle create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/integration_tests/configured_catalog.json create mode 100644 
airbyte-integrations/connectors/destination-starburst-galaxy/sample_secrets/config.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/ColumnMetadata.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/HadoopCatalogIcebergS3ParquetWriter.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyBaseDestination.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyConstants.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestination.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfig.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolver.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyNameTransformer.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3Destination.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StagingStorageConfig.java create mode 100644 
airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopier.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopierFactory.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxySqlOperations.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfig.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageType.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopier.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopierFactory.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/TableSchema.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/main/resources/spec.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationAcceptanceTest.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3DestinationAcceptanceTest.java create mode 100644 
airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-append.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-overwrite.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV0.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV1.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV0.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV1.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV0.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV1.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV0.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV1.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-append.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-overwrite.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-append.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-overwrite.json create mode 100644 
airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfigTest.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolverTest.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfigTest.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/TypeConversionTest.java create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/config.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v0.json create mode 100644 airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v1.json create mode 100644 docs/integrations/destinations/starburst-galaxy.md diff --git a/airbyte-config-oss/init-oss/src/main/resources/icons/starburst-galaxy.svg b/airbyte-config-oss/init-oss/src/main/resources/icons/starburst-galaxy.svg new file mode 100644 index 0000000000000..11eb26295c03e --- /dev/null +++ b/airbyte-config-oss/init-oss/src/main/resources/icons/starburst-galaxy.svg @@ -0,0 +1 @@ + \ No newline at end of file diff --git a/airbyte-config-oss/init-oss/src/main/resources/seed/destination_definitions.yaml b/airbyte-config-oss/init-oss/src/main/resources/seed/destination_definitions.yaml index 8eaffb3b2a314..ff2c8a310a86d 100644 --- a/airbyte-config-oss/init-oss/src/main/resources/seed/destination_definitions.yaml +++ b/airbyte-config-oss/init-oss/src/main/resources/seed/destination_definitions.yaml @@ -116,6 +116,13 @@ documentationUrl: 
https://docs.airbyte.io/integrations/destinations/convex icon: convex.svg releaseStage: alpha +- name: Starburst Galaxy + destinationDefinitionId: 4528e960-6f7b-4412-8555-7e0097e1da17 + dockerRepository: airbyte/destination-starburst-galaxy + dockerImageTag: 0.0.1 + documentationUrl: https://docs.airbyte.com/integrations/destinations/starburst-galaxy + icon: starburst-galaxy.svg + releaseStage: alpha - name: Databricks Lakehouse destinationDefinitionId: 072d5540-f236-4294-ba7c-ade8fd918496 dockerRepository: airbyte/destination-databricks diff --git a/airbyte-config-oss/init-oss/src/main/resources/seed/destination_specs.yaml b/airbyte-config-oss/init-oss/src/main/resources/seed/destination_specs.yaml index f88cd033c4928..d3469b8072e7e 100644 --- a/airbyte-config-oss/init-oss/src/main/resources/seed/destination_specs.yaml +++ b/airbyte-config-oss/init-oss/src/main/resources/seed/destination_specs.yaml @@ -1820,6 +1820,158 @@ - "overwrite" - "append" - "append_dedup" +- dockerImage: "airbyte/destination-starburst-galaxy:0.0.1" + spec: + documentationUrl: "https://docs.airbyte.com/integrations/destinations/starburst-galaxy" + connectionSpecification: + $schema: "http://json-schema.org/draft-07/schema#" + title: "Starburst Galaxy Destination Spec" + type: "object" + required: + - "accept_terms" + - "server_hostname" + - "username" + - "password" + - "catalog" + - "staging_object_store" + properties: + accept_terms: + title: "Agree to the Starburst Galaxy terms & conditions" + type: "boolean" + description: "You must agree to the Starburst Galaxy terms & conditions to use this connector." + default: false + order: 1 + server_hostname: + title: "Hostname" + type: "string" + description: "Starburst Galaxy cluster hostname." + examples: + - "abc-12345678-wxyz.trino.galaxy-demo.io" + order: 2 + port: + title: "Port" + type: "string" + description: "Starburst Galaxy cluster port." 
+ default: "443" + examples: + - "443" + order: 3 + username: + title: "User" + type: "string" + description: "Starburst Galaxy user." + examples: + - "user@example.com" + order: 4 + password: + title: "Password" + type: "string" + description: "Starburst Galaxy password for the specified user." + examples: + - "password" + airbyte_secret: true + order: 5 + catalog: + title: "Amazon S3 catalog" + type: "string" + description: "Name of the Starburst Galaxy Amazon S3 catalog." + examples: + - "sample_s3_catalog" + order: 6 + catalog_schema: + title: "Amazon S3 catalog schema" + type: "string" + description: "The default Starburst Galaxy Amazon S3 catalog schema where\ + \ tables are written to if the source does not specify a namespace. Defaults\ + \ to \"public\"." + default: "public" + examples: + - "public" + order: 7 + staging_object_store: + title: "Staging object store" + type: "object" + description: "Temporary storage on which temporary Iceberg table is created." + oneOf: + - title: "Amazon S3" + required: + - "object_store_type" + - "s3_bucket_name" + - "s3_bucket_path" + - "s3_bucket_region" + - "s3_access_key_id" + - "s3_secret_access_key" + properties: + object_store_type: + type: "string" + enum: + - "S3" + default: "S3" + order: 1 + s3_bucket_name: + title: "S3 bucket name" + type: "string" + description: "Name of the S3 bucket" + examples: + - "airbyte_staging" + order: 1 + s3_bucket_path: + title: "S3 bucket path" + type: "string" + description: "Directory in the S3 bucket where staging data is stored." + examples: + - "temp_airbyte__sync/test" + order: 2 + s3_bucket_region: + title: "S3 bucket region" + type: "string" + default: "us-east-1" + description: "The region of the S3 bucket." 
+ enum: + - "ap-northeast-1" + - "ap-southeast-1" + - "ap-southeast-2" + - "ca-central-1" + - "eu-central-1" + - "eu-west-1" + - "eu-west-2" + - "eu-west-3" + - "us-east-1" + - "us-east-2" + - "us-west-1" + - "us-west-2" + order: 3 + s3_access_key_id: + title: "Access key" + type: "string" + description: "Access key with access to the bucket. Airbyte requires\ + \ read and write permissions to a given bucket." + examples: + - "A012345678910EXAMPLE" + airbyte_secret: true + order: 4 + s3_secret_access_key: + title: "Secret key" + type: "string" + description: "Secret key used with the specified access key." + examples: + - "a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY" + airbyte_secret: true + order: 5 + order: 8 + purge_staging_table: + title: "Purge staging Iceberg table" + type: "boolean" + description: "Defaults to 'true'. Switch to 'false' for debugging purposes." + default: true + order: 9 + supportsIncremental: true + supportsNormalization: false + supportsDBT: false + supported_destination_sync_modes: + - "overwrite" + - "append" - dockerImage: "airbyte/destination-databricks:1.0.2" spec: documentationUrl: "https://docs.airbyte.com/integrations/destinations/databricks" diff --git a/airbyte-config-oss/init-oss/src/main/resources/seed/oss_catalog.json b/airbyte-config-oss/init-oss/src/main/resources/seed/oss_catalog.json index 2375bb49fcb3f..b0b72c40f32ed 100644 --- a/airbyte-config-oss/init-oss/src/main/resources/seed/oss_catalog.json +++ b/airbyte-config-oss/init-oss/src/main/resources/seed/oss_catalog.json @@ -1694,6 +1694,147 @@ "public": true, "custom": false, "releaseStage": "alpha" + }, { + "destinationDefinitionId": "4528e960-6f7b-4412-8555-7e0097e1da17", + "name": "Starburst Galaxy", + "dockerRepository": "airbyte/destination-starburst-galaxy", + "dockerImageTag": "0.0.1", + "documentationUrl": "https://docs.airbyte.com/integrations/destinations/starburst-galaxy", + "icon": "starburst-galaxy.svg", + "spec": { + "documentationUrl": 
"https://docs.airbyte.com/integrations/destinations/starburst-galaxy", + "connectionSpecification": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Starburst Galaxy Destination Spec", + "type": "object", + "required": [ "accept_terms", "server_hostname", "username", "password", "catalog", "staging_object_store" ], + "properties": { + "accept_terms": { + "title": "Agree to the Starburst Galaxy terms & conditions", + "type": "boolean", + "description": "You must agree to the Starburst Galaxy terms & conditions to use this connector.", + "default": false, + "order": 1 + }, + "server_hostname": { + "title": "Hostname", + "type": "string", + "description": "Starburst Galaxy cluster hostname.", + "examples": [ "abc-12345678-wxyz.trino.galaxy-demo.io" ], + "order": 2 + }, + "port": { + "title": "Port", + "type": "string", + "description": "Starburst Galaxy cluster port.", + "default": "443", + "examples": [ "443" ], + "order": 3 + }, + "username": { + "title": "User", + "type": "string", + "description": "Starburst Galaxy user.", + "examples": [ "user@example.com" ], + "order": 4 + }, + "password": { + "title": "Password", + "type": "string", + "description": "Starburst Galaxy password for the specified user.", + "examples": [ "password" ], + "airbyte_secret": true, + "order": 5 + }, + "catalog": { + "title": "Amazon S3 catalog", + "type": "string", + "description": "Name of the Starburst Galaxy Amazon S3 catalog.", + "examples": [ "sample_s3_catalog" ], + "order": 6 + }, + "catalog_schema": { + "title": "Amazon S3 catalog schema", + "type": "string", + "description": "The default Starburst Galaxy Amazon S3 catalog schema where tables are written to if the source does not specify a namespace. 
Defaults to \"public\".", + "default": "public", + "examples": [ "public" ], + "order": 7 + }, + "staging_object_store": { + "title": "Staging object store", + "type": "object", + "description": "Temporary storage on which temporary Iceberg table is created.", + "oneOf": [ { + "title": "Amazon S3", + "required": [ "object_store_type", "s3_bucket_name", "s3_bucket_path", "s3_bucket_region", "s3_access_key_id", "s3_secret_access_key" ], + "properties": { + "object_store_type": { + "type": "string", + "enum": [ "S3" ], + "default": "S3", + "order": 1 + }, + "s3_bucket_name": { + "title": "S3 bucket name", + "type": "string", + "description": "Name of the S3 bucket", + "examples": [ "airbyte_staging" ], + "order": 1 + }, + "s3_bucket_path": { + "title": "S3 bucket path", + "type": "string", + "description": "Directory in the S3 bucket where staging data is stored.", + "examples": [ "temp_airbyte__sync/test" ], + "order": 2 + }, + "s3_bucket_region": { + "title": "S3 bucket region", + "type": "string", + "default": "us-east-1", + "description": "The region of the S3 bucket.", + "enum": [ "ap-northeast-1", "ap-southeast-1", "ap-southeast-2", "ca-central-1", "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3", "us-east-1", "us-east-2", "us-west-1", "us-west-2" ], + "order": 3 + }, + "s3_access_key_id": { + "title": "Access key", + "type": "string", + "description": "Access key with access to the bucket. Airbyte requires read and write permissions to a given bucket.", + "examples": [ "A012345678910EXAMPLE" ], + "airbyte_secret": true, + "order": 4 + }, + "s3_secret_access_key": { + "title": "Secret key", + "type": "string", + "description": "Secret key used with the specified access key.", + "examples": [ "a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY" ], + "airbyte_secret": true, + "order": 5 + } + } + } ], + "order": 8 + }, + "purge_staging_table": { + "title": "Purge staging Iceberg table", + "type": "boolean", + "description": "Defaults to 'true'. 
Switch to 'false' for debugging purposes.", + "default": true, + "order": 9 + } + } + }, + "supportsIncremental": true, + "supportsNormalization": false, + "supportsDBT": false, + "supported_destination_sync_modes": [ "overwrite", "append" ] + }, + "tombstone": false, + "public": true, + "custom": false, + "releaseStage": "alpha" }, { "destinationDefinitionId": "072d5540-f236-4294-ba7c-ade8fd918496", "name": "Databricks Lakehouse", diff --git a/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DatabaseDriver.java b/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DatabaseDriver.java index ea920a4fe7009..4e23420186595 100644 --- a/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DatabaseDriver.java +++ b/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DatabaseDriver.java @@ -12,6 +12,7 @@ public enum DatabaseDriver { CLICKHOUSE("com.clickhouse.jdbc.ClickHouseDriver", "jdbc:clickhouse:%s://%s:%d/%s"), DATABRICKS("com.databricks.client.jdbc.Driver", "jdbc:databricks://%s:%s;HttpPath=%s;SSL=1;UserAgentEntry=Airbyte"), DB2("com.ibm.db2.jcc.DB2Driver", "jdbc:db2://%s:%d/%s"), + STARBURST("io.trino.jdbc.TrinoDriver", "jdbc:trino://%s:%s/%s?SSL=true&source=airbyte"), MARIADB("org.mariadb.jdbc.Driver", "jdbc:mariadb://%s:%d/%s"), MSSQLSERVER("com.microsoft.sqlserver.jdbc.SQLServerDriver", "jdbc:sqlserver://%s:%d/%s"), MYSQL("com.mysql.cj.jdbc.Driver", "jdbc:mysql://%s:%d/%s"), diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/.dockerignore b/airbyte-integrations/connectors/destination-starburst-galaxy/.dockerignore new file mode 100644 index 0000000000000..65c7d0ad3e73c --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/.dockerignore @@ -0,0 +1,3 @@ +* +!Dockerfile +!build diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/BOOTSTRAP.md b/airbyte-integrations/connectors/destination-starburst-galaxy/BOOTSTRAP.md new file mode 100644 index 0000000000000..8844bf5bde084 --- /dev/null 
+++ b/airbyte-integrations/connectors/destination-starburst-galaxy/BOOTSTRAP.md @@ -0,0 +1,8 @@ +# Starburst Galaxy destination connector bootstrap + +This destination syncs data to an Amazon S3 catalog in [Starburst Galaxy](https://www.starburst.io/platform/starburst-galaxy/) by completing the following steps: + +1. Persist source stream data to S3 staging storage in the Iceberg table format. +2. Create a destination Iceberg table in the Amazon S3 catalog in Starburst Galaxy from the staged Iceberg table. + +Learn more from [the Airbyte documentation](https://docs.airbyte.io/integrations/destinations/starburst-galaxy). diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/Dockerfile b/airbyte-integrations/connectors/destination-starburst-galaxy/Dockerfile new file mode 100644 index 0000000000000..bc9b3595224d7 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/Dockerfile @@ -0,0 +1,18 @@ +FROM airbyte/integration-base-java:dev AS build + +WORKDIR /airbyte +ENV APPLICATION destination-starburst-galaxy + +COPY build/distributions/${APPLICATION}*.tar ${APPLICATION}.tar + +RUN tar xf ${APPLICATION}.tar --strip-components=1 && rm -rf ${APPLICATION}.tar + +FROM airbyte/integration-base-java:dev + +WORKDIR /airbyte +ENV APPLICATION destination-starburst-galaxy + +COPY --from=build /airbyte /airbyte + +LABEL io.airbyte.version=0.0.1 +LABEL io.airbyte.name=airbyte/destination-starburst-galaxy diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/README.md b/airbyte-integrations/connectors/destination-starburst-galaxy/README.md new file mode 100644 index 0000000000000..d8ec77b405d3c --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/README.md @@ -0,0 +1,65 @@ +# Build and run the Starburst Galaxy destination + +This is the repository for the Starburst Galaxy destination connector, written in Java. 
+For information about how to use this connector within Airbyte, see [the user documentation](https://docs.airbyte.com/integrations/destinations/starburst-galaxy). + +## Local development + +#### Build with Gradle + +From the Airbyte repository root, run: +``` +./gradlew :airbyte-integrations:connectors:destination-starburst-galaxy:build +``` + +#### Create credentials + +If you are a community contributor, you must generate the necessary credentials and place them in `secrets/config.json`, conforming to the spec file in `src/main/resources/spec.json`. +**Note**: The `secrets` directory is git-ignored by default; sensitive information cannot be checked in. + +If you are an Airbyte core member, you must follow the [instructions](https://docs.airbyte.com/connector-development#using-credentials-in-ci) to set up your credentials. + +### Build and run a local Docker image for the connector + +#### Build + +Build the connector image with Gradle: +``` +./gradlew :airbyte-integrations:connectors:destination-starburst-galaxy:airbyteDocker +``` +When building with Gradle, the Docker image name and tag, respectively, are the values of the `io.airbyte.name` and `io.airbyte.version` labels in +the Dockerfile. + +#### Run + +The following example commands are Starburst Galaxy-specific versions of the [Airbyte protocol commands](https://docs.airbyte.com/understanding-airbyte/airbyte-protocol): +``` +docker run --rm airbyte/destination-starburst-galaxy:dev spec +docker run --rm -v $(pwd)/secrets:/secrets airbyte/destination-starburst-galaxy:dev check --config /secrets/config.json +docker run --rm -v $(pwd)/secrets:/secrets -v $(pwd)/integration_tests:/integration_tests airbyte/destination-starburst-galaxy:dev write --config /secrets/config.json --catalog /integration_tests/configured_catalog.json +``` + +### Run tests with Gradle + +All commands should be run from the Airbyte project root. 
+ +To run unit tests: +``` +./gradlew :airbyte-integrations:connectors:destination-starburst-galaxy:unitTest +``` +To run acceptance and custom integration tests: +``` +./gradlew :airbyte-integrations:connectors:destination-starburst-galaxy:integrationTest +``` + +## Dependency management + +### Publish a new version of the connector + +After you have implemented a feature, bug fix or enhancement, you must do the following: + +1. Ensure all unit and integration tests pass. +2. Update the connector version by incrementing the value of the `io.airbyte.version` label in the Dockerfile by following the [SemVer](https://semver.org/) versioning rules. +3. Create a Pull Request. + +Airbyte will review your PR and request any changes necessary to merge it into master. \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/build.gradle b/airbyte-integrations/connectors/destination-starburst-galaxy/build.gradle new file mode 100644 index 0000000000000..449997bb4e6ff --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/build.gradle @@ -0,0 +1,44 @@ +plugins { + id 'application' + id 'airbyte-docker' + id 'airbyte-integration-test-java' +} + +application { + mainClass = 'io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestination' +} + +dependencies { + implementation project(':airbyte-config-oss:config-models-oss') + implementation libs.airbyte.protocol + implementation project(':airbyte-integrations:bases:base-java') + implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) + implementation project(':airbyte-integrations:bases:bases-destination-jdbc') + implementation project(path: ':airbyte-db:db-lib') + implementation project(path: ':airbyte-integrations:bases:base-java-s3') + implementation project(path: ':airbyte-integrations:connectors:destination-s3') + + implementation ('io.trino:trino-iceberg:411') {exclude group: 'commons-cli', module: 
'commons-cli'} + implementation ('io.trino:trino-main:411') {exclude group: 'commons-cli', module: 'commons-cli'} + implementation ('io.trino:trino-jdbc:411') {exclude group: 'commons-cli', module: 'commons-cli'} + + implementation 'org.apache.avro:avro:1.11.1' + + implementation 'org.apache.iceberg:iceberg-core:1.1.0' + implementation 'org.apache.iceberg:iceberg-bundled-guava:1.1.0' + implementation 'org.apache.iceberg:iceberg-aws:1.1.0' + implementation 'org.apache.iceberg:iceberg-parquet:1.1.0' + + implementation 'org.apache.hadoop:hadoop-common:3.3.3' + implementation "org.apache.hadoop:hadoop-aws:3.3.2" + + implementation 'software.amazon.awssdk:bundle:2.20.20' + implementation 'software.amazon.awssdk:url-connection-client:2.20.20' + + implementation ('com.github.airbytehq:json-avro-converter:1.1.0') { exclude group: 'ch.qos.logback', module: 'logback-classic'} + + integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test') + integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-starburst-galaxy') + + implementation ('org.apache.parquet:parquet-avro:1.12.3') { exclude group: 'org.slf4j', module: 'slf4j-log4j12'} +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/integration_tests/configured_catalog.json b/airbyte-integrations/connectors/destination-starburst-galaxy/integration_tests/configured_catalog.json new file mode 100644 index 0000000000000..da8be1ef3bd25 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/integration_tests/configured_catalog.json @@ -0,0 +1,24 @@ +{ + "streams": [ + { + "stream" : { + "name": "users", + "json_schema": { + "type": "object", + "required": ["name"], + "properties": { + "name": { + "type": "string" + }, + "age": { + "type": "number" + } + } + }, + "supported_sync_modes": ["full_refresh"] + }, + "sync_mode": "full_refresh", + "destination_sync_mode": "overwrite" + } + ] +} \ No newline at end of 
file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/sample_secrets/config.json b/airbyte-integrations/connectors/destination-starburst-galaxy/sample_secrets/config.json new file mode 100644 index 0000000000000..b07a907fec1fa --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/sample_secrets/config.json @@ -0,0 +1,18 @@ +{ + "accept_terms": true, + "server_hostname": "abc-12345678-wxyz.galaxy.starburst.io", + "port": "443", + "username": "user@example.com", + "password": "password", + "staging_object_store": { + "object_store_type": "S3", + "s3_bucket_name": "required", + "s3_bucket_path": "required", + "s3_bucket_region": "required", + "s3_access_key_id": "required", + "s3_secret_access_key": "required" + }, + "purge_staging_table": true, + "catalog": "s3_catalog", + "catalog_schema": "public" +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/ColumnMetadata.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/ColumnMetadata.java new file mode 100644 index 0000000000000..99c3189063cd2 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/ColumnMetadata.java @@ -0,0 +1,9 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import io.trino.spi.type.Type; + +public record ColumnMetadata(String name, Type galaxyIcebergType, int position) {} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/HadoopCatalogIcebergS3ParquetWriter.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/HadoopCatalogIcebergS3ParquetWriter.java new file mode 100644 index 0000000000000..fcc679572dfbf --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/HadoopCatalogIcebergS3ParquetWriter.java @@ -0,0 +1,179 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.s3.writer.BaseS3Writer.determineOutputFilename; +import static org.apache.hadoop.fs.s3a.Constants.ACCESS_KEY; +import static org.apache.hadoop.fs.s3a.Constants.AWS_CREDENTIALS_PROVIDER; +import static org.apache.hadoop.fs.s3a.Constants.SECRET_KEY; +import static org.apache.hadoop.fs.s3a.Constants.SECURE_CONNECTIONS; +import static org.apache.iceberg.CatalogProperties.FILE_IO_IMPL; +import static org.apache.iceberg.CatalogProperties.WAREHOUSE_LOCATION; +import static org.apache.iceberg.aws.AwsProperties.S3FILEIO_ACCESS_KEY_ID; +import static org.apache.iceberg.aws.AwsProperties.S3FILEIO_SECRET_ACCESS_KEY; + +import com.amazonaws.services.s3.AmazonS3; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import io.airbyte.integrations.destination.s3.S3Format; +import io.airbyte.integrations.destination.s3.credential.S3AccessKeyCredentialConfig; +import io.airbyte.integrations.destination.s3.template.S3FilenameTemplateParameterObject; +import 
io.airbyte.protocol.models.v0.AirbyteStream; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteStream; +import java.io.IOException; +import java.sql.Timestamp; +import java.util.HashMap; +import java.util.Map; +import org.apache.avro.generic.GenericData; +import org.apache.avro.generic.GenericData.Record; +import org.apache.hadoop.conf.Configuration; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.Table; +import org.apache.iceberg.catalog.Namespace; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.hadoop.HadoopCatalog; +import org.apache.iceberg.io.DataWriter; +import org.apache.iceberg.parquet.Parquet; +import org.apache.iceberg.parquet.ParquetAvroWriter; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public class HadoopCatalogIcebergS3ParquetWriter { + + private static final Logger LOGGER = LoggerFactory.getLogger(HadoopCatalogIcebergS3ParquetWriter.class); + + private final DataWriter<Record> parquetWriter; + private final Table table; + private final S3DestinationConfig config; + private final AirbyteStream stream; + private final HadoopCatalog catalog; + private final AmazonS3 s3Client; + private final String tableStorageRelativePath; + + public HadoopCatalogIcebergS3ParquetWriter( + final S3DestinationConfig config, + final ConfiguredAirbyteStream configuredStream, + final Schema schema, + final String schemaName, + final String tableName, + final Timestamp uploadTime) + throws IOException { + + this.config = config; + this.stream = configuredStream.getStream(); + this.s3Client = config.getS3Client(); + + String outputFilename = determineOutputFilename(S3FilenameTemplateParameterObject + .builder() + .s3Format(S3Format.PARQUET) + .timestamp(uploadTime) + .fileExtension(S3Format.PARQUET.getFileExtension()) + .build()); + + String warehousePath = String.format("s3a://%s/%s", this.config.getBucketName(), this.config.getBucketPath()); + + 
this.tableStorageRelativePath = String.join("/", this.config.getBucketPath(), schemaName, tableName); + initializeS3Storage(); + + this.catalog = createCatalog(warehousePath); + LOGGER.info("Warehouse path {}", warehousePath); + Namespace namespace = Namespace.of(schemaName); + TableIdentifier name = TableIdentifier.of(namespace, tableName); + catalog.createTable(name, schema); + // Create table may change the column ids of given schema before committing to metadata file which + // brings inconsistencies between table schema and the schema used by parquetWriter. + // For sharing consistent schema between parquetWriter and a table, loadTable is used to get the + // updated schema which can be used by the parquetWriter + // https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/TableMetadata.java#L102-L105 + this.table = catalog.loadTable(name); + String tableLocation = table.location() + "/" + outputFilename; + LOGGER.info("Table {} at data file location {} is created", table.name(), tableLocation); + + this.parquetWriter = Parquet.writeData(table.io().newOutputFile(tableLocation)) + .schema(table.schema()) + .createWriterFunc(ParquetAvroWriter::buildWriter) + .overwrite() + .withSpec(PartitionSpec.unpartitioned()) + .build(); + } + + private void initializeS3Storage() { + try { + final String bucket = config.getBucketName(); + if (!s3Client.doesBucketExistV2(bucket)) { + LOGGER.info("Bucket {} does not exist; creating...", bucket); + s3Client.createBucket(bucket); + LOGGER.info("Bucket {} has been created.", bucket); + } + } catch (Exception e) { + LOGGER.error("Failed to initialize S3 storage: ", e); + throw e; + } + } + + public String getTableStorageRelativePath() { + return tableStorageRelativePath; + } + + public Table getTable() { + return table; + } + + public void write(GenericData.Record record) { + parquetWriter.write(record); + } + + private void closeWhenSucceed() throws IOException { + parquetWriter.close(); + } + + private 
void closeWhenFail() throws IOException { + parquetWriter.close(); + } + + public void close(final boolean hasFailed) + throws IOException { + try { + if (hasFailed) { + LOGGER.warn("Failure detected. Aborting upload of stream '{}'...", stream.getName()); + closeWhenFail(); + LOGGER.warn("Upload of stream '{}' aborted.", stream.getName()); + } else { + LOGGER.info("Uploading remaining data for stream '{}'.", stream.getName()); + closeWhenSucceed(); + LOGGER.info("Upload completed for stream '{}'.", stream.getName()); + } + } finally { + table.newAppend().appendFile(parquetWriter.toDataFile()).commit(); + catalog.close(); + } + } + + private HadoopCatalog createCatalog(String warehousePath) { + S3AccessKeyCredentialConfig credentialConfig = (S3AccessKeyCredentialConfig) config.getS3CredentialConfig(); + + System.setProperty("aws.region", config.getBucketRegion()); + + Map<String, String> properties = new HashMap<>(); + properties.put(WAREHOUSE_LOCATION, warehousePath); + properties.put(FILE_IO_IMPL, "org.apache.iceberg.aws.s3.S3FileIO"); + properties.put(S3FILEIO_ACCESS_KEY_ID, credentialConfig.getAccessKeyId()); + properties.put(S3FILEIO_SECRET_ACCESS_KEY, credentialConfig.getSecretAccessKey()); + + Configuration configuration = new Configuration(); + configuration.set(AWS_CREDENTIALS_PROVIDER, "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider"); + configuration.set(ACCESS_KEY, credentialConfig.getAccessKeyId()); + configuration.set(SECRET_KEY, credentialConfig.getSecretAccessKey()); + configuration.set(SECURE_CONNECTIONS, "true"); + + HadoopCatalog hadoopCatalog = new HadoopCatalog(); + hadoopCatalog.setConf(configuration); + + hadoopCatalog.initialize("hadoop-catalog", properties); + + return hadoopCatalog; + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyBaseDestination.java 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyBaseDestination.java new file mode 100644 index 0000000000000..d0456364ee0f3 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyBaseDestination.java @@ -0,0 +1,93 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.db.factory.DatabaseDriver.STARBURST; +import static io.airbyte.integrations.destination.jdbc.copy.CopyConsumerFactory.create; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG_SCHEMA; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STARBURST_GALAXY_DRIVER_CLASS; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationConfig.get; +import static java.lang.String.format; + +import com.fasterxml.jackson.databind.JsonNode; +import io.airbyte.db.factory.DataSourceFactory; +import io.airbyte.db.jdbc.DefaultJdbcDatabase; +import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.base.AirbyteMessageConsumer; +import io.airbyte.integrations.destination.StandardNameTransformer; +import io.airbyte.integrations.destination.jdbc.SqlOperations; +import io.airbyte.integrations.destination.jdbc.copy.CopyDestination; +import io.airbyte.protocol.models.v0.AirbyteMessage; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteCatalog; +import java.util.function.Consumer; +import javax.sql.DataSource; + +public abstract class StarburstGalaxyBaseDestination + extends CopyDestination { + + public StarburstGalaxyBaseDestination() { + super(CATALOG_SCHEMA); + } + + @Override + public void checkPersistence(JsonNode config) { + checkPersistence(get(config).storageConfig()); 
+ } + + protected abstract void checkPersistence(StarburstGalaxyStagingStorageConfig galaxyStorageConfig); + + @Override + public AirbyteMessageConsumer getConsumer(final JsonNode config, + final ConfiguredAirbyteCatalog catalog, + final Consumer<AirbyteMessage> outputRecordCollector) { + final StarburstGalaxyDestinationConfig starburstGalaxyConfig = get(config); + final DataSource dataSource = getDataSource(config); + return create( + outputRecordCollector, + dataSource, + getDatabase(dataSource), + getSqlOperations(), + getNameTransformer(), + starburstGalaxyConfig, + catalog, + getStreamCopierFactory(), + starburstGalaxyConfig.galaxyCatalogSchema()); + } + + protected abstract StarburstGalaxyStreamCopierFactory getStreamCopierFactory(); + + @Override + public StandardNameTransformer getNameTransformer() { + return new StarburstGalaxyNameTransformer(); + } + + @Override + public DataSource getDataSource(final JsonNode config) { + final StarburstGalaxyDestinationConfig galaxyDestinationConfig = get(config); + return DataSourceFactory.create( + galaxyDestinationConfig.galaxyUsername(), + galaxyDestinationConfig.galaxyPassword(), + STARBURST_GALAXY_DRIVER_CLASS, + getGalaxyConnectionString(galaxyDestinationConfig)); + } + + @Override + public JdbcDatabase getDatabase(final DataSource dataSource) { + return new DefaultJdbcDatabase(dataSource); + } + + @Override + public SqlOperations getSqlOperations() { + return new StarburstGalaxySqlOperations(); + } + + public static String getGalaxyConnectionString(final StarburstGalaxyDestinationConfig galaxyDestinationConfig) { + return format(STARBURST.getUrlFormatString(), + galaxyDestinationConfig.galaxyServerHostname(), + galaxyDestinationConfig.galaxyPort(), + galaxyDestinationConfig.galaxyCatalog()); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyConstants.java 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyConstants.java new file mode 100644 index 0000000000000..5e4d77c18f599 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyConstants.java @@ -0,0 +1,25 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.db.factory.DatabaseDriver.STARBURST; + +public final class StarburstGalaxyConstants { + + public static final String STARBURST_GALAXY_DRIVER_CLASS = STARBURST.getDriverClassName(); + public static final String ACCEPT_TERMS = "accept_terms"; + public static final String SERVER_HOSTNAME = "server_hostname"; + public static final String PORT = "port"; + public static final String USERNAME = "username"; + public static final String PASSWORD = "password"; + public static final String CATALOG = "catalog"; + public static final String CATALOG_SCHEMA = "catalog_schema"; + public static final String OBJECT_STORE_TYPE = "object_store_type"; + public static final String PURGE_STAGING_TABLE = "purge_staging_table"; + public static final String STAGING_OBJECT_STORE = "staging_object_store"; + + private StarburstGalaxyConstants() {} + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestination.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestination.java new file mode 100644 index 0000000000000..63a65e08c6b8a --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestination.java @@ -0,0 +1,30 @@ +/* 
+ * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageType.S3; + +import com.google.common.collect.ImmutableMap; +import io.airbyte.integrations.base.Destination; +import io.airbyte.integrations.base.IntegrationRunner; +import io.airbyte.integrations.destination.jdbc.copy.SwitchingDestination; +import java.io.Closeable; +import java.sql.DriverManager; + +public class StarburstGalaxyDestination extends SwitchingDestination<StarburstGalaxyStagingStorageType> { + + public StarburstGalaxyDestination() { + super(StarburstGalaxyStagingStorageType.class, + StarburstGalaxyDestinationResolver::getStagingStorageType, + ImmutableMap.of(S3, new StarburstGalaxyS3Destination())); + } + + public static void main(final String[] args) throws Exception { + final Destination destination = new StarburstGalaxyDestination(); + new IntegrationRunner(destination).run(args); + ((Closeable) DriverManager.getDriver("jdbc:trino:")).close(); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfig.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfig.java new file mode 100644 index 0000000000000..78fa900b95212 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfig.java @@ -0,0 +1,49 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static com.google.common.base.Preconditions.checkArgument; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.ACCEPT_TERMS; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG_SCHEMA; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.PASSWORD; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.PORT; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.PURGE_STAGING_TABLE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.SERVER_HOSTNAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STAGING_OBJECT_STORE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.USERNAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageConfig.getStarburstGalaxyStagingStorageConfig; + +import com.fasterxml.jackson.databind.JsonNode; + +public record StarburstGalaxyDestinationConfig(String galaxyServerHostname, + String galaxyPort, + String galaxyUsername, + String galaxyPassword, + String galaxyCatalog, + String galaxyCatalogSchema, + boolean purgeStagingData, + StarburstGalaxyStagingStorageConfig storageConfig) { + + static final String DEFAULT_STARBURST_GALAXY_PORT = "443"; + static final String DEFAULT_STARBURST_GALAXY_CATALOG_SCHEMA = "public"; + static final boolean DEFAULT_PURGE_STAGING_TABLE = true; + + public static StarburstGalaxyDestinationConfig get(final JsonNode config) { + checkArgument( + config.has(ACCEPT_TERMS) && config.get(ACCEPT_TERMS).asBoolean(), + "You must agree to the Starburst Galaxy Terms & 
Conditions to use this connector."); + return new StarburstGalaxyDestinationConfig( + config.get(SERVER_HOSTNAME).asText(), + config.has(PORT) ? config.get(PORT).asText() : DEFAULT_STARBURST_GALAXY_PORT, + config.get(USERNAME).asText(), + config.get(PASSWORD).asText(), + config.get(CATALOG).asText(), + config.has(CATALOG_SCHEMA) ? config.get(CATALOG_SCHEMA).asText() : DEFAULT_STARBURST_GALAXY_CATALOG_SCHEMA, + config.has(PURGE_STAGING_TABLE) ? config.get(PURGE_STAGING_TABLE).asBoolean() : DEFAULT_PURGE_STAGING_TABLE, + getStarburstGalaxyStagingStorageConfig(config.get(STAGING_OBJECT_STORE))); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolver.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolver.java new file mode 100644 index 0000000000000..9d823ec0eea99 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolver.java @@ -0,0 +1,26 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_NAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STAGING_OBJECT_STORE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageType.S3; + +import com.fasterxml.jackson.databind.JsonNode; + +public class StarburstGalaxyDestinationResolver { + + public static StarburstGalaxyStagingStorageType getStagingStorageType(final JsonNode config) { + if (isS3StagingStore(config)) { + return S3; + } + throw new IllegalArgumentException("Staging storage configurations must be provided"); + } + + public static boolean isS3StagingStore(final JsonNode config) { + return config.has(STAGING_OBJECT_STORE) && config.get(STAGING_OBJECT_STORE).isObject() && config.get(STAGING_OBJECT_STORE).has(S_3_BUCKET_NAME); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyNameTransformer.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyNameTransformer.java new file mode 100644 index 0000000000000..1337156046b39 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyNameTransformer.java @@ -0,0 +1,39 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static java.util.Locale.ENGLISH; + +import io.airbyte.integrations.destination.StandardNameTransformer; + +public class StarburstGalaxyNameTransformer + extends StandardNameTransformer { + + @Override + public String convertStreamName(final String input) { + return applyDefaultCase(super.convertStreamName(input)); + } + + @Override + public String getIdentifier(final String name) { + return applyDefaultCase(super.getIdentifier(name)); + } + + @Override + public String getTmpTableName(final String streamName) { + return applyDefaultCase(super.getTmpTableName(streamName)); + } + + @Override + public String getRawTableName(final String streamName) { + return applyDefaultCase(super.getRawTableName(streamName)); + } + + @Override + public String applyDefaultCase(final String input) { + return input.toLowerCase(ENGLISH); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3Destination.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3Destination.java new file mode 100644 index 0000000000000..d6f0b535e43d8 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3Destination.java @@ -0,0 +1,26 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.s3.S3BaseChecks.attemptS3WriteAndDelete; + +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import io.airbyte.integrations.destination.s3.S3StorageOperations; + +public class StarburstGalaxyS3Destination + extends StarburstGalaxyBaseDestination { + + @Override + protected void checkPersistence(StarburstGalaxyStagingStorageConfig galaxyStorageConfig) { + S3DestinationConfig s3Config = galaxyStorageConfig.getS3DestinationConfigOrThrow(); + attemptS3WriteAndDelete(new S3StorageOperations(getNameTransformer(), s3Config.getS3Client(), s3Config), s3Config, ""); + } + + @Override + protected StarburstGalaxyStreamCopierFactory getStreamCopierFactory() { + return new StarburstGalaxyS3StreamCopierFactory(); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StagingStorageConfig.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StagingStorageConfig.java new file mode 100644 index 0000000000000..cb338ef3bc691 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StagingStorageConfig.java @@ -0,0 +1,40 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_ACCESS_KEY_ID; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_NAME; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_PATH; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_REGION; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_SECRET_ACCESS_KEY; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import io.airbyte.integrations.destination.s3.parquet.S3ParquetFormatConfig; + +public class StarburstGalaxyS3StagingStorageConfig + extends StarburstGalaxyStagingStorageConfig { + + private final S3DestinationConfig s3Config; + + public StarburstGalaxyS3StagingStorageConfig(JsonNode config) { + final S3DestinationConfig.Builder builder = S3DestinationConfig.create( + config.get(S_3_BUCKET_NAME).asText(), + config.get(S_3_BUCKET_PATH).asText(), + config.get(S_3_BUCKET_REGION).asText()) + .withAccessKeyCredential( + config.get(S_3_ACCESS_KEY_ID).asText(), + config.get(S_3_SECRET_ACCESS_KEY).asText()) + .withFormatConfig(new S3ParquetFormatConfig(new ObjectMapper().createObjectNode())); + this.s3Config = builder.get(); + } + + @Override + public S3DestinationConfig getS3DestinationConfigOrThrow() { + return s3Config; + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopier.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopier.java new file mode 100644 index 0000000000000..04742225219e6 --- /dev/null +++ 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopier.java @@ -0,0 +1,174 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static java.lang.String.format; +import static java.nio.charset.StandardCharsets.UTF_8; +import static java.util.UUID.randomUUID; +import static java.util.stream.Collectors.joining; +import static org.apache.iceberg.hadoop.Util.VERSION_HINT_FILENAME; +import static org.slf4j.LoggerFactory.getLogger; + +import com.amazonaws.services.s3.AmazonS3; +import com.amazonaws.services.s3.model.GetObjectRequest; +import com.amazonaws.services.s3.model.S3Object; +import com.fasterxml.jackson.databind.JsonNode; +import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.destination.StandardNameTransformer; +import io.airbyte.integrations.destination.jdbc.SqlOperations; +import io.airbyte.integrations.destination.jdbc.copy.StreamCopier; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import io.airbyte.integrations.destination.s3.avro.AvroConstants; +import io.airbyte.integrations.destination.s3.avro.AvroRecordFactory; +import io.airbyte.integrations.destination.s3.avro.JsonToAvroSchemaConverter; +import io.airbyte.protocol.models.v0.AirbyteRecordMessage; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteStream; +import java.io.IOException; +import java.sql.Timestamp; +import java.util.UUID; +import org.apache.avro.Schema; +import org.apache.iceberg.parquet.ParquetSchemaUtil; +import org.apache.parquet.avro.AvroSchemaConverter; +import org.apache.parquet.schema.MessageType; +import org.slf4j.Logger; + +/** + * This implementation is similar to {@link StreamCopier}. The difference is that this + * implementation creates Parquet staging file(s), instead of CSV ones. + *
<ul>
+ * <li>1. Parquet writer writes data stream into tmp Iceberg table in + * s3://bucket-name/bucket-path/namespace/schema/temp-Iceberg-table-name.</li>
+ * <li>2. Creates (or modifies the schema of) the destination Iceberg table from the tmp Iceberg + * table schema in Galaxy Amazon S3 Catalog based on the destination sync mode.</li>
+ * <li>3. Copies the tmp Iceberg table data into the destination Iceberg table in Galaxy Amazon S3 + * Catalog.</li>
+ * <li>4. Deletes the tmp Iceberg table.</li>
+ * </ul>
+ */ +public class StarburstGalaxyS3StreamCopier + extends StarburstGalaxyStreamCopier { + + private static final Logger LOGGER = getLogger(StarburstGalaxyS3StreamCopier.class); + private final AmazonS3 s3Client; + private final S3DestinationConfig s3Config; + private final HadoopCatalogIcebergS3ParquetWriter icebergWriter; + private final AvroRecordFactory avroRecordFactory; + + public StarburstGalaxyS3StreamCopier(final String stagingFolder, + final String schema, + final ConfiguredAirbyteStream configuredStream, + final AmazonS3 s3Client, + final JdbcDatabase database, + final StarburstGalaxyDestinationConfig galaxyDestinationConfig, + final StandardNameTransformer nameTransformer, + final SqlOperations sqlOperations, + final Timestamp uploadTime) + throws Exception { + super(stagingFolder, schema, configuredStream, database, galaxyDestinationConfig, nameTransformer, sqlOperations); + this.s3Client = s3Client; + this.s3Config = galaxyDestinationConfig.storageConfig().getS3DestinationConfigOrThrow(); + Schema avroSchema = getAvroSchema(configuredStream.getStream().getName(), + configuredStream.getStream().getNamespace(), configuredStream.getStream().getJsonSchema()); + org.apache.iceberg.Schema icebergSchema = getIcebergSchema(avroSchema); + this.icebergWriter = new HadoopCatalogIcebergS3ParquetWriter( + galaxyDestinationConfig.storageConfig().getS3DestinationConfigOrThrow(), configuredStream, icebergSchema, + this.schemaName, this.tmpTableName, uploadTime); + this.avroRecordFactory = new AvroRecordFactory(avroSchema, AvroConstants.JSON_CONVERTER); + LOGGER.info("[Stream {}] Tmp table {} location: {}", streamName, tmpTableName, getTmpTableLocation()); + LOGGER.info("[Stream {}] Iceberg schema: {}", streamName, icebergSchema); + this.galaxySchema = convertIcebergSchemaToGalaxySchema(icebergSchema); + } + + static org.apache.iceberg.Schema getIcebergSchema(Schema avroSchema) { + MessageType parquetSchema = new AvroSchemaConverter().convert(avroSchema); + return 
ParquetSchemaUtil.convert(parquetSchema); + } + + static Schema getAvroSchema(String streamName, String namespace, JsonNode jsonSchema) { + final JsonToAvroSchemaConverter schemaConverter = new JsonToAvroSchemaConverter(); + return schemaConverter.getAvroSchema(jsonSchema, streamName, namespace, true, true, false, true); + } + + @Override + public String prepareStagingFile() { + return String.join("/", s3Config.getBucketPath(), stagingFolder); + } + + @Override + public void write(final UUID id, final AirbyteRecordMessage recordMessage, final String fileName) throws Exception { + recordMessage.setEmittedAt(recordMessage.getEmittedAt() * 1000); // Corresponding Galaxy type expects micro precision. + icebergWriter.write(avroRecordFactory.getAvroRecord(id, recordMessage)); + } + + @Override + public void closeStagingUploader(final boolean hasFailed) throws Exception { + icebergWriter.close(hasFailed); + } + + @Override + protected String getTmpTableLocation() { + // Galaxy location privilege doesn't allow path starting with s3a + String tmpTableLocation = icebergWriter.getTable().location().replace("s3a://", "s3://"); + LOGGER.info("[Stream {}] Tmp table location: {}", streamName, tmpTableLocation); + return tmpTableLocation; + } + + @Override + protected String getTmpTableMetadataFileName() + throws IOException { + String tmpTableBasePath = icebergWriter.getTableStorageRelativePath(); + LOGGER.info("[Stream {}] Tmp table base path: {}", streamName, tmpTableBasePath); + GetObjectRequest getObjectRequest = new GetObjectRequest(s3Config.getBucketName(), + tmpTableBasePath + "/metadata/" + VERSION_HINT_FILENAME); + S3Object object = s3Client.getObject(getObjectRequest); + String currentMetadataFileVersion = new String(object.getObjectContent().readAllBytes(), UTF_8).strip(); + LOGGER.info("[Stream {}] Current metadata file version {}", streamName, currentMetadataFileVersion); + String metadataJsonFile = "v" + currentMetadataFileVersion + ".metadata.json"; + String 
newMetadataJsonFileName = + "0".repeat(5 - currentMetadataFileVersion.length()) + currentMetadataFileVersion + "-" + randomUUID() + ".metadata.json"; + + // https://iceberg.apache.org/spec/#file-system-tables and + // https://iceberg.apache.org/spec/#metastore-tables follows different metadata file naming + // convention. Galaxy expect the version metadata file to always follow + // https://iceberg.apache.org/spec/#metastore-tables convention. + // Rename(copy) the metadata file name to follow Galaxy metadata file naming standards. + s3Client.copyObject( + s3Config.getBucketName(), tmpTableBasePath + "/metadata/" + metadataJsonFile, + s3Config.getBucketName(), tmpTableBasePath + "/metadata/" + newMetadataJsonFileName); + + LOGGER.info("New metadata file: {}/{}/{}", tmpTableBasePath, "metadata", newMetadataJsonFileName); + return newMetadataJsonFileName; + } + + @Override + public String generateMergeStatement(final String destTableName) { + String fields = String.join( + ", ", + galaxySchema.columns().stream() + .map(ColumnMetadata::name) + .collect(joining(", "))); + String insertData = format( + "INSERT INTO %s.%s(%s) SELECT %s FROM %s.%s", + quotedSchemaName, + destTableName, + fields, + fields, + quotedSchemaName, + tmpTableName); + LOGGER.info("[Stream {}] Insert source data into target: {}", streamName, insertData); + return insertData; + } + + @Override + public void closeNonCurrentStagingFileWriters() throws Exception { + icebergWriter.close(false); + } + + @Override + public String getCurrentFile() { + return ""; + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopierFactory.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopierFactory.java new file mode 100644 index 0000000000000..542a04381a83f --- /dev/null +++ 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3StreamCopierFactory.java @@ -0,0 +1,44 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.jdbc.copy.StreamCopierFactory.getSchema; + +import com.amazonaws.services.s3.AmazonS3; +import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.destination.StandardNameTransformer; +import io.airbyte.integrations.destination.jdbc.SqlOperations; +import io.airbyte.integrations.destination.jdbc.copy.StreamCopier; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import io.airbyte.protocol.models.v0.AirbyteStream; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteStream; +import java.sql.Timestamp; + +public class StarburstGalaxyS3StreamCopierFactory + implements StarburstGalaxyStreamCopierFactory { + + @Override + public StreamCopier create(final String configuredSchema, + final StarburstGalaxyDestinationConfig starburstGalaxyConfig, + final String stagingFolder, + final ConfiguredAirbyteStream configuredStream, + final StandardNameTransformer nameTransformer, + final JdbcDatabase database, + final SqlOperations sqlOperations) { + try { + final AirbyteStream stream = configuredStream.getStream(); + final String schema = getSchema(stream.getNamespace(), configuredSchema, nameTransformer); + + S3DestinationConfig s3Config = starburstGalaxyConfig.storageConfig().getS3DestinationConfigOrThrow(); + final AmazonS3 s3Client = s3Config.getS3Client(); + final Timestamp uploadTimestamp = new Timestamp(System.currentTimeMillis()); + return new StarburstGalaxyS3StreamCopier(stagingFolder, schema, configuredStream, s3Client, database, + starburstGalaxyConfig, nameTransformer, sqlOperations, uploadTimestamp); + } catch (final Exception e) { + throw new RuntimeException(e); 
+ } + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxySqlOperations.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxySqlOperations.java new file mode 100644 index 0000000000000..c43ae602f6396 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxySqlOperations.java @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static java.lang.String.format; + +import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.base.JavaBaseConstants; +import io.airbyte.integrations.destination.jdbc.JdbcSqlOperations; +import io.airbyte.protocol.models.v0.AirbyteRecordMessage; +import java.sql.SQLException; +import java.util.List; + +public class StarburstGalaxySqlOperations + extends JdbcSqlOperations { + + @Override + public void executeTransaction(final JdbcDatabase database, final List<String> queries) throws Exception { + for (final String query : queries) { + database.execute(query); + } + } + + @Override + public String createTableQuery(final JdbcDatabase database, final String schemaName, final String tableName) { + String createTable = format( + "CREATE TABLE IF NOT EXISTS %s.%s (%s VARCHAR, %s VARCHAR, %s TIMESTAMP(6)) WITH (format = 'PARQUET', type = 'ICEBERG')", + schemaName, + tableName, + JavaBaseConstants.COLUMN_NAME_AB_ID, + JavaBaseConstants.COLUMN_NAME_DATA, + JavaBaseConstants.COLUMN_NAME_EMITTED_AT); + LOGGER.info("Create table: {}", createTable); + return createTable; + } + + @Override + public void createSchemaIfNotExists(final JdbcDatabase database, final String schemaName) throws Exception { + String createSchema = format("CREATE
SCHEMA IF NOT EXISTS %s", schemaName); + LOGGER.info("Create schema if not exists: {}", createSchema); + database.execute(createSchema); + } + + @Override + public void insertRecordsInternal(final JdbcDatabase database, + final List records, + final String schemaName, + final String tmpTableName) { + // Do nothing. The records are copied into the table directly from the staging parquet file. + // So no manual insertion is needed. + } + + @Override + public void dropTableIfExists(final JdbcDatabase database, final String schemaName, final String tableName) throws SQLException { + database.execute(format("DROP TABLE IF EXISTS %s.%s", schemaName, tableName)); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfig.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfig.java new file mode 100644 index 0000000000000..9dbeafad4c774 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfig.java @@ -0,0 +1,33 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.OBJECT_STORE_TYPE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageType.S3; +import static org.slf4j.LoggerFactory.getLogger; + +import com.fasterxml.jackson.databind.JsonNode; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import org.slf4j.Logger; + +public abstract class StarburstGalaxyStagingStorageConfig { + + private static final Logger LOGGER = getLogger(StarburstGalaxyStagingStorageConfig.class); + + public static StarburstGalaxyStagingStorageConfig getStarburstGalaxyStagingStorageConfig(final JsonNode config) { + final JsonNode typeConfig = config.get(OBJECT_STORE_TYPE); + LOGGER.info("Galaxy staging storage type config: {}", typeConfig.toString()); + final StarburstGalaxyStagingStorageType storageType = StarburstGalaxyStagingStorageType.valueOf(typeConfig.asText().toUpperCase()); + if (storageType == S3) { + return new StarburstGalaxyS3StagingStorageConfig(config); + } + throw new RuntimeException("Unsupported staging object store type: " + storageType); + } + + public S3DestinationConfig getS3DestinationConfigOrThrow() { + throw new UnsupportedOperationException("Cannot get S3 destination config from " + this.getClass().getSimpleName()); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageType.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageType.java new file mode 100644 index 0000000000000..d30ad4c94fdaa --- /dev/null +++ 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageType.java @@ -0,0 +1,9 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +public enum StarburstGalaxyStagingStorageType { + S3 +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopier.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopier.java new file mode 100644 index 0000000000000..b622ec8cfd7cf --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopier.java @@ -0,0 +1,187 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.protocol.models.v0.DestinationSyncMode.APPEND; +import static io.airbyte.protocol.models.v0.DestinationSyncMode.OVERWRITE; +import static io.trino.plugin.iceberg.TypeConverter.toTrinoType; +import static io.trino.type.InternalTypeManager.TESTING_TYPE_MANAGER; +import static java.lang.String.format; +import static java.util.Locale.ENGLISH; + +import com.fasterxml.jackson.databind.JsonNode; +import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.destination.StandardNameTransformer; +import io.airbyte.integrations.destination.jdbc.SqlOperations; +import io.airbyte.integrations.destination.jdbc.copy.StreamCopier; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteStream; +import io.airbyte.protocol.models.v0.DestinationSyncMode; +import java.io.IOException; +import java.sql.SQLException; +import java.util.List; +import java.util.Map; +import java.util.stream.Collectors; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +/** + * This implementation is similar to {@link StreamCopier}. It performs the following operations: + *
<ul>
+ * <li>1. Writes the data stream into a tmp Iceberg table in cloud storage.</li>
+ * <li>2. Creates (or modifies the schema of) the destination Iceberg table in Galaxy Catalog based
+ * on the tmp Iceberg table schema.</li>
+ * <li>3. Copies the tmp Iceberg table into the destination Iceberg table in Galaxy Catalog.</li>
+ * <li>4. Deletes the tmp Iceberg table.</li>
+ * </ul>
+ */ +public abstract class StarburstGalaxyStreamCopier + implements StreamCopier { + + private static final Logger LOGGER = LoggerFactory.getLogger(StarburstGalaxyStreamCopier.class); + + private final String quotedDestTableName; + private final DestinationSyncMode destinationSyncMode; + private final boolean purgeStagingTable; + private final JdbcDatabase database; + private final StarburstGalaxySqlOperations sqlOperations; + + protected final String schemaName; + protected final String quotedSchemaName; + protected final String streamName; + protected final String tmpTableName; + protected final String stagingFolder; + protected final StarburstGalaxyDestinationConfig galaxyDestinationConfig; + protected TableSchema galaxySchema; + + public StarburstGalaxyStreamCopier(final String stagingFolder, + final String schemaName, + final ConfiguredAirbyteStream configuredStream, + final JdbcDatabase database, + final StarburstGalaxyDestinationConfig galaxyDestinationConfig, + final StandardNameTransformer nameTransformer, + final SqlOperations sqlOperations) { + this.schemaName = schemaName; + this.quotedSchemaName = "\"" + this.schemaName + "\""; // Wrap schema name with double quotes to support Galaxy reserved keywords + this.streamName = configuredStream.getStream().getName(); + this.destinationSyncMode = configuredStream.getDestinationSyncMode(); + this.purgeStagingTable = galaxyDestinationConfig.purgeStagingData(); + this.database = database; + this.sqlOperations = (StarburstGalaxySqlOperations) sqlOperations; + this.galaxyDestinationConfig = galaxyDestinationConfig; + this.tmpTableName = nameTransformer.getTmpTableName(streamName); + this.quotedDestTableName = "\"" + nameTransformer.getIdentifier(streamName) + "\""; // Wrap table name with double quotes to support Galaxy + // reserved + // keywords + this.stagingFolder = stagingFolder; + LOGGER.info("[Stream {}] Catalog schema: {}", streamName, this.schemaName); + } + + static TableSchema 
convertIcebergSchemaToGalaxySchema(org.apache.iceberg.Schema icebergSchema) { + TableSchema tableSchema = new TableSchema(); + icebergSchema.columns() + .forEach( + // Wrap column name in double quotes to support reserved keywords + column -> tableSchema + .addColumn(new ColumnMetadata("\"" + column.name() + "\"", toTrinoType(column.type(), TESTING_TYPE_MANAGER), column.fieldId()))); + return tableSchema; + } + + protected abstract String getTmpTableLocation(); + + @Override + public void createDestinationSchema() throws Exception { + LOGGER.info("[Stream {}] Create schema if it does not exist: {}", streamName, schemaName); + sqlOperations.createSchemaIfNotExists(database, quotedSchemaName); + } + + @Override + public void createTemporaryTable() throws Exception { + String registerTable = format(""" + CALL system.register_table(schema_name => '%s', table_name => '%s', + table_location => '%s', + metadata_file_name => '%s') + """, schemaName, tmpTableName, getTmpTableLocation(), getTmpTableMetadataFileName()); + LOGGER.info("[Stream {}] Register table: {}", streamName, registerTable); + database.execute(registerTable); + LOGGER.info("[Stream {}] Table {} is registered", streamName, tmpTableName); + } + + protected abstract String getTmpTableMetadataFileName() + throws IOException, InterruptedException; + + @Override + public void copyStagingFileToTemporaryTable() { + // The tmp table is created directly based on the staging file. So no separate copying step is + // needed. 
+ } + + /** + * Adds newly created source columns to target + */ + private void promoteSourceSchemaChangesToDestination() + throws SQLException { + List<JsonNode> describeTable = database.queryJsons(format("DESCRIBE %s.%s", quotedSchemaName, quotedDestTableName)); + LOGGER.info("[Stream {}] Existing table structure for {}.{} table is {}", streamName, schemaName, quotedDestTableName, describeTable); + Map<String, String> existingColumns = describeTable.stream().collect( + // Column name is wrapped within double quotes as column name in Galaxy schema is wrapped within + // double quotes when the schema is created + Collectors.toMap(column -> "\"" + column.get("Column").asText().toLowerCase(ENGLISH) + "\"", + column -> column.get("Type").asText().toLowerCase(ENGLISH))); + galaxySchema.columns().forEach(columnMetadata -> { + String columnName = columnMetadata.name().toLowerCase(ENGLISH); + if (!existingColumns.containsKey(columnName)) { + try { + String alterTable = + format( + "ALTER TABLE %s.%s ADD COLUMN IF NOT EXISTS %s %s", + quotedSchemaName, + quotedDestTableName, + columnName, + columnMetadata.galaxyIcebergType().getDisplayName()); + LOGGER.info("[Stream {}] Add column {} : {}", streamName, columnName, alterTable); + database.execute(alterTable); + } catch (SQLException e) { + throw new RuntimeException(e); + } + } + }); + } + + @Override + public String createDestinationTable() throws Exception { + if (destinationSyncMode == OVERWRITE) { + // Drop existing table to propagate source schema changes + String dropTable = format("DROP TABLE IF EXISTS %s.%s", quotedSchemaName, quotedDestTableName); + LOGGER.info("[Stream {}] Dropping destination table: {}", streamName, dropTable); + database.execute(dropTable); + } + + String fields = galaxySchema.columns().stream() + .map(columnMetadata -> format("%s %s", + columnMetadata.name(), + columnMetadata.galaxyIcebergType().getDisplayName())) + .collect(Collectors.joining(", ")); + String createTable = + format("CREATE TABLE IF NOT EXISTS %s.%s
(%s) WITH (format = 'PARQUET', type = 'ICEBERG')", quotedSchemaName, quotedDestTableName, fields); + LOGGER.info("[Stream {}] Create destination table if it does not exist: {}", streamName, createTable); + database.execute(createTable); + if (destinationSyncMode == APPEND) { + LOGGER.info("[Stream {}] Promote new columns from source to target", streamName); + promoteSourceSchemaChangesToDestination(); + } + + return quotedDestTableName; + } + + @Override + public void removeFileAndDropTmpTable() + throws SQLException { + if (purgeStagingTable) { + LOGGER.info("[Stream {}] Delete tmp table: {}", streamName, tmpTableName); + sqlOperations.dropTableIfExists(database, quotedSchemaName, tmpTableName); + } + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopierFactory.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopierFactory.java new file mode 100644 index 0000000000000..fe8959486775c --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStreamCopierFactory.java @@ -0,0 +1,12 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import io.airbyte.integrations.destination.jdbc.copy.StreamCopierFactory; + +public interface StarburstGalaxyStreamCopierFactory + extends StreamCopierFactory<StarburstGalaxyDestinationConfig> { + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/TableSchema.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/TableSchema.java new file mode 100644 index 0000000000000..b2ce1261e90de --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/java/io/airbyte/integrations/destination/starburst_galaxy/TableSchema.java @@ -0,0 +1,29 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import com.google.common.collect.ImmutableSet; +import java.util.Comparator; +import java.util.Set; +import java.util.SortedSet; +import java.util.TreeSet; + +public class TableSchema { + + private final SortedSet<ColumnMetadata> columns; + + public TableSchema() { + columns = new TreeSet<>(Comparator.comparingInt(ColumnMetadata::position)); + } + + public void addColumn(ColumnMetadata columnMetadata) { + columns.add(columnMetadata); + } + + public Set<ColumnMetadata> columns() { + return ImmutableSet.copyOf(columns); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/resources/spec.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/resources/spec.json new file mode 100644 index 0000000000000..96e14b888badf --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/main/resources/spec.json @@ -0,0 +1,130 @@ +{ + "documentationUrl": "https://docs.airbyte.com/integrations/destinations/starburst-galaxy", + "connectionSpecification": { + "$schema": "http://json-schema.org/draft-07/schema#", + "title":
"Starburst Galaxy Destination Spec", + "type": "object", + "required": [ "accept_terms", "server_hostname", "username", "password", "catalog", "staging_object_store" ], + "properties": { + "accept_terms": { + "title": "Agree to the Starburst Galaxy terms & conditions", + "type": "boolean", + "description": "You must agree to the Starburst Galaxy terms & conditions to use this connector.", + "default": false, + "order": 1 + }, + "server_hostname": { + "title": "Hostname", + "type": "string", + "description": "Starburst Galaxy cluster hostname.", + "examples": [ "abc-12345678-wxyz.trino.galaxy-demo.io" ], + "order": 2 + }, + "port": { + "title": "Port", + "type": "string", + "description": "Starburst Galaxy cluster port.", + "default": "443", + "examples": [ "443" ], + "order": 3 + }, + "username": { + "title": "User", + "type": "string", + "description": "Starburst Galaxy user.", + "examples": [ "user@example.com" ], + "order": 4 + }, + "password": { + "title": "Password", + "type": "string", + "description": "Starburst Galaxy password for the specified user.", + "examples": [ "password" ], + "airbyte_secret": true, + "order": 5 + }, + "catalog": { + "title": "Amazon S3 catalog", + "type": "string", + "description": "Name of the Starburst Galaxy Amazon S3 catalog.", + "examples": [ "sample_s3_catalog" ], + "order": 6 + }, + "catalog_schema": { + "title": "Amazon S3 catalog schema", + "type": "string", + "description": "The default Starburst Galaxy Amazon S3 catalog schema where tables are written to if the source does not specify a namespace. 
Defaults to \"public\".", + "default": "public", + "examples": [ "public" ], + "order": 7 + }, + "staging_object_store": { + "title": "Staging object store", + "type": "object", + "description": "Temporary storage on which temporary Iceberg table is created.", + "oneOf": [ { + "title": "Amazon S3", + "required": [ "object_store_type", "s3_bucket_name", "s3_bucket_path", "s3_bucket_region", "s3_access_key_id", "s3_secret_access_key" ], + "properties": { + "object_store_type": { + "type": "string", + "enum": [ "S3" ], + "default": "S3", + "order": 1 + }, + "s3_bucket_name": { + "title": "S3 bucket name", + "type": "string", + "description": "Name of the S3 bucket", + "examples": [ "airbyte_staging" ], + "order": 1 + }, + "s3_bucket_path": { + "title": "S3 bucket path", + "type": "string", + "description": "Directory in the S3 bucket where staging data is stored.", + "examples": [ "temp_airbyte__sync/test" ], + "order": 2 + }, + "s3_bucket_region": { + "title": "S3 bucket region", + "type": "string", + "default": "us-east-1", + "description": "The region of the S3 bucket.", + "enum": [ "ap-northeast-1", "ap-southeast-1", "ap-southeast-2", "ca-central-1", "eu-central-1", "eu-west-1", "eu-west-2", "eu-west-3", "us-east-1", "us-east-2", "us-west-1", "us-west-2" ], + "order": 3 + }, + "s3_access_key_id": { + "title": "Access key", + "type": "string", + "description": "Access key with access to the bucket. Airbyte requires read and write permissions to a given bucket.", + "examples": [ "A012345678910EXAMPLE" ], + "airbyte_secret": true, + "order": 4 + }, + "s3_secret_access_key": { + "title": "Secret key", + "type": "string", + "description": "Secret key used with the specified access key.", + "examples": [ "a012345678910ABCDEFGH/AbCdEfGhEXAMPLEKEY" ], + "airbyte_secret": true, + "order": 5 + } + } + } ], + "order": 8 + }, + "purge_staging_table": { + "title": "Purge staging Iceberg table", + "type": "boolean", + "description": "Defaults to 'true'. 
Switch to 'false' for debugging purposes.", + "default": true, + "order": 9 + } + } + }, + "supportsIncremental": true, + "supportsNormalization": false, + "supportsDBT": false, + "supported_destination_sync_modes": [ "overwrite", "append" ] +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationAcceptanceTest.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationAcceptanceTest.java new file mode 100644 index 0000000000000..92a9bce20da69 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationAcceptanceTest.java @@ -0,0 +1,247 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.commons.json.Jsons.deserialize; +import static io.airbyte.db.factory.DSLContextFactory.create; +import static io.airbyte.db.jdbc.JdbcUtils.getDefaultJSONFormat; +import static io.airbyte.integrations.base.JavaBaseConstants.COLUMN_NAME_EMITTED_AT; +import static io.airbyte.integrations.destination.s3.util.AvroRecordHelper.pruneAirbyteJson; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyBaseDestination.getGalaxyConnectionString; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STARBURST_GALAXY_DRIVER_CLASS; +import static io.airbyte.protocol.models.v0.AirbyteMessage.Type.RECORD; +import static io.airbyte.protocol.models.v0.DestinationSyncMode.APPEND; +import static io.airbyte.protocol.models.v0.DestinationSyncMode.OVERWRITE; +import static io.airbyte.protocol.models.v0.SyncMode.FULL_REFRESH; +import static java.lang.String.format; +import static java.nio.file.Files.readString; +import static java.util.Locale.ENGLISH; +import static java.util.stream.Collectors.toList; +import static org.jooq.impl.DSL.asterisk; +import static org.jooq.impl.DSL.field; +import static org.junit.jupiter.api.Assertions.assertEquals; + +import com.fasterxml.jackson.databind.JsonNode; +import com.google.common.collect.Lists; +import io.airbyte.commons.json.Jsons; +import io.airbyte.db.ContextQueryFunction; +import io.airbyte.db.Database; +import io.airbyte.integrations.base.AirbyteMessageConsumer; +import io.airbyte.integrations.base.Destination; +import io.airbyte.integrations.destination.StandardNameTransformer; +import io.airbyte.integrations.destination.jdbc.copy.StreamCopierFactory; +import io.airbyte.integrations.destination.s3.avro.JsonFieldNameUpdater; +import io.airbyte.integrations.destination.s3.util.AvroRecordHelper; +import io.airbyte.integrations.standardtest.destination.DestinationAcceptanceTest; 
+import io.airbyte.protocol.models.v0.AirbyteMessage; +import io.airbyte.protocol.models.v0.AirbyteRecordMessage; +import io.airbyte.protocol.models.v0.AirbyteStream; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteCatalog; +import io.airbyte.protocol.models.v0.ConfiguredAirbyteStream; +import io.airbyte.protocol.models.v0.DestinationSyncMode; +import java.io.IOException; +import java.nio.file.Path; +import java.sql.SQLException; +import java.time.Instant; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.stream.Collectors; +import org.jooq.DSLContext; +import org.jooq.SQLDialect; +import org.junit.jupiter.api.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +public abstract class StarburstGalaxyDestinationAcceptanceTest extends DestinationAcceptanceTest { + + private static final Logger LOGGER = LoggerFactory.getLogger(StarburstGalaxyDestinationAcceptanceTest.class); + private static final String INPUT_FILES_BASE_LOCATION = "testdata/"; + private static final StandardNameTransformer nameTransformer = new StarburstGalaxyNameTransformer(); + + protected JsonNode configJson; + protected StarburstGalaxyDestinationConfig galaxyDestinationConfig; + private DSLContext dslContext; + private Database database; + + @Override + protected void setup(TestDestinationEnv testEnv) { + dslContext = create(galaxyDestinationConfig.galaxyUsername(), galaxyDestinationConfig.galaxyPassword(), STARBURST_GALAXY_DRIVER_CLASS, + getGalaxyConnectionString(galaxyDestinationConfig), SQLDialect.DEFAULT); + database = new Database(dslContext); + } + + @Override + protected String getImageName() { + return "airbyte/destination-starburst-galaxy:dev"; + } + + @Override + protected JsonNode getConfig() { + return configJson; + } + + @Override + protected List<JsonNode> retrieveRecords(final TestDestinationEnv testEnv, + final String streamName, + final String namespace, + final JsonNode streamSchema) + throws SQLException { + final String
tableName = nameTransformer.getIdentifier(streamName); + final String schemaName = StreamCopierFactory.getSchema(namespace, galaxyDestinationConfig.galaxyCatalogSchema(), nameTransformer); + final JsonFieldNameUpdater nameUpdater = AvroRecordHelper.getFieldNameUpdater(streamName, namespace, streamSchema); + return executeQuery( + ctx -> ctx.select(asterisk()) + .from(format("%s.%s", schemaName, tableName)) + .orderBy(field(COLUMN_NAME_EMITTED_AT).asc()) + .fetch().stream() + .map(record -> { + final JsonNode json = deserialize(record.formatJSON(getDefaultJSONFormat())); + final JsonNode jsonWithOriginalFields = nameUpdater.getJsonWithOriginalFieldNames(json); + return pruneAirbyteJson(jsonWithOriginalFields); + }) + .collect(toList())); + } + + @Override + protected void tearDown(final TestDestinationEnv testEnv) throws SQLException { + // clean up database + List<JsonNode> schemas = executeQuery(format("SHOW SCHEMAS LIKE '%s'", galaxyDestinationConfig.galaxyCatalogSchema().toLowerCase(ENGLISH))); + schemas.stream().map(node -> node.get("Schema").asText()) + .forEach(schema -> { + try { + List<JsonNode> tables = executeQuery(format("SHOW TABLES FROM %s", galaxyDestinationConfig.galaxyCatalogSchema())); + tables.forEach(table -> { + try { + String tableName = table.get("Table").asText(); + LOGGER.info("Dropping table : {}.{}", schema, tableName); + executeQuery(format("DROP TABLE IF EXISTS %s.%s", schema, tableName)); + } catch (SQLException e) { + throw new RuntimeException(e); + } + }); + } catch (SQLException e) { + throw new RuntimeException(e); + } + }); + executeQuery(format("DROP SCHEMA IF EXISTS %s", galaxyDestinationConfig.galaxyCatalogSchema().toLowerCase(ENGLISH))); + + dslContext.close(); + } + + private List<JsonNode> executeQuery(ContextQueryFunction<List<JsonNode>> transform) + throws SQLException { + return database.query(transform); + } + + private List<JsonNode> executeQuery(String query) + throws SQLException { + return executeQuery(ctx -> ctx.resultQuery(query) + .stream() + .map(record ->
deserialize(record.formatJSON(getDefaultJSONFormat()))) + .collect(toList())); + } + + @Test + public void testPromoteSourceSchemaChanges() throws Exception { + String sampleStream = "sample_stream_1"; + testStreamSync(OVERWRITE, sampleStream, "schema-overwrite.json", "data-overwrite.json", "expected-schema-overwrite.json"); + testStreamSync(APPEND, sampleStream, "schema-append.json", "data-append.json", "expected-schema-append.json"); + assertEquals(2, + executeQuery(format("SELECT COUNT(*) FROM %s.%s", galaxyDestinationConfig.galaxyCatalogSchema(), sampleStream)).get(0).get("_col0").asInt()); + } + + private void testStreamSync(DestinationSyncMode syncMode, + String streamName, + String schemaFileName, + String dataFileName, + String expectedSchemaFileName) + throws Exception { + JsonNode overwriteSchema = getTestDataFromResourceJson(schemaFileName); + AirbyteMessage overwriteMessage = createRecordMessage(streamName, getTestDataFromResourceJson(dataFileName)); + runDestinationWrite(getCommonCatalog(streamName, overwriteSchema, syncMode), configJson, overwriteMessage); + validateTableSchema(streamName, expectedSchemaFileName); + } + + private void validateTableSchema(String streamName, String expectedSchemaFileName) + throws SQLException { + List<JsonNode> describeRecords = executeQuery(format("DESCRIBE %s.%s", galaxyDestinationConfig.galaxyCatalogSchema(), streamName)); + Map<String, String> actualDataTypes = + describeRecords.stream().collect(Collectors.toMap(column -> column.get("Column").asText(), column -> column.get("Type").asText())); + JsonNode expectedDataTypes = getTestDataFromResourceJson(expectedSchemaFileName); + assertEquals(expectedDataTypes.size(), actualDataTypes.size()); + expectedDataTypes.fields().forEachRemaining(field -> assertEquals(field.getValue().asText(), actualDataTypes.get(field.getKey()))); + } + + @Test + public void testJsonV0Types() throws Exception { + testDifferentTypes("sample_stream_2", "datatypeV0.json", "dataV0.json", "expected-datatypeV0.json",
"expected-dataV0.json"); + } + + @Test + public void testJsonV1Types() throws Exception { + testDifferentTypes("sample_stream_3", "datatypeV1.json", "dataV1.json", "expected-datatypeV1.json", "expected-dataV1.json"); + } + + private void testDifferentTypes(String streamName, String dataTypeFile, String dataFile, String expectedDataTypeFile, String expectedDataFile) + throws Exception { + + JsonNode datatypeSchema = getTestDataFromResourceJson(dataTypeFile); + AirbyteMessage datatypeMessage = createRecordMessage(streamName, getTestDataFromResourceJson(dataFile)); + runDestinationWrite(getCommonCatalog(streamName, datatypeSchema, OVERWRITE), configJson, datatypeMessage); + final JsonFieldNameUpdater nameUpdater = + AvroRecordHelper.getFieldNameUpdater(streamName, galaxyDestinationConfig.galaxyCatalogSchema(), datatypeSchema); + validateTableSchema(streamName, expectedDataTypeFile); + + List records = executeQuery(ctx -> ctx.select(asterisk()) + .from(format("%s.%s", galaxyDestinationConfig.galaxyCatalogSchema(), streamName)) + .orderBy(field(COLUMN_NAME_EMITTED_AT).asc()) + .fetch().stream() + .map(record -> { + final JsonNode json = deserialize(record.formatJSON(getDefaultJSONFormat())); + final JsonNode jsonWithOriginalFields = nameUpdater.getJsonWithOriginalFieldNames(json); + return pruneAirbyteJson(jsonWithOriginalFields); + }) + .collect(toList())); + JsonNode actualData = records.get(0); + JsonNode expectedData = getTestDataFromResourceJson(expectedDataFile); + assertEquals(expectedData.size(), actualData.size()); + expectedData.fields().forEachRemaining(field -> assertEquals(field.getValue(), actualData.get(field.getKey()))); + } + + private static AirbyteMessage createRecordMessage(String streamName, final JsonNode data) { + return new AirbyteMessage() + .withType(RECORD) + .withRecord(new AirbyteRecordMessage().withStream(streamName).withData(data).withEmittedAt(Instant.now().toEpochMilli())); + } + + public static ConfiguredAirbyteCatalog 
getCommonCatalog(String stream, final JsonNode schema, DestinationSyncMode destinationSyncMode) { + return new ConfiguredAirbyteCatalog().withStreams(Lists.newArrayList(new ConfiguredAirbyteStream() + .withStream(new AirbyteStream().withName(stream).withJsonSchema(schema) + .withSupportedSyncModes(Lists.newArrayList(FULL_REFRESH))) + .withSyncMode(FULL_REFRESH).withDestinationSyncMode(destinationSyncMode))); + } + + private static void runDestinationWrite(ConfiguredAirbyteCatalog catalog, JsonNode config, AirbyteMessage... messages) throws Exception { + final StarburstGalaxyDestination destination = new StarburstGalaxyDestination(); + final AirbyteMessageConsumer consumer = destination.getConsumer(config, catalog, Destination::defaultOutputRecordCollector); + consumer.start(); + for (AirbyteMessage message : messages) { + consumer.accept(message); + } + consumer.close(); + } + + private static JsonNode getTestDataFromResourceJson(final String fileName) { + try { + String fileContent = readString(Path.of(Objects.requireNonNull(StarburstGalaxyDestinationAcceptanceTest.class.getClassLoader() + .getResource(INPUT_FILES_BASE_LOCATION + fileName)).getPath())); + return Jsons.deserialize(fileContent); + } catch (final IOException e) { + throw new RuntimeException(e); + } + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3DestinationAcceptanceTest.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3DestinationAcceptanceTest.java new file mode 100644 index 0000000000000..460e564a04579 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyS3DestinationAcceptanceTest.java @@ -0,0 +1,66 @@ +/* + * Copyright (c) 2023 
Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_ACCESS_KEY_ID; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_PATH; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_SECRET_ACCESS_KEY; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG_SCHEMA; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STAGING_OBJECT_STORE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationConfig.get; +import static java.util.Locale.ENGLISH; +import static org.apache.commons.lang3.RandomStringUtils.randomAlphanumeric; +import static org.slf4j.LoggerFactory.getLogger; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.node.ObjectNode; +import io.airbyte.commons.io.IOs; +import io.airbyte.commons.json.Jsons; +import io.airbyte.integrations.destination.s3.S3DestinationConfig; +import java.nio.file.Path; +import java.util.List; +import org.slf4j.Logger; + +public class StarburstGalaxyS3DestinationAcceptanceTest + extends StarburstGalaxyDestinationAcceptanceTest { + + private static final Logger LOGGER = getLogger(StarburstGalaxyS3DestinationAcceptanceTest.class); + private static final String SECRETS_CONFIG_JSON = "secrets/config.json"; + + @Override + protected JsonNode getFailCheckConfig() { + JsonNode failCheckJson = Jsons.clone(configJson); + // set invalid credential + ((ObjectNode) failCheckJson.get(STAGING_OBJECT_STORE)) + .put(S_3_ACCESS_KEY_ID, "fake-key") + .put(S_3_SECRET_ACCESS_KEY, "fake-secret"); + return failCheckJson; + } + + @Override + protected void setup(TestDestinationEnv testEnv) { + JsonNode baseConfigJson = Jsons.deserialize(IOs.readFile(Path.of(SECRETS_CONFIG_JSON))); + + // Set a random s3 
bucket path and database schema for each integration test + String randomString = randomAlphanumeric(5); + JsonNode configJson = Jsons.clone(baseConfigJson); + ((ObjectNode) configJson).put(CATALOG_SCHEMA, configJson.get(CATALOG_SCHEMA).asText() + "_" + randomString); + JsonNode stagingStore = configJson.get(STAGING_OBJECT_STORE); + ((ObjectNode) stagingStore).put(S_3_BUCKET_PATH, "test_" + randomString); + + this.configJson = configJson; + this.galaxyDestinationConfig = get(configJson); + S3DestinationConfig s3Config = galaxyDestinationConfig.storageConfig().getS3DestinationConfigOrThrow(); + LOGGER.info("Test full path: s3://{}/{}", s3Config.getBucketName(), s3Config.getBucketPath()); + + super.setup(testEnv); // Create a database + } + + @Override + protected List resolveIdentifier(String identifier) { + return List.of(identifier.toLowerCase(ENGLISH)); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-append.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-append.json new file mode 100644 index 0000000000000..c29e7ddb1437c --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-append.json @@ -0,0 +1,7 @@ +{ + "c_array": ["151", "152", "153", "text", "true", "false"], + "c_row": { + "first_name": "charles" + }, + "c_new_column" : 345 +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-overwrite.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-overwrite.json new file mode 100644 index 0000000000000..47391459c04e9 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/data-overwrite.json @@ -0,0 +1,6 @@ +{ + "c_array": ["151", "152", "153", "text", "true", 
"false"], + "c_row": { + "first_name": "charles" + } +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV0.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV0.json new file mode 100644 index 0000000000000..20b5a73a71249 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV0.json @@ -0,0 +1,18 @@ +{ + "string_type" : "sample_string", + "number_integer_type" : 948, + "string_big_integer_type" : "54667543256786653424", + "number_float_type" : 34.657322, + "number_type" : 78.32, + "integer_type" : 5389, + "boolean_type" : false, + "date_time_with_tz_type" : "2021-07-12T03:12:22+05:00", + "date_time_without_tz_type" : "2023-11-07T02:10:32", + "date_type" : "2021-01-01", + "time_type": "12:23:01.541", + "array_type": ["151", "152"], + "row_type": { + "first_name": "charles", + "last_name": "darwin" + } +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV1.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV1.json new file mode 100644 index 0000000000000..f25dd7f7b706f --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/dataV1.json @@ -0,0 +1,12 @@ +{ + "string_type" : "sample_string", + "integer_type" : 5389, + "number_type" : 78.32, + "boolean_type" : false, + "binary_type" : "FDX", + "date_type" : "2021-01-01", + "timestamp_with_tz_type" : "2021-01-01T07:06:13+05:00", + "timestamp_without_tz_type" : "2023-03-01T04:05:01", + "time_with_tz_type" : "12:23:01.541+05:00", + "time_without_tz_type" : "12:16:04.541Z" +} \ No newline at end of file diff --git 
a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV0.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV0.json new file mode 100644 index 0000000000000..b6babed119026 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV0.json @@ -0,0 +1,62 @@ +{ + "type": "object", + "properties": { + "string_type": { + "type": "string" + }, + "number_integer_type": { + "type": "number", + "airbyte_type" : "integer" + }, + "string_big_integer_type": { + "type": "string", + "airbyte_type" : "big_integer" + }, + "number_float_type": { + "type": "number", + "airbyte_type" : "float" + }, + "number_type": { + "type": "number" + }, + "integer_type": { + "type": "integer" + }, + "boolean_type": { + "type": "boolean" + }, + "date_time_with_tz_type": { + "type": "string", + "format": "date-time" + }, + "date_time_without_tz_type": { + "type": "string", + "format": "date-time" + }, + "date_type": { + "type": "string", + "format": "date" + }, + "time_type": { + "type": "string", + "format": "time" + }, + "array_type": { + "type": "array", + "items": { + "type": "string" + } + }, + "row_type": { + "type": ["object"], + "properties": { + "first_name": { + "type": "string" + }, + "last_name": { + "type": "string" + } + } + } + } +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV1.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV1.json new file mode 100644 index 0000000000000..dc1f5f144e458 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/datatypeV1.json @@ -0,0 +1,35 @@ +{ + "type": "object", + "properties": { + "string_type": { + "$ref": 
"WellKnownTypes.json#/definitions/String" + }, + "integer_type": { + "$ref": "WellKnownTypes.json#/definitions/Integer" + }, + "number_type": { + "$ref": "WellKnownTypes.json#/definitions/Number" + }, + "boolean_type": { + "$ref": "WellKnownTypes.json#/definitions/Boolean" + }, + "binary_type": { + "$ref": "WellKnownTypes.json#/definitions/BinaryData" + }, + "date_type": { + "$ref": "WellKnownTypes.json#/definitions/Date" + }, + "timestamp_with_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithTimezone" + }, + "timestamp_without_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithoutTimezone" + }, + "time_with_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimeWithTimezone" + }, + "time_without_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimeWithoutTimezone" + } + } +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV0.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV0.json new file mode 100644 index 0000000000000..31f5c47e0c5a5 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV0.json @@ -0,0 +1,15 @@ +{ + "string_type" : "sample_string", + "number_integer_type" : 948, + "string_big_integer_type" : "54667543256786653424", + "number_float_type" : 34.657322, + "number_type" : 78.32, + "integer_type" : 5389, + "boolean_type" : false, + "date_time_with_tz_type" : "2021-07-11T22:12:22Z", + "date_time_without_tz_type" : "2023-11-07T02:10:32Z", + "date_type" : "2021-01-01", + "time_type" : "12:23:01", + "array_type": "[151, 152]", + "row_type": "{first_name=charles, last_name=darwin, _airbyte_additional_properties=null}" +} \ No newline at end of file diff --git 
a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV1.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV1.json new file mode 100644 index 0000000000000..c2034ac0c9652 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-dataV1.json @@ -0,0 +1,12 @@ +{ + "string_type" : "sample_string", + "integer_type" : 5389, + "number_type" : 78.32, + "boolean_type" : false, + "binary_type" : "RkRY", + "date_type" : "2021-01-01", + "timestamp_with_tz_type" : "2021-01-01T02:06:13Z", + "timestamp_without_tz_type" : "2023-03-01T04:05:01Z", + "time_with_tz_type" : "12:23:01.541+05:00", + "time_without_tz_type" : "12:16:04" +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV0.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV0.json new file mode 100644 index 0000000000000..042a3bf74f1f5 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV0.json @@ -0,0 +1,18 @@ +{ + "_airbyte_ab_id" : "varchar", + "_airbyte_emitted_at" : "timestamp(6) with time zone", + "string_type" : "varchar", + "number_integer_type" : "integer", + "string_big_integer_type" : "varchar", + "number_float_type" : "real", + "number_type" : "double", + "integer_type" : "bigint", + "boolean_type" : "boolean", + "date_time_without_tz_type" : "timestamp(6) with time zone", + "date_time_with_tz_type" : "timestamp(6) with time zone", + "date_type" : "date", + "time_type" : "time(6)", + "array_type" : "array(varchar)", + "row_type" : "row(first_name varchar, last_name varchar, _airbyte_additional_properties map(varchar, varchar))", + 
"_airbyte_additional_properties" : "map(varchar, varchar)" +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV1.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV1.json new file mode 100644 index 0000000000000..ebac7cbcf9dfa --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-datatypeV1.json @@ -0,0 +1,15 @@ +{ + "_airbyte_ab_id" : "varchar", + "_airbyte_emitted_at" : "timestamp(6) with time zone", + "string_type" : "varchar", + "integer_type" : "bigint", + "number_type" : "double", + "boolean_type" : "boolean", + "binary_type" : "varbinary", + "date_type" : "date", + "timestamp_with_tz_type" : "timestamp(6) with time zone", + "timestamp_without_tz_type" : "timestamp(6) with time zone", + "time_with_tz_type" : "varchar", + "time_without_tz_type" : "time(6)", + "_airbyte_additional_properties" : "map(varchar, varchar)" +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-append.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-append.json new file mode 100644 index 0000000000000..7f09a6b6b363b --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-append.json @@ -0,0 +1,8 @@ +{ + "_airbyte_ab_id" : "varchar", + "c_row" : "row(first_name varchar, _airbyte_additional_properties map(varchar, varchar))", + "_airbyte_additional_properties" : "map(varchar, varchar)", + "c_array" : "array(varchar)", + "_airbyte_emitted_at" : "timestamp(6) with time zone", + "c_new_column" : "bigint" +} \ No newline at end of file diff --git 
a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-overwrite.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-overwrite.json new file mode 100644 index 0000000000000..2a31cd4ec5cc4 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/expected-schema-overwrite.json @@ -0,0 +1,7 @@ +{ + "_airbyte_ab_id" : "varchar", + "c_row" : "row(first_name varchar, _airbyte_additional_properties map(varchar, varchar))", + "_airbyte_additional_properties" : "map(varchar, varchar)", + "c_array" : "array(varchar)", + "_airbyte_emitted_at" : "timestamp(6) with time zone" +} \ No newline at end of file diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-append.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-append.json new file mode 100644 index 0000000000000..5da1007582f8e --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-append.json @@ -0,0 +1,23 @@ +{ + "type": ["object"], + "properties": { + "c_row": { + "type": ["null", "object"], + "properties": { + "first_name": { + "type": "string" + } + } + }, + "c_array": { + "type": "array", + "items": + { + "type": "string" + } + }, + "c_new_column": { + "type": ["null", "integer"] + } + } +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-overwrite.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-overwrite.json new file mode 100644 index 0000000000000..ab7cf67917f60 --- /dev/null +++ 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test-integration/resources/testdata/schema-overwrite.json @@ -0,0 +1,20 @@ +{ + "type": ["object"], + "properties": { + "c_row": { + "type": ["null", "object"], + "properties": { + "first_name": { + "type": "string" + } + } + }, + "c_array": { + "type": "array", + "items": + { + "type": "string" + } + } + } +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfigTest.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfigTest.java new file mode 100644 index 0000000000000..97d9dbfeb0275 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationConfigTest.java @@ -0,0 +1,71 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.commons.jackson.MoreMappers.initMapper; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_ACCESS_KEY_ID; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_NAME; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_PATH; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_REGION; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_SECRET_ACCESS_KEY; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.ACCEPT_TERMS; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.CATALOG_SCHEMA; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.OBJECT_STORE_TYPE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.PASSWORD; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.PORT; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.SERVER_HOSTNAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STAGING_OBJECT_STORE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.USERNAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationConfig.DEFAULT_STARBURST_GALAXY_PORT; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationConfig.get; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; + +import com.fasterxml.jackson.databind.ObjectMapper; +import 
com.fasterxml.jackson.databind.node.ObjectNode; +import org.junit.jupiter.api.Test; + +class StarburstGalaxyDestinationConfigTest { + + private static final ObjectMapper OBJECT_MAPPER = initMapper(); + + @Test + public void testConfigCreationFromJsonS3() { + final ObjectNode dataSourceConfig = OBJECT_MAPPER.createObjectNode() + .put(OBJECT_STORE_TYPE, "S3") + .put(S_3_BUCKET_NAME, "bucket_name") + .put(S_3_BUCKET_PATH, "bucket_path") + .put(S_3_BUCKET_REGION, "bucket_region") + .put(S_3_ACCESS_KEY_ID, "access_key_id") + .put(S_3_SECRET_ACCESS_KEY, "secret_access_key"); + + final ObjectNode starburstGalaxyConfig = OBJECT_MAPPER.createObjectNode() + .put(SERVER_HOSTNAME, "server_hostname") + .put(USERNAME, "username") + .put(PASSWORD, "password") + .put(CATALOG, "catalog") + .put(CATALOG_SCHEMA, "catalog_schema") + .set(STAGING_OBJECT_STORE, dataSourceConfig); + + assertThrows(IllegalArgumentException.class, () -> get(starburstGalaxyConfig)); + + starburstGalaxyConfig.put(ACCEPT_TERMS, false); + assertThrows(IllegalArgumentException.class, () -> get(starburstGalaxyConfig)); + + starburstGalaxyConfig.put(ACCEPT_TERMS, true); + final StarburstGalaxyDestinationConfig config1 = get(starburstGalaxyConfig); + assertEquals(DEFAULT_STARBURST_GALAXY_PORT, config1.galaxyPort()); + assertEquals(CATALOG_SCHEMA, config1.galaxyCatalogSchema()); + + starburstGalaxyConfig.put(PORT, DEFAULT_STARBURST_GALAXY_PORT); + final StarburstGalaxyDestinationConfig config2 = get(starburstGalaxyConfig); + assertEquals(DEFAULT_STARBURST_GALAXY_PORT, config2.galaxyPort()); + assertEquals(CATALOG_SCHEMA, config2.galaxyCatalogSchema()); + + assertEquals(StarburstGalaxyS3StagingStorageConfig.class, config2.storageConfig().getClass()); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolverTest.java 
b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolverTest.java new file mode 100644 index 0000000000000..024c10fbdd520 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyDestinationResolverTest.java @@ -0,0 +1,56 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.commons.jackson.MoreMappers.initMapper; +import static io.airbyte.commons.resources.MoreResources.readResource; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_NAME; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.STAGING_OBJECT_STORE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationResolver.getStagingStorageType; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyDestinationResolver.isS3StagingStore; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageType.S3; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.junit.jupiter.api.Assertions.assertThrows; +import static org.junit.jupiter.api.Assertions.assertTrue; + +import com.fasterxml.jackson.databind.JsonNode; +import com.fasterxml.jackson.databind.ObjectMapper; +import com.fasterxml.jackson.databind.node.ObjectNode; +import io.airbyte.commons.json.Jsons; +import org.junit.jupiter.api.DisplayName; +import org.junit.jupiter.api.Test; + +public class StarburstGalaxyDestinationResolverTest { + + private static final ObjectMapper OBJECT_MAPPER = initMapper(); + + @Test + @DisplayName("When given staging credentials should use S3") + public void useS3Test() { + final ObjectNode stubLoadingMethod = 
OBJECT_MAPPER.createObjectNode(); + stubLoadingMethod.put(S_3_BUCKET_NAME, "fake-bucket"); + final ObjectNode stubConfig = OBJECT_MAPPER.createObjectNode(); + stubConfig.set(STAGING_OBJECT_STORE, stubLoadingMethod); + assertTrue(isS3StagingStore(stubConfig)); + } + + @Test + @DisplayName("Staging storage credentials required") + public void stagingStorageCredentialsRequiredTest() { + final ObjectNode stubLoadingMethod = OBJECT_MAPPER.createObjectNode(); + final ObjectNode stubConfig = OBJECT_MAPPER.createObjectNode(); + stubConfig.set(STAGING_OBJECT_STORE, stubLoadingMethod); + assertThrows(IllegalArgumentException.class, () -> getStagingStorageType(stubConfig)); + } + + @Test + public void testS3ConfigType() throws Exception { + final String configFileName = "config.json"; + final JsonNode config = Jsons.deserialize(readResource(configFileName), JsonNode.class); + final StarburstGalaxyStagingStorageType stagingStorageType = getStagingStorageType(config); + assertEquals(S3, stagingStorageType); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfigTest.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfigTest.java new file mode 100644 index 0000000000000..298ed46b305b4 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/StarburstGalaxyStagingStorageConfigTest.java @@ -0,0 +1,36 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. 
+ */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.commons.jackson.MoreMappers.initMapper; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_ACCESS_KEY_ID; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_NAME; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_PATH; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_BUCKET_REGION; +import static io.airbyte.integrations.destination.s3.constant.S3Constants.S_3_SECRET_ACCESS_KEY; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyConstants.OBJECT_STORE_TYPE; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyStagingStorageConfig.getStarburstGalaxyStagingStorageConfig; +import static org.junit.jupiter.api.Assertions.assertNotNull; + +import com.fasterxml.jackson.databind.node.ObjectNode; +import org.junit.jupiter.api.Test; + +public class StarburstGalaxyStagingStorageConfigTest { + + @Test + public void testRetrieveS3Config() { + final ObjectNode dataSourceConfig = initMapper().createObjectNode() + .put(OBJECT_STORE_TYPE, "S3") + .put(S_3_BUCKET_NAME, "bucket_name") + .put(S_3_BUCKET_PATH, "bucket_path") + .put(S_3_BUCKET_REGION, "bucket_region") + .put(S_3_ACCESS_KEY_ID, "access_key_id") + .put(S_3_SECRET_ACCESS_KEY, "secret_access_key"); + + StarburstGalaxyStagingStorageConfig storageConfig = getStarburstGalaxyStagingStorageConfig(dataSourceConfig); + assertNotNull(storageConfig.getS3DestinationConfigOrThrow()); + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/TypeConversionTest.java b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/TypeConversionTest.java new file mode 100644 index 
0000000000000..d2fc07b9ae936 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/java/io/airbyte/integrations/destination/starburst_galaxy/TypeConversionTest.java @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2023 Airbyte, Inc., all rights reserved. + */ + +package io.airbyte.integrations.destination.starburst_galaxy; + +import static io.airbyte.commons.json.Jsons.deserialize; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyS3StreamCopier.convertIcebergSchemaToGalaxySchema; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyS3StreamCopier.getAvroSchema; +import static io.airbyte.integrations.destination.starburst_galaxy.StarburstGalaxyS3StreamCopier.getIcebergSchema; +import static java.util.Objects.requireNonNull; +import static org.junit.jupiter.api.Assertions.assertEquals; +import static org.slf4j.LoggerFactory.getLogger; + +import com.fasterxml.jackson.databind.JsonNode; +import java.io.IOException; +import java.nio.file.Files; +import java.nio.file.Path; +import org.apache.iceberg.Schema; +import org.junit.jupiter.api.Test; +import org.slf4j.Logger; + +public class TypeConversionTest { + + private static final Logger LOGGER = getLogger(TypeConversionTest.class); + private static final String INPUT_FILES_BASE_LOCATION = "schemas/"; + + @Test + public void testJsonV0SchemaTypesConversion() { + runTests(getTestDataFromResourceJson("type_conversion_test_cases_v0.json")); + } + + @Test + public void testJsonV1SchemaTypesConversion() { + runTests(getTestDataFromResourceJson("type_conversion_test_cases_v1.json")); + } + + private void runTests(JsonNode testCases) { + for (JsonNode testCase : testCases) { + String schemaName = testCase.get("schemaName").asText(); + JsonNode galaxyIcebergSchema = testCase.get("galaxyIcebergSchema"); + JsonNode jsonSchema = testCase.get("jsonSchema"); + LOGGER.info("Executing {} test", schemaName); + compareSchemas(jsonSchema, 
galaxyIcebergSchema); + } + } + + private static JsonNode getTestDataFromResourceJson(String fileName) { + try { + String fileContent = Files.readString(Path.of(requireNonNull(TypeConversionTest.class.getClassLoader() + .getResource(INPUT_FILES_BASE_LOCATION + fileName)).getPath())); + return deserialize(fileContent); + } catch (final IOException e) { + throw new RuntimeException(e); + } + } + + private void compareSchemas(JsonNode jsonSchema, JsonNode expectedIcebergGalaxySchema) { + Schema icebergSchema = getIcebergSchema(getAvroSchema("stream", "namespace", jsonSchema)); + TableSchema galaxySchema = convertIcebergSchemaToGalaxySchema(icebergSchema); + assertEquals(galaxySchema.columns().size(), expectedIcebergGalaxySchema.size()); + for (ColumnMetadata columnMetadata : galaxySchema.columns()) { + JsonNode expectedIcebergType = expectedIcebergGalaxySchema.get(columnMetadata.name()); + assertEquals(expectedIcebergType.textValue(), columnMetadata.galaxyIcebergType().getDisplayName()); + } + } + +} diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/config.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/config.json new file mode 100644 index 0000000000000..23ddd917f6dcb --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/config.json @@ -0,0 +1,18 @@ +{ + "accept_terms": true, + "server_hostname": "abc-12345678-wxyz.galaxy.starburst.io", + "port": "443", + "username": "user@example.com", + "password": "password", + "staging_object_store": { + "object_store_type": "S3", + "s3_bucket_name": "required", + "s3_bucket_path": "required", + "s3_bucket_region": "required", + "s3_access_key_id": "required", + "s3_secret_access_key": "required" + }, + "purge_staging_data": true, + "catalog": "s3_catalog", + "catalog_schema": "public" +} diff --git 
a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v0.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v0.json new file mode 100644 index 0000000000000..628d4718a7e02 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v0.json @@ -0,0 +1,629 @@ +[ + { + "schemaName": "simple_schema", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "type": ["null", "string"] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "nested_record", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "type": ["null", "string"] + }, + "user": { + "type": ["null", "object"], + "properties": { + "first_name": { + "type": "string" + }, + "last_name": { + "type": "string" + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"user\"" : "row(first_name varchar, last_name varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_union_type", + "jsonSchema": { + "type": "object", + "properties": { + "identifier": { + "type": ["null", "number", "string"] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "row(member0 double, member1 varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_same_type", + "jsonSchema": { + "type": 
"object", + "properties": { + "identifier": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_union_type", + "jsonSchema": { + "type": "object", + "properties": { + "identifiers": { + "type": "array", + "items": [ + { + "type": "string" + }, + { + "type": "integer" + }, + { + "type": "string" + }, + { + "type": "boolean" + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifiers\"" : "array(row(member0 varchar, member1 bigint, member2 boolean))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "field_with_combined_restriction", + "jsonSchema": { + "properties": { + "created_at": { + "anyOf": [ + { + "type": "string", + "format": "date-time" + }, + { + "type": ["null", "string"] + }, + { + "type": "integer" + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"created_at\"" : "row(member0 varchar, member1 bigint)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_combined_restriction_field", + "jsonSchema": { + "properties": { + "user": { + "type": "object", + "properties": { + "created_at": { + "anyOf": [ + { + "type": "string", + "format": "date-time" + }, + { + "type": ["null", "string"] + }, + { + "type": "integer" + } + ] + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"user\"" : "row(created_at row(member0 varchar, member1 bigint), _airbyte_additional_properties map(varchar, 
varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_combined_restriction_field", + "jsonSchema": { + "properties": { + "identifiers": { + "type": "array", + "items": [ + { + "oneOf": [ + { + "type": "integer" + }, + { + "type": "string" + } + ] + }, + { + "type": "boolean" + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifiers\"" : "array(row(member0 bigint, member1 varchar, member2 boolean))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_airbyte_additional_properties", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "type": ["null", "string"] + }, + "_airbyte_additional_properties": { + "type": "object" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_ab_additional_properties", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "type": ["null", "string"] + }, + "_ab_additional_properties": { + "type": "object" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_without_properties", + "jsonSchema": { + "type": "object" + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "schema_with_same_object_name", + "jsonSchema": { + "type": "object", + "properties": { + "author": { + "type": 
"object", + "properties": { + "id": { + "type": ["null", "integer"] + } + } + }, + "commit": { + "type": ["null", "object"], + "properties": { + "message": { + "type": ["null", "string"] + }, + "author": { + "type": ["null", "object"], + "properties": { + "name": { + "type": ["null", "string"] + }, + "pr": { + "type": ["null", "object"], + "properties": { + "id": { + "type": ["null", "string"] + }, + "message": { + "type": ["null", "string"] + } + } + } + } + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"author\"" : "row(id bigint, _airbyte_additional_properties map(varchar, varchar))", + "\"commit\"" : "row(message varchar, author row(name varchar, pr row(id varchar, message varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_without_items_in_schema", + "jsonSchema": { + "type": "object", + "properties": { + "identifier": { + "type": "array" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_same_object_name", + "jsonSchema": { + "properties": { + "parent_object": { + "type": "object", + "properties": { + "object_array": { + "type": "array", + "items": [ + { "type": "integer" }, + { "type": "boolean" }, + { + "type": "object", + "properties": { + "id": { + "type": "object", + "properties": { + "id_part_1": { + "type": "integer" + }, + "id_part_2": { + "type": "string" + } + } + } + } + }, + { + "type": "object", + "properties": { + "id": { + "type": "object", + "properties": { + "id_part_1": { + "type": "string" + 
}, + "id_part_2": { + "type": "integer" + } + } + }, + ":message": { + "type": "string" + } + } + } + ] + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"parent_object\"" : "row(object_array array(row(member0 bigint, member1 boolean, member2 row(id row(id_part_1 row(member0 bigint, member1 varchar), id_part_2 row(member0 varchar, member1 bigint), _airbyte_additional_properties map(varchar, varchar)), _message varchar, _airbyte_additional_properties map(varchar, varchar)))), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "object_inside_array_inside_array", + "jsonSchema": { + "type": "object", + "properties": { + "filters": { + "type": ["null", "array"], + "items": { + "type": ["null", "array"], + "items": { + "type": ["null", "object"], + "properties": { + "filterFamily": { + "type": ["null", "string"] + } + } + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"filters\"" : "array(array(row(filterFamily varchar, _airbyte_additional_properties map(varchar, varchar))))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_field_with_empty_items", + "jsonSchema": { + "type": "object", + "properties": { + "array_field": { + "type": "array", + "items": {} + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"array_field\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "nullable_value", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "type": ["null", "number"] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + 
"\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "double", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "any_of_with_same_name", + "jsonSchema": { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_1": { + "type": ["null", "string"] + } + } + }, + "any_of_field": { + "anyOf": [ + { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_2": { + "type": ["null", "string"] + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"any_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "all_of_with_same_name", + "jsonSchema": { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_1": { + "type": ["null", "string"] + } + } + }, + "all_of_field": { + "allOf": [ + { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_2": { + "type": ["null", "string"] + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"all_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, 
_airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "one_of_with_same_name", + "jsonSchema": { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_1": { + "type": ["null", "string"] + } + } + }, + "any_of_field": { + "anyOf": [ + { + "type": ["null", "object"], + "properties": { + "same_record_name_field": { + "type": ["null", "object"], + "properties": { + "sub_field_2": { + "type": ["null", "string"] + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"any_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "other_types", + "jsonSchema": { + "type": "object", + "properties": { + "null_type" : { + "type": "null" + }, + "string_type": { + "type": "string" + }, + "number_integer_type": { + "type": "number", + "airbyte_type" : "integer" + }, + "string_big_integer_type": { + "type": "string", + "airbyte_type" : "big_integer" + }, + "number_float_type": { + "type": "number", + "airbyte_type" : "float" + }, + "number_type": { + "type": "number" + }, + "integer_type": { + "type": "integer" + }, + "boolean_type": { + "type": "boolean" + }, + "date_time_type": { + "type": "string", + "format": "date-time" + }, + "date_type": { + "type": "string", + "format": "date" + }, + "time_type": { + "type": "string", + "format": "time" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : 
"varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"string_type\"" : "varchar", + "\"number_integer_type\"" : "integer", + "\"string_big_integer_type\"" : "varchar", + "\"number_float_type\"" : "real", + "\"number_type\"" : "double", + "\"integer_type\"" : "bigint", + "\"boolean_type\"" : "boolean", + "\"date_time_type\"" : "timestamp(6) with time zone", + "\"date_type\"" : "date", + "\"time_type\"" : "time(6)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + } +] diff --git a/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v1.json b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v1.json new file mode 100644 index 0000000000000..98c8eede9b337 --- /dev/null +++ b/airbyte-integrations/connectors/destination-starburst-galaxy/src/test/resources/schemas/type_conversion_test_cases_v1.json @@ -0,0 +1,613 @@ +[ + { + "schemaName": "simple_schema", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "nested_record", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "user": { + "type": "object", + "properties": { + "first_name": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "last_name": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"user\"" : "row(first_name varchar, last_name varchar, 
_airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_union_type", + "jsonSchema": { + "type": "object", + "properties": { + "identifier": { + "oneOf": [ + { "$ref": "WellKnownTypes.json#/definitions/Number" }, + { "$ref": "WellKnownTypes.json#/definitions/String" } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "row(member0 double, member1 varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_same_type", + "jsonSchema": { + "type": "object", + "properties": { + "identifier": { + "type": "array", + "items": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_union_type", + "jsonSchema": { + "type": "object", + "properties": { + "identifiers": { + "type": "array", + "items": [ + { "$ref": "WellKnownTypes.json#/definitions/String" }, + { "$ref": "WellKnownTypes.json#/definitions/Integer" }, + { "$ref": "WellKnownTypes.json#/definitions/String" }, + { "$ref": "WellKnownTypes.json#/definitions/Boolean" } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifiers\"" : "array(row(member0 varchar, member1 bigint, member2 boolean))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "field_with_combined_restriction", + "jsonSchema": { + "properties": { + "created_at": { + "anyOf": [ + { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithTimezone" + }, + { + 
"$ref": "WellKnownTypes.json#/definitions/String" + }, + { + "$ref": "WellKnownTypes.json#/definitions/Integer" + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"created_at\"" : "row(member0 varchar, member1 bigint)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_combined_restriction_field", + "jsonSchema": { + "properties": { + "user": { + "type": "object", + "properties": { + "created_at": { + "anyOf": [ + { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithTimezone" + }, + { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + { + "$ref": "WellKnownTypes.json#/definitions/Integer" + } + ] + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"user\"" : "row(created_at row(member0 varchar, member1 bigint), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_combined_restriction_field", + "jsonSchema": { + "properties": { + "identifiers": { + "type": "array", + "items": [ + { + "oneOf": [ + { + "$ref": "WellKnownTypes.json#/definitions/Integer" + }, + { + "$ref": "WellKnownTypes.json#/definitions/String" + } + ] + }, + { + "$ref": "WellKnownTypes.json#/definitions/Boolean" + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifiers\"" : "array(row(member0 bigint, member1 varchar, member2 boolean))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_airbyte_additional_properties", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + 
"_airbyte_additional_properties": { + "type": "object" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_with_ab_additional_properties", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "_ab_additional_properties": { + "type": "object" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "varchar", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "record_without_properties", + "jsonSchema": { + "type": "object" + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "schema_with_same_object_name", + "jsonSchema": { + "type": "object", + "properties": { + "author": { + "type": "object", + "properties": { + "id": { + "$ref": "WellKnownTypes.json#/definitions/Integer" + } + } + }, + "commit": { + "type": "object", + "properties": { + "message": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "author": { + "type": "object", + "properties": { + "name": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "pr": { + "type": "object", + "properties": { + "id": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "message": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"author\"" : "row(id bigint, _airbyte_additional_properties map(varchar, varchar))", + "\"commit\"" : 
"row(message varchar, author row(name varchar, pr row(id varchar, message varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_without_items_in_schema", + "jsonSchema": { + "type": "object", + "properties": { + "identifier": { + "type": "array" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"identifier\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_with_same_object_name", + "jsonSchema": { + "properties": { + "parent_object": { + "type": "object", + "properties": { + "object_array": { + "type": "array", + "items": [ + { "$ref": "WellKnownTypes.json#/definitions/Integer" }, + { "$ref": "WellKnownTypes.json#/definitions/Boolean" }, + { + "type": "object", + "properties": { + "id": { + "type": "object", + "properties": { + "id_part_1": { + "$ref": "WellKnownTypes.json#/definitions/Integer" + }, + "id_part_2": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + }, + { + "type": "object", + "properties": { + "id": { + "type": "object", + "properties": { + "id_part_1": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "id_part_2": { + "$ref": "WellKnownTypes.json#/definitions/Integer" + } + } + }, + ":message": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + ] + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"parent_object\"" : "row(object_array array(row(member0 bigint, member1 boolean, member2 row(id row(id_part_1 row(member0 bigint, member1 varchar), id_part_2 row(member0 varchar, member1 bigint), _airbyte_additional_properties 
map(varchar, varchar)), _message varchar, _airbyte_additional_properties map(varchar, varchar)))), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "object_inside_array_inside_array", + "jsonSchema": { + "type": "object", + "properties": { + "filters": { + "type": "array", + "items": { + "type": "array", + "items": { + "type": "object", + "properties": { + "filterFamily": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"filters\"" : "array(array(row(filterFamily varchar, _airbyte_additional_properties map(varchar, varchar))))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "array_field_with_empty_items", + "jsonSchema": { + "type": "object", + "properties": { + "array_field": { + "type": "array", + "items": {} + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"array_field\"" : "array(varchar)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "nullable_value", + "jsonSchema": { + "type": "object", + "properties": { + "node_id": { + "$ref": "WellKnownTypes.json#/definitions/Number" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"node_id\"" : "double", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "any_of_with_same_name", + "jsonSchema": { + "type": "object", + "properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_1": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + }, + "any_of_field": { + "anyOf": [ + { + "type": "object", + 
"properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_2": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"any_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "all_of_with_same_name", + "jsonSchema": { + "type": "object", + "properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_1": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + }, + "all_of_field": { + "allOf": [ + { + "type": "object", + "properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_2": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"all_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "one_of_with_same_name", + "jsonSchema": { + "type": "object", + "properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_1": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + }, + "any_of_field": { + "anyOf": [ + { + "type": 
"object", + "properties": { + "same_record_name_field": { + "type": "object", + "properties": { + "sub_field_2": { + "$ref": "WellKnownTypes.json#/definitions/String" + } + } + } + } + } + ] + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"same_record_name_field\"" : "row(sub_field_1 varchar, _airbyte_additional_properties map(varchar, varchar))", + "\"any_of_field\"" : "row(same_record_name_field row(sub_field_2 varchar, _airbyte_additional_properties map(varchar, varchar)), _airbyte_additional_properties map(varchar, varchar))", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + }, + { + "schemaName": "other_types", + "jsonSchema": { + "type": "object", + "properties": { + "string_type": { + "$ref": "WellKnownTypes.json#/definitions/String" + }, + "integer_type": { + "$ref": "WellKnownTypes.json#/definitions/Integer" + }, + "number_type": { + "$ref": "WellKnownTypes.json#/definitions/Number" + }, + "boolean_type": { + "$ref": "WellKnownTypes.json#/definitions/Boolean" + }, + "binary_type": { + "$ref": "WellKnownTypes.json#/definitions/BinaryData" + }, + "date_type": { + "$ref": "WellKnownTypes.json#/definitions/Date" + }, + "timestamp_with_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithTimezone" + }, + "timestamp_without_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimestampWithoutTimezone" + }, + "time_with_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimeWithTimezone" + }, + "time_without_tz_type": { + "$ref": "WellKnownTypes.json#/definitions/TimeWithoutTimezone" + } + } + }, + "galaxyIcebergSchema" : { + "\"_airbyte_ab_id\"" : "varchar", + "\"_airbyte_emitted_at\"" : "timestamp(6) with time zone", + "\"string_type\"" : "varchar", + "\"integer_type\"" : "bigint", + "\"number_type\"" : "double", + "\"boolean_type\"" : "boolean", + "\"binary_type\"" : "varbinary", + "\"date_type\"" : "date", + 
"\"timestamp_with_tz_type\"" : "timestamp(6) with time zone", + "\"timestamp_without_tz_type\"" : "timestamp(6) with time zone", + "\"time_with_tz_type\"" : "varchar", + "\"time_without_tz_type\"" : "time(6)", + "\"_airbyte_additional_properties\"" : "map(varchar, varchar)" + } + } +] diff --git a/connectors.md b/connectors.md index e4662f10c3447..bc45abcc26dfe 100644 --- a/connectors.md +++ b/connectors.md @@ -341,6 +341,7 @@ | **Scylla** | Scylla icon | Destination | airbyte/destination-scylla:0.1.3 | alpha | [docs](https://docs.airbyte.com/integrations/destinations/scylla) | [connectors/destination/scylla](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/scylla) | [destination-scylla](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-scylla) | `3dc6f384-cd6b-4be3-ad16-a41450899bf0` | | **SelectDB** | SelectDB icon | Destination | airbyte/destination-selectdb:0.1.0 | alpha | [docs](https://docs.airbyte.com/integrations/destinations/selectdb) | [connectors/destination/selectdb](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/selectdb) | [destination-selectdb](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-selectdb) | `50a559a7-6323-4e33-8aa0-51dfd9dfadac` | | **Snowflake** | Snowflake icon | Destination | airbyte/destination-snowflake:0.4.61 | generally_available | [docs](https://docs.airbyte.com/integrations/destinations/snowflake) | [connectors/destination/snowflake](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/snowflake) | [destination-snowflake](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-snowflake) | `424892c4-daac-4491-b35d-c6688ba547ba` | +| **Starburst Galaxy** | Starburst Galaxy icon | Destination | airbyte/destination-starburst-galaxy:0.0.1 | alpha | 
[docs](https://docs.airbyte.com/integrations/destinations/starburst-galaxy) | [connectors/destination/starburst-galaxy](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/starburst-galaxy) | [destination-starburst-galaxy](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-starburst-galaxy) | `4528e960-6f7b-4412-8555-7e0097e1da17` | | **Streamr** | Streamr icon | Destination | ghcr.io/devmate-cloud/streamr-airbyte-connectors:0.0.1 | alpha | [docs](https://docs.airbyte.com/integrations/destinations/streamr) | [connectors/destination/devmate-cloud](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/devmate-cloud) | [devmate-cloud](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/devmate-cloud) | `eebd85cf-60b2-4af6-9ba0-edeca01437b0` | | **Teradata Vantage** | Teradata Vantage icon | Destination | airbyte/destination-teradata:0.1.1 | alpha | [docs](https://docs.airbyte.io/integrations/destinations/teradata) | [connectors/destination/teradata](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/teradata) | [destination-teradata](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-teradata) | `58e6f9da-904e-11ed-a1eb-0242ac120002` | | **TiDB** | TiDB icon | Destination | airbyte/destination-tidb:0.1.1 | alpha | [docs](https://docs.airbyte.com/integrations/destinations/tidb) | [connectors/destination/tidb](https://github.com/airbytehq/airbyte/issues?q=is:open+is:issue+label:connectors/destination/tidb) | [destination-tidb](https://github.com/airbytehq/airbyte/tree/master/airbyte-integrations/connectors/destination-tidb) | `06ec60c7-7468-45c0-91ac-174f6e1a788b` | diff --git a/deps.toml b/deps.toml index 999cbe930f6f4..5b3f23e50d0a9 100644 --- a/deps.toml +++ b/deps.toml @@ -111,7 +111,7 @@ postgresql = { module = 
"org.postgresql:postgresql", version.ref = "postgresql" quartz-scheduler = { module = "org.quartz-scheduler:quartz", version = "2.3.2" } reactor-core = { module = "io.projectreactor:reactor-core", version.ref = "reactor" } reactor-test = { module = "io.projectreactor:reactor-test", version.ref = "reactor" } -s3 = { module = "software.amazon.awssdk:s3", version = "2.16.84" } +s3 = { module = "software.amazon.awssdk:s3", version = "2.20.20" } segment-java-analytics = { module = "com.segment.analytics.java:analytics", version.ref = "segment" } slf4j-api = { module = "org.slf4j:slf4j-api", version.ref = "slf4j" } spotbugs-annotations = { module = "com.github.spotbugs:spotbugs-annotations", version = "4.7.3" } diff --git a/docs/integrations/destinations/starburst-galaxy.md b/docs/integrations/destinations/starburst-galaxy.md new file mode 100644 index 0000000000000..710cbb2f0883e --- /dev/null +++ b/docs/integrations/destinations/starburst-galaxy.md @@ -0,0 +1,101 @@ +# Starburst Galaxy destination user guide + +## Overview + +The Starburst Galaxy destination syncs data to Starburst Galaxy [great lake catalogs](https://docs.starburst.io/starburst-galaxy/sql/great-lakes.html) +in [Apache Iceberg](https://iceberg.apache.org/) table format. Each stream is written to its own Iceberg table. + +## Features + +| Feature | Supported | Notes | +|:----------------|:---------:|:------------------------------------------------------------------------------------| +| Overwrite Sync | ✅ | **Warning**: this mode deletes all previously synced data in the destination table. | +| Append Sync | ✅ | | +| Deduped History | ❌ | | +| Namespaces | ✅ | | +| SSL | ✅ | SSL is enabled. | + +## Data storage + +Starburst Galaxy supports various [object storages](https://docs.starburst.io/starburst-galaxy/catalogs/index.html#object-storage); +however, only Amazon S3 is supported by this connector. 
+ +## Configuration + +| Category | Parameter | Type | Notes | +|:---------------------------------|:------------------------------|:-------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Starburst Galaxy | `Hostname` | string | Required. Located in the **Connection info** section of the [view clusters](https://docs.starburst.io/starburst-galaxy/clusters/index.html#manage-clusters) pane in Starburst Galaxy. | +| | `Port` | string | Optional. Located in the **Connection info** section of the [view clusters](https://docs.starburst.io/starburst-galaxy/clusters/index.html#manage-clusters) pane in Starburst Galaxy. Defaults to `443`. | +| | `User` | string | Required. Galaxy user found in the **Connection info** section of the [view clusters](https://docs.starburst.io/starburst-galaxy/clusters/index.html#manage-clusters) pane in Starburst Galaxy. | +| | `Password` | string | Required. Password for the specified Galaxy user. | +| | `Amazon S3 catalog` | string | Required. Name of the [Amazon S3 catalog](https://docs.starburst.io/starburst-galaxy/catalogs/s3.html) created in the Galaxy domain. | +| | `Amazon S3 catalog schema` | string | Optional. The default Starburst Galaxy Amazon S3 catalog schema where tables are written to if the source does not specify a namespace. Each data stream is written to a table in this schema. Defaults to `public`. | +| Staging Object Store - Amazon S3 | `Bucket name` | string | Required. Name of the bucket where the staging data is stored. | +| | `Bucket path` | string | Required. Sets the subdirectory of the specified S3 bucket used for storing staging data. | +| | `Bucket region` | string | Required. Sets the region of the specified S3 bucket. | +| | `Access key` | string | Required. 
AWS/Minio credential. | +| | `Secret key` | string | Required. AWS/Minio credential. | +| General | `Purge staging Iceberg table` | boolean | Optional. Indicates whether the staging Iceberg table is purged after a data sync is complete. Enabled by default. Disable it for debugging purposes only. | + +## Staging files + +### S3 + +Data streams are written to a temporary Iceberg table, and then loaded into the Starburst Galaxy Amazon S3 catalog in the Iceberg table format. +The staging table is deleted after a sync is complete if `Purge staging Iceberg table` is enabled. +The following is an example of a full path for a staging file: + +```text +s3:///// +``` + +For example: + +```text +s3://galaxy_bucket/data_output_path/test_schema/_airbyte_tmp_qey_user + ↑ ↑ ↑ ↑ + | | | temporary Iceberg table holding data + | | source namespace or provided schema name + | | + | bucket path + bucket name +``` + +## Target Iceberg SQL table + +Streams are synced to the Starburst Galaxy Amazon S3 catalog in the Iceberg table format. + +## Output schema + +Each table in the output schema has the following columns: + +| Column | Type | Description | +|:--------------------------------------------------------------|:---------------------:|:-----------------------------------------------------------------------------------------------------| +| `_airbyte_ab_id` | varchar | UUID. | +| `_airbyte_emitted_at` | timestamp(6) | Data emission timestamp. | +| Data fields from the source stream | various | Each field from the source stream is written to its own column in the target table. | +| `_airbyte_additional_properties` | map(varchar, varchar) | Additional properties. | + +The Airbyte data stream's JSON schema is converted to an Avro schema. The JSON object is then converted to an Avro record; +the Avro record is written to a staging Iceberg table. As the data stream can be generated from any data source, +the JSON-to-Avro conversion process has arbitrary rules and limitations.
+Learn more about [how source data is converted to Avro](https://docs.airbyte.io/understanding-airbyte/json-avro-conversion). + +### Datatype support + +Learn more about [Starburst Galaxy Iceberg type mapping](https://docs.starburst.io/latest/connector/iceberg.html#iceberg-to-trino-type-mapping). + +## Getting started + +### Requirements + +- A [Starburst Galaxy cluster](https://docs.starburst.io/starburst-galaxy/clusters/index.html). Required credentials are found in the **Connection info** section of the [view clusters](https://docs.starburst.io/starburst-galaxy/clusters/index.html#manage-clusters) pane. +- A [Starburst Galaxy S3 catalog](https://docs.starburst.io/starburst-galaxy/catalogs/s3.html) created within the Galaxy domain, and [attached to a running cluster](https://docs.starburst.io/starburst-galaxy/catalogs/index.html#add-a-catalog-to-a-cluster). +- [Credentials for the S3 bucket](https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys). +- S3 bucket [location privileges](https://docs.starburst.io/starburst-galaxy/security/privileges.html#location-privileges-) granted to the role that the user is assigned to. + +## Changelog + +| Version | Date | Pull Request | Subject | +|:--------|:-----------|:-----------------------------------------------------------|:------------------------| +| 0.0.1 | 2023-03-28 | [\#24620](https://github.com/airbytehq/airbyte/pull/24620) | Initial public release. |
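
As a companion to the "Staging files" section of the user guide, the sketch below shows one way the four segments of the example staging path (bucket name, bucket path, schema, temporary table) could be joined. It is only an illustration: the class `StagingPathSketch` and its methods are hypothetical names, not part of the connector, and stripping leading and trailing slashes from each segment is an assumption rather than documented connector behavior.

```java
public class StagingPathSketch {

    // Hypothetical helper: removes leading and trailing slashes from one path segment.
    // This normalization is an assumption, not documented connector behavior.
    static String trimSlashes(String segment) {
        return segment.replaceAll("^/+|/+$", "");
    }

    // Hypothetical helper: joins bucket name, bucket path, schema, and temporary
    // table name into an s3:// location shaped like the example in the user guide.
    static String stagingPath(String bucketName, String bucketPath, String schema, String tempTable) {
        return "s3://" + String.join("/",
                trimSlashes(bucketName),
                trimSlashes(bucketPath),
                trimSlashes(schema),
                trimSlashes(tempTable));
    }

    public static void main(String[] args) {
        // Values taken from the example in the "Staging files" section.
        System.out.println(stagingPath(
                "galaxy_bucket", "data_output_path", "test_schema", "_airbyte_tmp_qey_user"));
    }
}
```

Run with the values from the guide's example, this prints `s3://galaxy_bucket/data_output_path/test_schema/_airbyte_tmp_qey_user`, which matches the annotated path in that section.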