
[ENHANCMENT] Support for loading multiple tables #557

Open
williammatherjones opened this issue Mar 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@williammatherjones

williammatherjones commented Mar 25, 2024

Environment

  • Spark version:
  • Hadoop version:
  • Vertica version: 12.0.4-x
  • Vertica Spark Connector version: 3.3.3
  • Java version:
  • Additional Environment Information:

Problem Description


The Spark connector instantiates only a single JDBC connection. When one table finishes loading its data, the connector closes that connection. Because the JDBC connection is defined as a singleton in the code, this prevents other connections from performing clerical tasks such as table/column definition checks. To support this configuration, the connector needs to be enhanced to handle multiple threads, each with its own connection.
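The failure mode above can be sketched without Vertica at all: when every table load shares one connection object, the first load to finish closes it out from under the rest. The `FakeConnection` class below is a made-up stand-in for the JDBC connection, not the connector's actual code; it just makes the singleton-vs-per-load difference concrete.

```python
class FakeConnection:
    """Hypothetical stand-in for a JDBC connection."""
    def __init__(self):
        self.closed = False

    def execute(self, sql):
        if self.closed:
            raise RuntimeError("connection already closed")
        return "ok"

    def close(self):
        self.closed = True


def load_with_shared_connection(table_count):
    """Singleton-style: one connection shared by all loads (the reported bug).

    The first load closes the shared connection on completion, so every
    subsequent load fails. Returns the list of error messages seen.
    """
    conn = FakeConnection()
    errors = []
    for i in range(table_count):
        try:
            conn.execute(f"COPY table_{i} FROM ...")
        except RuntimeError as e:
            errors.append(f"table_{i}: {e}")
        finally:
            conn.close()  # first finisher closes the singleton for everyone
    return errors


def load_with_connection_per_load(table_count):
    """Sketch of the fix: each load owns its connection, so closing one
    cannot affect the others."""
    errors = []
    for i in range(table_count):
        conn = FakeConnection()
        try:
            conn.execute(f"COPY table_{i} FROM ...")
        except RuntimeError as e:
            errors.append(f"table_{i}: {e}")
        finally:
            conn.close()
    return errors
```

With three tables, the shared-connection version loads only the first table and fails the other two, while the per-load version completes all three. A real fix would also need the connections to be thread-safe, since the loads may run concurrently.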

Steps to reproduce:

Here is what we understand about how the customer's job runs:

  1. Kafka writes data to files in AWS S3.
  2. The customer's code is submitted to a Spark shell, which reads these files from S3 and performs a few transformations.
  3. The transformed data is then written to Vertica using the Vertica Spark connector.

The customer's code can run the load and transform for multiple tables, and they report that they did not face this issue with Vertica's legacy Spark connector (when they were on Vertica 9.1.x).
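Steps 2 and 3 above might look roughly like the sketch below. The option names (`host`, `user`, `db`, `table`, `staging_fs_url`) and the `com.vertica.spark.datasource.VerticaSource` format string follow the v2 connector's documentation, but every hostname, path, table name, and the `transform` callable are made-up placeholders, not the customer's code.

```python
def vertica_options(table):
    """Build the per-table connector options (all values are placeholders)."""
    return {
        "host": "vertica.example.com",                 # hypothetical host
        "user": "dbadmin",
        "db": "verticadb",
        "staging_fs_url": "s3a://my-bucket/staging/",  # hypothetical S3 path
        "table": table,
    }


def load_tables(spark, tables):
    """Read each table's S3 files, transform them, and write to Vertica.

    `tables` maps a target table name to (s3_path, transform); running
    several of these loads in one job is what triggers the reported failure.
    """
    for table, (path, transform) in tables.items():
        df = transform(spark.read.parquet(path))
        (df.write
           .format("com.vertica.spark.datasource.VerticaSource")
           .options(**vertica_options(table))
           .mode("append")
           .save())
```

Calling `load_tables` with a single entry corresponds to the working single-table case; passing two or more entries reproduces the multi-table failure described below.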

Expected behaviour:

In our tests, we made the following observations:

  1. The customer never hits the issue when running the code for a single table.
  2. The Spark job fails when the code is submitted for multiple tables.

Actual behaviour:

Error message/stack trace:

Code sample or example on how to reproduce the issue:

Spark Connector Logs

@williammatherjones williammatherjones added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels Mar 25, 2024
@williammatherjones williammatherjones changed the title [BUG] vertica spark - mismatch with existing table [ENHANCMENT] Support for loading multiple tables Mar 25, 2024