
[ENHANCMENT] Support for loading multiple tables #557

Open
williammatherjones opened this issue Mar 25, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@williammatherjones

williammatherjones commented Mar 25, 2024

Environment

  • Spark version:
  • Hadoop version:
  • Vertica version: 12.0.4-x
  • Vertica Spark Connector version: 3.3.3
  • Java version:
  • Additional Environment Information:

Problem Description


The Spark connector instantiates only a single JDBC connection. When one table finishes loading its data, the connector closes that connection. Because the JDBC connection is defined as a singleton in the code, this prevents other connections from performing clerical tasks such as table/column definition checks. To support this configuration, the connector needs to be enhanced to handle multiple threads, each with its own connection.
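The failure mode above can be sketched without Vertica at all: when every table load shares one connection object, the first load to finish closes it out from under the rest. The `FakeConnection` class below is a made-up stand-in for the JDBC connection, not the connector's actual code; it just makes the singleton-vs-per-load difference concrete.

```python
class FakeConnection:
    """Hypothetical stand-in for a JDBC connection."""
    def __init__(self):
        self.closed = False

    def execute(self, sql):
        if self.closed:
            raise RuntimeError("connection already closed")
        return "ok"

    def close(self):
        self.closed = True


def load_with_shared_connection(table_count):
    """Singleton-style: one connection shared by all loads (the reported bug).

    The first load closes the shared connection on completion, so every
    subsequent load fails. Returns the list of error messages seen.
    """
    conn = FakeConnection()
    errors = []
    for i in range(table_count):
        try:
            conn.execute(f"COPY table_{i} FROM ...")
        except RuntimeError as e:
            errors.append(f"table_{i}: {e}")
        finally:
            conn.close()  # first finisher closes the singleton for everyone
    return errors


def load_with_connection_per_load(table_count):
    """Sketch of the fix: each load owns its connection, so closing one
    cannot affect the others."""
    errors = []
    for i in range(table_count):
        conn = FakeConnection()
        try:
            conn.execute(f"COPY table_{i} FROM ...")
        except RuntimeError as e:
            errors.append(f"table_{i}: {e}")
        finally:
            conn.close()
    return errors
```

With three tables, the shared-connection version loads only the first table and fails the other two, while the per-load version completes all three. A real fix would also need the connections to be thread-safe, since the loads may run concurrently.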

Steps to reproduce:

Here is what we understand about how the customer's job runs:

  1. Kafka writes data to files in AWS S3.
  2. The customer's code is submitted to a Spark shell, which reads these files from S3 and performs a few transformations.
  3. The transformed data is then written to Vertica using the Vertica Spark connector.

The customer's code can run the load and transform for multiple tables, and they report that they did not face this issue with Vertica's legacy Spark connector (when they were on Vertica 9.1.x).
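Steps 2 and 3 above might look roughly like the sketch below. The option names (`host`, `user`, `db`, `table`, `staging_fs_url`) and the `com.vertica.spark.datasource.VerticaSource` format string follow the v2 connector's documentation, but every hostname, path, table name, and the `transform` callable are made-up placeholders, not the customer's code.

```python
def vertica_options(table):
    """Build the per-table connector options (all values are placeholders)."""
    return {
        "host": "vertica.example.com",                 # hypothetical host
        "user": "dbadmin",
        "db": "verticadb",
        "staging_fs_url": "s3a://my-bucket/staging/",  # hypothetical S3 path
        "table": table,
    }


def load_tables(spark, tables):
    """Read each table's S3 files, transform them, and write to Vertica.

    `tables` maps a target table name to (s3_path, transform); running
    several of these loads in one job is what triggers the reported failure.
    """
    for table, (path, transform) in tables.items():
        df = transform(spark.read.parquet(path))
        (df.write
           .format("com.vertica.spark.datasource.VerticaSource")
           .options(**vertica_options(table))
           .mode("append")
           .save())
```

Calling `load_tables` with a single entry corresponds to the working single-table case; passing two or more entries reproduces the multi-table failure described below.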

Expected behaviour:

In our tests, we made the following observations:

  1. The customer never hits the issue when running the code for a single table.
  2. The Spark job fails when the code is submitted for multiple tables.

Actual behaviour:

Error message/stack trace:

Code sample or example on how to reproduce the issue:

Spark Connector Logs

@williammatherjones williammatherjones added bug Something isn't working enhancement New feature or request and removed bug Something isn't working labels Mar 25, 2024
@williammatherjones williammatherjones changed the title [BUG] vertica spark - mismatch with existing table [ENHANCMENT] Support for loading multiple tables Mar 25, 2024