
[BUG] Spark connector leaves multiple sessions open #314

Closed
jeremyprime opened this issue Jan 28, 2022 · 5 comments · Fixed by #323
Assignees: jeremyprime
Labels: bug, High Priority, size: 3

Comments

@jeremyprime (Collaborator)

Environment

  • Vertica Spark Connector version: 3.0.1

Problem Description

When the Spark connector performs a read or write operation, it creates one or more sessions in Vertica. These sessions are not cleaned up until the connection is closed and the client application exits.

For example, the pyspark example creates 3 sessions: 1 for the write and 2 for the read. These remain until the Python script completes.

We likely need to close these sessions when an operation completes since we do not appear to reuse a given session id. That way, even if multiple operations are performed in a single script/client, the sessions should be closed without having to close the client application.

To get the session information, or close all open sessions, run the following:

docker exec -it docker_vertica_1 vsql

-- Get all sessions
SELECT session_id, node_name, user_name, client_hostname, login_timestamp, statement_start, current_statement, last_statement, client_type, client_os FROM v_monitor.sessions ORDER BY user_name;

-- Close all sessions
SELECT close_all_sessions();
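
The same check can also be scripted over JDBC from Scala; a minimal sketch, assuming the default credentials of the local docker Vertica container (the connection details are placeholders, not a verified configuration):

import java.sql.DriverManager

// Assumed connection details for the local docker Vertica instance; adjust as needed.
val conn = DriverManager.getConnection("jdbc:vertica://localhost:5433/docker", "dbadmin", "")
try {
  val rs = conn.createStatement().executeQuery("SELECT count(*) FROM v_monitor.sessions")
  rs.next()
  println(s"Open sessions: ${rs.getInt(1)}")
} finally {
  conn.close() // release the monitoring session itself
}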

Spark Connector Logs

jeremyprime added the bug label Jan 28, 2022
jeremyprime self-assigned this Jan 28, 2022
@jeremyprime (Collaborator, Author) commented Feb 4, 2022

I have run a number of example applications in both Python and Scala. This was done using sbt run, submitting in Spark, and using Jupyter Notebook (also Spark).

For each operation (read or write) one or more sessions will be created in Vertica. These sessions are then closed when the application itself exits. The only time sessions persist is when I use a long-running application, such as Jupyter Notebook, where the kernel continues to run even after execution of the code has completed. When the Jupyter Notebook kernel is shut down, the Vertica sessions are closed.

However, since we close the connection after each operation, we would expect the session to also be closed before the application/client exits. According to this page, a session may persist for a few different reasons, such as if a transaction is not committed. We will need to investigate whether we are failing to commit transactions or close connections in some cases.
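
For reference while investigating, a minimal sketch of what a clean teardown looks like over plain JDBC (not the connector's actual code path): commit or roll back before closing, so that no open transaction keeps the session alive.

import java.sql.Connection

// Sketch of a clean teardown: every operation ends with a commit (or rollback)
// before the connection is closed, so no open transaction keeps the session alive.
def withCommit[T](conn: Connection)(op: Connection => T): T = {
  conn.setAutoCommit(false)
  try {
    val result = op(conn)
    conn.commit()     // commit so no transaction is left open
    result
  } catch {
    case e: Throwable =>
      conn.rollback() // discard pending work before the session ends
      throw e
  } finally {
    conn.close()      // closing the JDBC connection releases the Vertica session
  }
}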

@jeremyprime (Collaborator, Author)

May be related to #171 (planInputPartitions being called twice).

@jeremyprime (Collaborator, Author)

Note that there was an internal ticket, VER-76615, that tried to address this in the past. The test code on that ticket (which simply performed a write in a loop, 100 times) still exhausts the number of available sessions when run against the main branch:

java.sql.SQLNonTransientConnectionException: [Vertica][VJDBC](4060) FATAL: New session rejected due to limit, already 105 sessions active

The SQL queries in SchemaTools.getColumnInfo and TableUtils.tableExists leave open sessions when called. There may be other queries as well; these are just the ones seen with the current test code. These sessions keep building up and are not closed until the application exits.
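
For context, the reproduction is roughly the following Scala sketch: a small DataFrame written through the connector in a loop. It assumes an existing SparkSession named spark, and the option names/values shown are placeholders for a local test setup rather than a verified configuration.

// Rough sketch of the reproduction: repeatedly write a small DataFrame through the
// connector. Each iteration runs queries such as SchemaTools.getColumnInfo and
// TableUtils.tableExists, and without a fix each run can leave a session behind.
val opts = Map(
  "host" -> "localhost",
  "db" -> "docker",
  "user" -> "dbadmin",
  "password" -> "",
  "table" -> "session_loop_test",
  "staging_fs_url" -> "webhdfs://hdfs:50070/data/" // placeholder intermediate storage
)
val df = spark.range(10).toDF("value")
for (_ <- 1 to 100) {
  df.write
    .format("com.vertica.spark.datasource.VerticaSource")
    .options(opts)
    .mode("append")
    .save()
}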

@jeremyprime (Collaborator, Author)

The SQL connection to Vertica is the source of the dangling sessions. The connection is created in VerticaJdbcLayer.VerticaJdbcLayer(). This JDBC layer is currently created once for reads, VerticaPipeFactory.getReadPipe(), and once for writes, VerticaPipeFactory.getWritePipe(). However, there are multiple places where the read and write pipes are created, and thus multiple places where connections are created.

The creation of the connection and/or the creation of the JDBC layer needs to be a singleton. However, this also complicates closing of the connection as other parts of the code will try to reuse the closed connection.

Perhaps connection pooling would help manage a minimal number of connections across reads and writes, and all of the underlying queries they perform. It would be ideal to create and reuse a single connection, provided that it does not hang around after the operations are complete (such as when performing long-lived operations in a Jupyter Notebook).
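
One possible shape for that, sketched here with illustrative names rather than the connector's actual classes: a lazily created, shared connection that every read/write pipe call site reuses instead of opening its own.

import java.sql.{Connection, DriverManager}

// Illustrative sketch only (not the connector's actual classes): a single shared
// connection handed out to every read/write pipe call site, instead of each call
// site opening its own connection.
object SharedConnection {
  private var conn: Option[Connection] = None

  def get(url: String, user: String, password: String): Connection = synchronized {
    conn.getOrElse {
      val c = DriverManager.getConnection(url, user, password)
      conn = Some(c)
      c
    }
  }
}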

@jeremyprime (Collaborator, Author) commented Feb 9, 2022

Recreating the JDBC layer when the child connection is closed seems to address the issue of multiple and dangling sessions. However, the connector still performs some connection closing and reopening, which is not ideal.
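
The closed-connection check behind this is roughly the following sketch; isClosed can itself throw on a broken connection, so an exception is treated the same as a closed connection and the layer gets rebuilt.

import java.sql.{Connection, SQLException}

// Sketch of the closed-connection check: isClosed can throw on a broken connection,
// so an exception is treated the same as "closed" and triggers recreating the JDBC layer.
def connectionIsUsable(conn: Connection): Boolean =
  try conn != null && !conn.isClosed
  catch { case _: SQLException => false }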

#324 has been created to address this issue.

jeremyprime added a commit that referenced this issue Feb 9, 2022
Aryex pushed a commit that referenced this issue Feb 10, 2022
* Use singletons for the read and write pipes

* Recreate the JDBC layer if the connection was closed

* Remove commented code

* Remove semicolons (#314)

* Handle exception when checking for closed connection (#314)

* Refactored JDBC layer check (#314)
jeremyprime added a commit that referenced this issue Feb 11, 2022
jeremyprime added a commit that referenced this issue Feb 11, 2022
jeremyprime added a commit that referenced this issue Feb 14, 2022
* Use singletons for the read and write pipes

* Recreate the JDBC layer if the connection was closed

* Remove commented code

* Remove semicolons (#314)

* Handle exception when checking for closed connection (#314)

* Refactored JDBC layer check (#314)

* Added integration test for session handling (#314)

* Close read connection after write (#314)

* Try sleeping to ensure sessions are released (#314)

* Increase sleep (#314)

* Refactored JDBC layer close (#314)

* Poll session count instead of sleeping (#314)

* Refactor session polling (#314)

* Only look for Spark connector sessions (#314)

* Ignore test as it sometimes fails on GitHub (#314)