Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow-flight connector #23032

Closed
wants to merge 1 commit into from
Closed

Conversation

sabbasani
Copy link
Contributor

@sabbasani sabbasani commented Jun 19, 2024

Description

Motivation and Context

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our development, formatting, commit message, and attribution guidelines.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`
* Add documentation for the :doc:`/connector/base-arrow-flight`  :pr:`23032`

If release note is NOT required, use:

== NO RELEASE NOTE ==

 

@sabbasani sabbasani requested a review from a team as a code owner June 19, 2024 11:23
@sabbasani sabbasani requested a review from presto-oss June 19, 2024 11:23
@sabbasani sabbasani marked this pull request as draft June 19, 2024 11:42
@steveburnett
Copy link
Contributor

Consider adding documentation for the new connector.

Suggest revising the release note entry to follow the Release Note Guidelines:

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`

@tdcmeehan tdcmeehan self-assigned this Jul 15, 2024
Copy link
Contributor

@tdcmeehan tdcmeehan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like there's a hardcoded presumption that the underlying datasource accepts a SQL query. Can you remove this from the PR? The service may not accept SQL.

public List<Field> getColumnsList(String schema, String table, ConnectorSession connectorSession)
{
try {
String dbSpecificSchemaName = getDBSpecificSchemaName(config, schema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoid "DB" references as it might not be an underlying DB

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed "DB" references

Comment on lines 35 to 36
@JsonProperty("jdbcType") int jdbcType,
@JsonProperty("jdbcTypeName") String jdbcTypeName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can't have references to JDBC in this connector.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed JDBC re references

@tdcmeehan tdcmeehan requested a review from BryanCutler July 16, 2024 14:55
@steveburnett
Copy link
Contributor

Suggest change to the release note entry as follows:

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`

The documentation for Arrow Flight Connector appears to be being added in #23212 , so it doesn't need to be mentioned in this release note.

Copy link
Contributor

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a quick pass and had a few comments. I'll take another look at the rest a little later.

<air.main.basedir>${project.parent.basedir}</air.main.basedir>
<grpc.version>1.53.0</grpc.version>
<dep.okhttp.version>4.10.0</dep.okhttp.version>
<arrow.version>11.0.0</arrow.version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fairly old version of Arrow, can you use a more recent one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I have updated to leastest arrow version

presto-base-arrow-flight/pom.xml Outdated Show resolved Hide resolved
return;
}
try {
RootAllocator allocator = new RootAllocator(Long.MAX_VALUE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You would usually create a RootAllocator as a class member and they should be closed when not used anymore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I have made changes for closing allocator

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not re-using the flight client. The root allocator will be closed when ArrowFlightClient is closed or auto closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this RootAllocator closed then? The FlightClient creates a child allocator that is closed along with the client, it will not close the root

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image image

ArrowFlightClient saves the root allocator. The root allocator is closed when ArrowFlightClient is closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You wouldn't normally create a RootAllocator for each client, it could be reused as each client will internally make a child allocator. It's ok to clean that up later, since you are closing it here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would we determine when the root allocator should be closed? Should we keep track of open flight clients and close root allocator when there are no flight clients open for some duration of time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would create 1 RootAllocator for the connector and then close it during Connector.shutdown(). The FlightClient creates it's own child allocator that is closed with the client, so the root won't be keeping any buffers directly.

trustedCertificate.get().close();
}
shutdownTimer();
isClientClosed.set(true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the calls to getClient() and close() need to be thread-safe?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler I have addressed comments can please review changes

Copy link

linux-foundation-easycla bot commented Jul 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the base-arrow-flight.rst documentation in #23212. If this documentation is to be included in this PR - which I think would be a good idea to do this - please address my comments in my most recent review in #23212 here.

@sabbasani sabbasani force-pushed the arrow-connector branch 4 times, most recently from a69b92e to 7bd4ec9 Compare July 24, 2024 13:38
import static java.util.Objects.requireNonNull;
import static java.util.stream.Collectors.joining;

public class ArrowQueryBuilder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might better belong in a submodule that depends on this module which implements the Flight SQL spec. I don't think it belongs here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan I have addressed comments can please review changes

@sabbasani sabbasani force-pushed the arrow-connector branch 8 times, most recently from 04cdc53 to 824175b Compare July 25, 2024 21:05
@sabbasani
Copy link
Contributor Author

sabbasani commented Jul 25, 2024

@tdcmeehan @BryanCutler @steveburnett
we have addressed all comments above.Can you please review

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the docs! Minor suggestions, mostly formatting.

A local doc build returns the following warning:

/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/connector/base-arrow-flight.rst: WARNING: document isn't included in any toctree

To address this warning,

@steveburnett
Copy link
Contributor

Suggest update of the release note to include the PR number in both entries, and to link to the new doc from the release note.

== RELEASE NOTES ==

General Changes
* Add Arrow Flight connector :pr:`23032`
* Add documentation for the :doc:`/connector/base-arrow-flight`  :pr:`23032`

Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for including the doc from #23212 here! A few minor suggestions, looks good.

---------------------------------------------------------------------------------
To create a plugin extending the base-arrow-module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes:

* ``AbstractArrowFlightClientHandler.java``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lithinwxd Can you update these instructions to match the latest code changes?

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow-connector branch 5 times, most recently from 3cb87b6 to ed3d407 Compare January 4, 2025 16:45
---------------------------------------------------------------------------------
To create a plugin extending the presto-base-arrow-flight module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes:

- ``BaseArrowFlightClient.java``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveburnett We have updated the documentation to match the latest code base. Can you review?

@elbinpallimalilibm elbinpallimalilibm force-pushed the arrow-connector branch 3 times, most recently from f763a93 to 3dcded8 Compare January 6, 2025 06:29
steveburnett
steveburnett previously approved these changes Jan 6, 2025
Copy link
Contributor

@steveburnett steveburnett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! (docs)

Pull updated branch, new local doc build, looks good. Thanks!

}
else {
String value = new String(vector.get(i), StandardCharsets.UTF_8);
type.writeSlice(builder, Slices.utf8Slice(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you create a Slice directly from the raw UTF-8 bytes from vector.get(i) using Slice wrappedBuffer(byte... array)? That would be more efficient that copying to a String first.

elementBuilder.appendNull();
}
else {
appendValueToBuilder(elementType, elementBuilder, value);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you just call buildBlockFromValueVector with the element vector instead of making a new function to append Objects?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildBlockFromValueVector gives a block, but here we want to append multiple values to the same block.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a lot of additional code to deal with writing elements and it's poor performance to call reader.readObject() for each element of the list.

I think what you could do is make BlockBuilder an argument instead of creating it in each function, then build() after calling the function

void buildBlockFromValueVector(ValueVector vector, Type type, BlockBuilder builder)

then in buildBlockFromListVector you would call this function with the element vector and elementBuilder, for example

buildBlockFromValueVector(vector.getDataVector(), elementType, elementBuilder);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vector.getDataVector() gives all the values from the list, not just the element vector. Are you aware of a way to get the element vector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


import static java.util.Objects.requireNonNull;

public class ClientClosingFlightStream
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this class is necessary. You shouldn't lump a client and stream together, just keep them separate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tdcmeehan is it okay for you to decouple client and stream ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we need to close the client and stream together, it's convenient for encapsulation to put this in one type (otherwise the user of the BaseFlightClient needs to supply a client, which needs to be closed separately). What about a more generic AutoCloseableSupplier or something along those lines?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made this comment because the Flight Server dictates how many streams a requested flight will have, and it could be multiple. The client would normally process all streams, closing each one independently. I guess if we are assuming there will always be 1 stream only for the given client, this is ok. But if that ever changes or the client needs to perform some additional actions, then this wouldn't be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remains true. Because streams are opened on a per-split basis, the concurrency is being managed by the Presto evaluation engine, and we can safely make the presumption that there will be 1 client per split, 1 stream per client per split.

Co-authored-by: sai bhaskar reddy <[email protected]>
Co-authored-by: SthuthiGhosh9400 <[email protected]>
Co-authored-by: lithinwxd <[email protected]>
Co-authored-by: Steve Burnett <[email protected]>
Co-authored-by: elbinpallimalilibm <[email protected]>
Co-authored-by: Tim Meehan <[email protected]>
@ethanyzhang
Copy link
Contributor

Hi @sabbasani @elbinpallimalilibm based on the discussion we had with Shweta and Sandhya, we are gonna take over this work, @BryanCutler will help move this PR over the finishing line. We keep everyone who worked on this PR as co-authors. Thank you. cc @tdcmeehan

@JsonProperty("schema") String schema,
@JsonProperty("table") String table)
{
this.schema = schema;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BryanCutler We are testing table valued functions with an extension of this connector and there is a use case where the PTVF won't have a table and schema associated with it. So we need to consider such a use case when designing the table handle for this connector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain this use case more? I don't quite understand how a TableHandle would not have a table.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DM'd you the details.

@JsonProperty("schema") String schema,
@JsonProperty("table") String table)
{
this.schema = schema;
Copy link

@Ajas-Mangal Ajas-Mangal Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Optional and provide appropriate constructor argument validation for schema and table.

Eg:
private final Optional <String> schema;
and
this. schema = requireNonNull(schema, "schema is null");

Copy link
Contributor

@BryanCutler BryanCutler Jan 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To reiterate, the ArrowConnector implementation controls the value of schema and table here, so it could set them to a unique identifier even if they are not necessarily needed as you say. I'm not sure it's correct to make these properties optional, and I don't see that done in any other connector. @tdcmeehan can correct me if I'm mistaken.

@BryanCutler
Copy link
Contributor

Continuing this PR at #24427

<dependency>
<groupId>io.perfmark</groupId>
<artifactId>perfmark-api</artifactId>
<version>${perfmark-api.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elbinpallimalilibm the managed dependencies for perfmark-api and error_prone_annotations don't seem to be needed anymore, was there a specific reason they were added?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were seeing upper bound errors on build without mentioning the dependencies in the POM. Is that error not happening now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not seeing any errors now, looks like error prone version has been upgraded in root pom

@BryanCutler
Copy link
Contributor

#24427 has been merged, will close this one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
from:IBM PR from IBM
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants