-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arrow-flight connector #23032
Arrow-flight connector #23032
Conversation
Consider adding documentation for the new connector. Suggest revising the release note entry to follow the Release Note Guidelines:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's a hardcoded presumption that the underlying datasource accepts a SQL query. Can you remove this from the PR? The service may not accept SQL.
public List<Field> getColumnsList(String schema, String table, ConnectorSession connectorSession) | ||
{ | ||
try { | ||
String dbSpecificSchemaName = getDBSpecificSchemaName(config, schema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid "DB" references as it might not be an underlying DB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed "DB" references
@JsonProperty("jdbcType") int jdbcType, | ||
@JsonProperty("jdbcTypeName") String jdbcTypeName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't have references to JDBC in this connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed JDBC re references
Suggest change to the release note entry as follows:
The documentation for Arrow Flight Connector appears to be being added in #23212 , so it doesn't need to be mentioned in this release note. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a quick pass and had a few comments. I'll take another look at the rest a little later.
presto-base-arrow-flight/pom.xml
Outdated
<air.main.basedir>${project.parent.basedir}</air.main.basedir> | ||
<grpc.version>1.53.0</grpc.version> | ||
<dep.okhttp.version>4.10.0</dep.okhttp.version> | ||
<arrow.version>11.0.0</arrow.version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fairly old version of Arrow, can you use a more recent one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler I have updated to leastest arrow version
presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowColumnHandle.java
Outdated
Show resolved
Hide resolved
return; | ||
} | ||
try { | ||
RootAllocator allocator = new RootAllocator(Long.MAX_VALUE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You would usually create a RootAllocator as a class member and they should be closed when not used anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler I have made changes for closing allocator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not re-using the flight client. The root allocator will be closed when ArrowFlightClient
is closed or auto closed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where is this RootAllocator closed then? The FlightClient creates a child allocator that is closed along with the client, it will not close the root
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You wouldn't normally create a RootAllocator for each client, it could be reused as each client will internally make a child allocator. It's ok to clean that up later, since you are closing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would we determine when the root allocator should be closed? Should we keep track of open flight clients and close root allocator when there are no flight clients open for some duration of time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would create 1 RootAllocator for the connector and then close it during Connector.shutdown()
. The FlightClient creates it's own child allocator that is closed with the client, so the root won't be keeping any buffers directly.
presto-base-arrow-flight/src/main/java/com/facebook/plugin/arrow/ArrowFlightClientHandler.java
Outdated
Show resolved
Hide resolved
trustedCertificate.get().close(); | ||
} | ||
shutdownTimer(); | ||
isClientClosed.set(true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the calls to getClient()
and close()
need to be thread-safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler I have addressed comments can please review changes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a69b92e
to
7bd4ec9
Compare
import static java.util.Objects.requireNonNull; | ||
import static java.util.stream.Collectors.joining; | ||
|
||
public class ArrowQueryBuilder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might better belong in a submodule that depends on this module which implements the Flight SQL spec. I don't think it belongs here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan I have addressed comments can please review changes
04cdc53
to
824175b
Compare
@tdcmeehan @BryanCutler @steveburnett |
6d9d75b
to
11bada5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the docs! Minor suggestions, mostly formatting.
A local doc build returns the following warning:
/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/connector/base-arrow-flight.rst: WARNING: document isn't included in any toctree
To address this warning,
-
Add
connector/base-arrow-flight
to
https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/connector.rst so the new page is present on the Connector page in the Presto documentation. -
Add the new line in alphabetic order for the new page title Arrow-flight Connector.
Suggest update of the release note to include the PR number in both entries, and to link to the new doc from the release note.
|
3b73ee3
to
c4e77f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for including the doc from #23212 here! A few minor suggestions, looks good.
a2ed31a
to
734fa2b
Compare
--------------------------------------------------------------------------------- | ||
To create a plugin extending the base-arrow-module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes: | ||
|
||
* ``AbstractArrowFlightClientHandler.java`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lithinwxd Can you update these instructions to match the latest code changes?
3cb87b6
to
ed3d407
Compare
--------------------------------------------------------------------------------- | ||
To create a plugin extending the presto-base-arrow-flight module, you need to implement certain abstract methods that are specific to your use case. Below are the required classes and their purposes: | ||
|
||
- ``BaseArrowFlightClient.java`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveburnett We have updated the documentation to match the latest code base. Can you review?
f763a93
to
3dcded8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! (docs)
Pull updated branch, new local doc build, looks good. Thanks!
} | ||
else { | ||
String value = new String(vector.get(i), StandardCharsets.UTF_8); | ||
type.writeSlice(builder, Slices.utf8Slice(value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you create a Slice
directly from the raw UTF-8 bytes from vector.get(i)
using Slice wrappedBuffer(byte... array)
? That would be more efficient that copying to a String first.
elementBuilder.appendNull(); | ||
} | ||
else { | ||
appendValueToBuilder(elementType, elementBuilder, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just call buildBlockFromValueVector
with the element vector instead of making a new function to append Objects?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
buildBlockFromValueVector
gives a block, but here we want to append multiple values to the same block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a lot of additional code to deal with writing elements and it's poor performance to call reader.readObject()
for each element of the list.
I think what you could do is make BlockBuilder an argument instead of creating it in each function, then build()
after calling the function
void buildBlockFromValueVector(ValueVector vector, Type type, BlockBuilder builder)
then in buildBlockFromListVector
you would call this function with the element vector and elementBuilder
, for example
buildBlockFromValueVector(vector.getDataVector(), elementType, elementBuilder);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vector.getDataVector()
gives all the values from the list, not just the element vector. Are you aware of a way to get the element vector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can look here for an example of accessing elements within a ListVector https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L525
|
||
import static java.util.Objects.requireNonNull; | ||
|
||
public class ClientClosingFlightStream |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this class is necessary. You shouldn't lump a client and stream together, just keep them separate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tdcmeehan is it okay for you to decouple client and stream ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we need to close the client and stream together, it's convenient for encapsulation to put this in one type (otherwise the user of the BaseFlightClient
needs to supply a client, which needs to be closed separately). What about a more generic AutoCloseableSupplier
or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made this comment because the Flight Server dictates how many streams a requested flight will have, and it could be multiple. The client would normally process all streams, closing each one independently. I guess if we are assuming there will always be 1 stream only for the given client, this is ok. But if that ever changes or the client needs to perform some additional actions, then this wouldn't be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This remains true. Because streams are opened on a per-split basis, the concurrency is being managed by the Presto evaluation engine, and we can safely make the presumption that there will be 1 client per split, 1 stream per client per split.
dcf2b14
to
034c5d7
Compare
Co-authored-by: sai bhaskar reddy <[email protected]> Co-authored-by: SthuthiGhosh9400 <[email protected]> Co-authored-by: lithinwxd <[email protected]> Co-authored-by: Steve Burnett <[email protected]> Co-authored-by: elbinpallimalilibm <[email protected]> Co-authored-by: Tim Meehan <[email protected]>
034c5d7
to
2bcce6b
Compare
Hi @sabbasani @elbinpallimalilibm based on the discussion we had with Shweta and Sandhya, we are gonna take over this work, @BryanCutler will help move this PR over the finishing line. We keep everyone who worked on this PR as co-authors. Thank you. cc @tdcmeehan |
@JsonProperty("schema") String schema, | ||
@JsonProperty("table") String table) | ||
{ | ||
this.schema = schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BryanCutler We are testing table valued functions with an extension of this connector and there is a use case where the PTVF won't have a table and schema associated with it. So we need to consider such a use case when designing the table handle for this connector.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you explain this use case more? I don't quite understand how a TableHandle would not have a table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DM'd you the details.
@JsonProperty("schema") String schema, | ||
@JsonProperty("table") String table) | ||
{ | ||
this.schema = schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Optional and provide appropriate constructor argument validation for schema and table.
Eg:
private final Optional <String> schema;
and
this. schema = requireNonNull(schema, "schema is null");
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To reiterate, the ArrowConnector implementation controls the value of schema and table here, so it could set them to a unique identifier even if they are not necessarily needed as you say. I'm not sure it's correct to make these properties optional, and I don't see that done in any other connector. @tdcmeehan can correct me if I'm mistaken.
Continuing this PR at #24427 |
<dependency> | ||
<groupId>io.perfmark</groupId> | ||
<artifactId>perfmark-api</artifactId> | ||
<version>${perfmark-api.version}</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@elbinpallimalilibm the managed dependencies for perfmark-api and error_prone_annotations don't seem to be needed anymore, was there a specific reason they were added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We were seeing upper bound errors on build without mentioning the dependencies in the POM. Is that error not happening now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not seeing any errors now, looks like error prone version has been upgraded in root pom
#24427 has been merged, will close this one |
Description
Motivation and Context
Impact
Test Plan
Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.
If release note is NOT required, use: