Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming arrow query results #317

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

fanyang01
Copy link
Contributor

This PR updates Arrow.QueryContext to return a streaming Arrow record reader, rather than materializing all query results in memory. This approach is beneficial for handling query results larger than available memory, such as when exporting a large table.


func (r *arrowStreamReader) Retain() {}

func (r *arrowStreamReader) Release() {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite know what this function is supposed to be and where it's called, I have a feeling it's the release method of the ArrowArrayStream, in which case it's missing one part of the spec

The release callback MUST mark the structure as released, by setting its release member to NULL.

This release callback doesn't set the release member to NULL, which is necessary to mark the structure as being released

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Tishj,

Thank you very much for the review! It seems that Go's Arrow binding wraps a Go RecordReader into an ArrowArrayStream and conforms to the spec with a release wrapper: https://github.com/apache/arrow-go/blob/v18.1.0/arrow/cdata/exports.go#L175

I’ve also updated the implementation to include an atomic reference count, similar to how the official RecordReader is implemented: https://github.com/apache/arrow-go/blob/v18.1.0/arrow/array/record.go#L44

arrow.go Outdated

func (r *arrowStreamReader) Release() {
if r.currentRec != nil {
r.currentRec.Release()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a release callback for the ArrowArray, it should check whether release is null before calling it

A released structure is indicated by setting its release callback to NULL. Before reading and interpreting a structure’s data, consumers SHOULD check for a NULL release callback and treat it accordingly (probably by erroring out).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, this is not a callback but a method, so it cannot be null.

@taniabogatsch taniabogatsch added the feature / enhancement Code improvements or a new feature label Dec 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature / enhancement Code improvements or a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants