Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arrow ipc copy #2

Open
wants to merge 3 commits into
base: devin/1739874746-arrow-ipc-copy
Choose a base branch
from

Conversation

scgkiran
Copy link

@scgkiran scgkiran commented Mar 7, 2025

No description provided.

Copy link

@rymurr rymurr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good imho.

I am just overall worried w/ compatibility between what is on disk and what others would store on disk

jobs:
unitTests:
name: SQL unit tests
runs-on: macos-latest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why osx out of curiosity?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no reason, we were build linux and osx in other repo. I thought we will start with one. LMK if you want me to change this to linux.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would prefer linux. gives me more confidence on the platform we will run on

@@ -87,11 +117,11 @@ ArrowIPCWriteInitializeGlobal(ClientContext &context, FunctionData &bind_data,

vector<unique_ptr<Expression>> ArrowIPCWriteSelect(CopyToSelectInput &input) {
vector<unique_ptr<Expression>> result;
bool any_change = false;
// bool any_change = false;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we remove the commented code?


// Create stream decoder and buffer
auto stream_decoder = make_uniq<BufferingArrowIPCStreamDecoder>();
auto consume_result = stream_decoder->Consume(file_buffer.data(), file_size);
if (std::string(reinterpret_cast<char*>(file_buffer->data()), 6) == ARROW_MAGIC) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain what this is doing? Is the written data wrong and if so can we fix it on the write side?

Is the write side compatible w/ snowflake's arrow or not?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is trying to handle both file and stream formats. ipc stream doesn't have magic and footer, where as file format has it.
I will fix the write side to keep it same as snowflake.
Just thinking that the extension should support both formats, atleast on read side. I will do similar fix in arrow_scan_ipc.cpp.
I will add a comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants