[Go] Allow adding existing arrays into structs #35
Comments
Could you just use the existing interface https://pkg.go.dev/github.com/apache/arrow/go/[email protected]/arrow/array#NewStructArray to create your struct array?
newStruct, err := array.NewStructArray(internalCols, []string{"a", "b", "c", "d"....}) |
Actually no, because what you suggest works on standard Go arrays. I have Arrow array (for instance |
The function I'm referring to takes a slice of
var cols []arrow.Array
sb := array.NewStringBuilder(memory.DefaultAllocator)
defer sb.Release()
sb.AppendValues([]string{"a", "b", "c"}, nil)
cols = append(cols, sb.NewStringArray())
sb.AppendValues([]string{"d", "e", "f"}, nil)
cols = append(cols, sb.NewStringArray())
newStruct, err := array.NewStructArray(cols, []string{"col1", "col2"})
defer newStruct.Release()
and |
@zeroshade Sorry, I misunderstood your example! This is great. Are there similar functions for creating lists and maps? You probably know the topic from the snowflake driver :) We have a use case like this:
The API you presented is great for structs. Is there anything similar for lists and maps? So for each field in the struct:
And I can apply it to all arrays in struct and build a struct from it using
And build new list/map? |
There isn't currently a fully convenient one like we have for structs, but you can easily do so by using the
newValues := convertFunction(listArr.ListValues())
defer newValues.Release()
newListData := array.NewData(listArr.DataType(), listArr.Len(), listArr.Data().Buffers(), []arrow.ArrayData{newValues.Data()}, listArr.NullN(), /* offset=*/0)
defer newListData.Release()
newListArr := array.NewListData(newListData)
Similarly you can construct map arrays, which are physically laid out as equivalent to a
That said, I think it would be a reasonable ask to add new convenience functions for this to make it easier for consumers to construct new list arrays and map arrays without having to manipulate the Data object themselves directly. |
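The idea behind the snippet above (keep the list's offsets, swap in converted child values) can be sketched in plain Go. This is a simplified model, not the Arrow API: `listColumn`, `withValues`, and `upperAll` are hypothetical names, and `upperAll` merely stands in for `convertFunction`.

```go
package main

import (
	"fmt"
	"strings"
)

// listColumn is a simplified, hypothetical stand-in for a list array:
// row i holds values[offsets[i]:offsets[i+1]].
type listColumn struct {
	offsets []int32
	values  []string
}

// row returns the elements of row i.
func (l listColumn) row(i int) []string {
	return l.values[l.offsets[i]:l.offsets[i+1]]
}

// withValues reuses the existing offsets and swaps in converted child values.
// The new values must have the same length, or the offsets would no longer line up.
func (l listColumn) withValues(newValues []string) (listColumn, error) {
	if len(newValues) != len(l.values) {
		return listColumn{}, fmt.Errorf("got %d values, want %d", len(newValues), len(l.values))
	}
	return listColumn{offsets: l.offsets, values: newValues}, nil
}

// upperAll plays the role of convertFunction: it converts every child value.
func upperAll(in []string) []string {
	out := make([]string, len(in))
	for i, s := range in {
		out[i] = strings.ToUpper(s)
	}
	return out
}

func main() {
	// Rows: ["a", "b"], [], ["c"]
	col := listColumn{offsets: []int32{0, 2, 2, 3}, values: []string{"a", "b", "c"}}
	converted, err := col.withValues(upperAll(col.values))
	if err != nil {
		panic(err)
	}
	for i := 0; i < len(converted.offsets)-1; i++ {
		fmt.Println(i, converted.row(i))
	}
}
```

Only the child values are rewritten; the row boundaries come from the untouched offsets, which is why no per-row copying is needed.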
@zeroshade it works! But why is this so complicated, I would have never figured it out myself :( I only had to make a tiny change - while creating
Let me check if I understand how to do it for maps. |
That worked:
Thanks a lot! It would be nice to have some helper functions though :) |
Because of the need for the offsets buffer and so on, it's tough to think of what the interface for such helper functions for Lists and Maps would be. What do you think would make sense for helper constructors for those? |
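To make the offsets question concrete, here is one possible shape for the input of such a helper, sketched in plain Go rather than against the Arrow API. The function name and the convention that a negative length marks a null row are assumptions made only for this illustration.

```go
package main

import "fmt"

// offsetsFromLengths turns per-row element counts into a list offsets slice.
// Assumption for this sketch: a negative length marks a null row, which
// contributes no elements and is reported as valid[i] == false.
func offsetsFromLengths(lengths []int) (offsets []int32, valid []bool) {
	offsets = make([]int32, len(lengths)+1)
	valid = make([]bool, len(lengths))
	for i, n := range lengths {
		if n < 0 {
			offsets[i+1] = offsets[i] // null row: empty range
			continue
		}
		valid[i] = true
		offsets[i+1] = offsets[i] + int32(n)
	}
	return offsets, valid
}

func main() {
	offsets, valid := offsetsFromLengths([]int{2, -1, 0, 3})
	fmt.Println(offsets, valid) // [0 2 2 2 5] [true false true true]
}
```

A helper taking row lengths (or ready-made offsets) plus a child array would spare callers from assembling buffers by hand, at the cost of one more pass over the lengths.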
@zeroshade I think there's one vital thing to actually address here: offset should also be used (as the passed in record/array may be sliced), but not for struct arrays (they are special & I don't know why). |
@candiduslynx I'm very confused by that file, it looks like you're using a builder to make a deep copy of the arrays which doesn't make any sense to me. You have this function:
func slice(r arrow.Record) []arrow.Record {
	res := make([]arrow.Record, r.NumRows())
	for i := int64(0); i < r.NumRows(); i++ {
		res[i] = r.NewSlice(i, i+1)
	}
	return res
}
If I'm reading this right, you're slicing the record into a slice of records of exactly 1 row each? Why? But I'm more confused by this
What do you mean "special"? The offset handling for struct arrays should work precisely the same as any other type. Can you elaborate on what the issue there is? |
Tests requirement, it'll be removed soon (we need to sort rows in tests, so we slice to single-row records).
The schema passed to the I'll revisit the code in
cloudquery/filetypes#279 |
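For the row-sorting requirement in tests, an alternative to single-row slices is to sort an index permutation and then gather each column in that order. The following is a plain-Go sketch of that idea, not an Arrow compute kernel; `sortedIndices` and `take` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// sortedIndices returns the row order that sorts keys ascending,
// without moving any row data.
func sortedIndices(keys []int64) []int {
	idx := make([]int, len(keys))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool { return keys[idx[a]] < keys[idx[b]] })
	return idx
}

// take gathers col in the given row order, similar in spirit to a "take" kernel.
func take(col []string, idx []int) []string {
	out := make([]string, len(idx))
	for i, j := range idx {
		out[i] = col[j]
	}
	return out
}

func main() {
	keys := []int64{3, 1, 2}
	names := []string{"c", "a", "b"}
	idx := sortedIndices(keys)
	fmt.Println(idx, take(names, idx)) // [1 2 0] [a b c]
}
```

The same permutation can be applied to every column of a record, so rows stay aligned across columns.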
I've been wanting to add sort functions to the compute package (which already has Take & Filter).... if you want to try your hand at contributing sort functions, I'd be happy to review them and it would solve your issue of making single-row slices :)
I'd be curious what a minimal reproducer of this might look like. In theory for a sliced array, only the top-level needs the offset. So if your child arrays are already sliced, then it makes sense you don't need to have an offset in the top struct as offsets are only related to the Array they are attached to. So I can think of three possible situations here:
|
func (a *Struct) setData(data *Data) {
	a.array.setData(data)
	a.fields = make([]arrow.Array, len(data.childData))
	for i, child := range data.childData {
		if data.offset != 0 || child.Len() != data.length {
			sub := NewSliceData(child, int64(data.offset), int64(data.offset+data.length))
			a.fields[i] = MakeFromData(sub)
			sub.Release()
		} else {
			a.fields[i] = MakeFromData(child)
		}
	}
}
I think is the culprit |
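The effect of that condition can be illustrated with a plain-Go model. This is a simplification under stated assumptions: `structData` and `fields` are hypothetical names, and Go slices stand in for Arrow array data. It shows that a parent offset combined with children that were already sliced applies the offset twice.

```go
package main

import "fmt"

// structData is a simplified, hypothetical model of a struct array's data:
// a logical window [offset, offset+length) over its child columns.
type structData struct {
	offset, length int
	children       [][]int
}

// fields mirrors the condition in setData above: each child is sliced by the
// parent's offset whenever the offset is non-zero or the lengths differ.
func fields(d structData) [][]int {
	out := make([][]int, len(d.children))
	for i, child := range d.children {
		if d.offset != 0 || len(child) != d.length {
			out[i] = child[d.offset : d.offset+d.length]
		} else {
			out[i] = child
		}
	}
	return out
}

func main() {
	full := []int{10, 20, 30, 40, 50}

	// Unsliced child, parent offset 1: the window is applied once.
	fmt.Println(fields(structData{offset: 1, length: 2, children: [][]int{full}})) // [[20 30]]

	// Child already sliced to the wanted rows, parent offset 0: same rows.
	fmt.Println(fields(structData{offset: 0, length: 2, children: [][]int{full[1:3]}})) // [[20 30]]

	// Child already sliced AND parent offset kept: the offset is applied twice.
	fmt.Println(fields(structData{offset: 1, length: 2, children: [][]int{full[1:4]}})) // [[30 40]]
}
```

In this model the third case returns rows shifted by one, which matches the observation earlier in the thread that passing the offset again for struct arrays gives the wrong result when the children are already sliced.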
I'll take a look later this week and see what I can figure out |
cc: @yevgenypats @erezrokah (seems like the repo moved) |
Describe the enhancement requested
Hi! We have a use case in which we have existing Arrow arrays and we want to compose them into a struct. Unfortunately, StructBuilder does not currently support it. So currently we have to copy all values from the specific Arrow array to the similar array in the struct one by one. It would be great if we could just reuse the existing arrays without manual copying.
What we do now?
Something like this:
What we want:
The main problem here is not these few lines of code that we need to write, but the manual memory copying. Is it doable?
Component(s)
Go