Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unsupported cast to string_view from utf8 in v18 #184

Open
fwojciec opened this issue Nov 8, 2024 · 2 comments
Open

unsupported cast to string_view from utf8 in v18 #184

fwojciec opened this issue Nov 8, 2024 · 2 comments
Labels
Type: enhancement New feature or request

Comments

@fwojciec
Copy link

fwojciec commented Nov 8, 2024

Describe the bug, including details regarding any error messages, version, and platform.

Casting between string <-> string_view types has been added in C++ and Python implementations of arrow in v18: apache/arrow#43302

I tested this in Python and it works with the latest (18.0.0) pyarrow release:

[ins] In [1]: import pyarrow as pa

[ins] In [2]: import pyarrow.compute as pc

[ins] In [3]: a = pa.array(["one", "two", "three"])

[ins] In [4]: a
Out[4]: 
<pyarrow.lib.StringArray object at 0x111e3b340>
[
  "one",
  "two",
  "three"
]

[ins] In [5]: a.cast(pa.string_view())
Out[5]: 
<pyarrow.lib.StringViewArray object at 0x111a3be80>
[
  "one",
  "two",
  "three"
]

It doesn't work with the v18 release of arrow-go:

package main

import (
	"context"
	"fmt"

	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/compute"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

func main() {
	alloc := memory.NewGoAllocator()
	b := array.NewStringBuilder(alloc)
	defer b.Release()
	b.AppendValues([]string{"a", "b", "c"}, nil)
	arr := b.NewArray()
	fmt.Println(arr)

	newArr, err := compute.CastArray(context.Background(), arr, &compute.CastOptions{
		ToType: arrow.BinaryTypes.StringView,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(newArr)
}

// Output:
// ["a" "b" "c"]
// panic: not implemented: unsupported cast to string_view from utf8
// ...
// exit status 2

I'd be happy to work on the PR to add it if someone can give me a pointer with description of what would it take - high-level - to add support for this in Go.

Component(s)

Release

@fwojciec fwojciec added the Type: bug Something isn't working label Nov 8, 2024
@assignUser assignUser added Type: enhancement New feature or request go Pull requests that update Go code and removed Type: bug Something isn't working go Pull requests that update Go code labels Nov 8, 2024
@zeroshade
Copy link
Member

Hey @fwojciec I'd love to get this cast in. Here's a quick overview of what would be necessary to do so, I'll happily review any PR you make:

  1. The top-level of the casts for strings can be found here: https://github.com/apache/arrow-go/blob/main/arrow/compute/cast.go#L519. The calls to addFn are for adding casts based on the source type and associating compute kernels for casting to them.
  2. The kernels set up here based on the output type, by calling addToBinaryKernels to add the appropriate casting kernels for String/LargeString/Binary/LargeBinary by using generics. 3. You'll need to create a new function for casting to StringView/BinaryView since they don't have "Large" variants like String and Binary (for example, notice that there are separate Cast*Fsb* functions for handling Fixed Size Binary types). You could then add it to the kernels appended by addToBinaryKernels in the same way as we do for fixed-size-binary.
  3. Finally, unit tests can be added here: https://github.com/apache/arrow-go/blob/main/arrow/compute/cast_test.go#L1489 when you are done.

If you run into any issues or difficulties, feel free to put up a draft PR and ping me and I'll help out!

Thanks again for looking into this!

@fwojciec
Copy link
Author

Thank you @zeroshade - that's super helpful! I'll try to have a PR in the next few days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants