feat(csharp): improve handling of StructArrays #2587

Open
wants to merge 4 commits into
base: main

davidhcoe
Contributor

@davidhcoe davidhcoe changed the title feat(csharp): improve handling of structs feat(csharp): improve handling of StructArrays Mar 7, 2025

public ReadRowsStream(IAsyncEnumerator<ReadRowsResponse> response)
{
if (!response.MoveNextAsync().Result) { }
this.currentBuffer = response.Current.ArrowSchema.SerializedSchema.Memory;

if (response.Current != null)
Contributor Author

A NullReferenceException occurs if the query returns no results, because response.Current is null. This change uses a "HasRows" indicator to decide how the stream behaves in that case.
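
For illustration, the guard described above could look roughly like the following. This is a minimal sketch rather than the PR's code: FirstItemGuard and CreateAsync are hypothetical names, and it awaits MoveNextAsync in an async factory instead of calling .Result in a constructor as the original snippet does.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical wrapper; ReadRowsStream in the PR may be structured differently.
sealed class FirstItemGuard<T> where T : class
{
    public bool HasRows { get; }
    public T? First { get; }

    private FirstItemGuard(bool hasRows, T? first)
    {
        HasRows = hasRows;
        First = first;
    }

    // Advance once and record whether anything came back, instead of
    // dereferencing Current unconditionally.
    public static async Task<FirstItemGuard<T>> CreateAsync(IAsyncEnumerator<T> source)
    {
        bool moved = await source.MoveNextAsync();
        T? first = moved ? source.Current : null;
        return new FirstItemGuard<T>(moved && first != null, first);
    }
}
```

Callers would check HasRows before touching First, so an empty result set never reaches the code that reads the schema bytes.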

@@ -208,7 +226,7 @@ private IArrowType GetType(TableFieldSchema field, IArrowType type)
return type;
}

- static IArrowReader ReadChunk(BigQueryReadClient readClient, string streamName)
+ static IArrowReader? ReadChunk(BigQueryReadClient readClient, string streamName)
Contributor Author

This changes the internal contract slightly, because ReadChunk can now return null if the stream doesn't have any rows.

@@ -112,7 +112,13 @@ public override QueryResult ExecuteQuery()
ReadSession rrs = readClient.CreateReadSession("projects/" + results.TableReference.ProjectId, rs, maxStreamCount);

long totalRows = results.TotalRows == null ? -1L : (long)results.TotalRows.Value;
- IArrowArrayStream stream = new MultiArrowReader(TranslateSchema(results.Schema), rrs.Streams.Select(s => ReadChunk(readClient, s.Name)));

+ var readers = rrs.Streams
Contributor Author

@davidhcoe davidhcoe Mar 7, 2025

To compensate for the internal contract change that lets ReadChunk return null, only valid (non-null) readers are passed to the MultiArrowReader. If the resulting list is empty, no errors occur.
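
The filtering step can be sketched without any Arrow dependencies. In this hedged example IChunkReader, ChunkReader, and ValidReaders are hypothetical stand-ins (not the real IArrowReader or MultiArrowReader types), and a stream is modeled as just its row count:

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for IArrowReader; not the real Arrow interface.
interface IChunkReader { int RowCount { get; } }

sealed class ChunkReader : IChunkReader
{
    public ChunkReader(int rowCount) { RowCount = rowCount; }
    public int RowCount { get; }
}

static class ReaderFilterSketch
{
    // Hypothetical analogue of ReadChunk: returns null when a stream has no rows.
    public static IChunkReader? ReadChunk(int rowsInStream) =>
        rowsInStream > 0 ? new ChunkReader(rowsInStream) : null;

    // Keep only the readers that were actually created; the result may be empty.
    public static List<IChunkReader> ValidReaders(IEnumerable<int> streams) =>
        streams.Select(rows => ReadChunk(rows))
               .Where(reader => reader != null)
               .Select(reader => reader!)
               .ToList();
}
```

Because the consumer only ever sees a list of non-null readers, the nullable return type stays an internal detail of the chunk-reading step.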

@davidhcoe davidhcoe marked this pull request as ready for review March 8, 2025 00:00
@github-actions github-actions bot added this to the ADBC Libraries 18 milestone Mar 8, 2025
Contributor

@CurtHagenlocher CurtHagenlocher left a comment

Thanks for the change! I've left a few comments and questions to consider, but nothing I'd think of as seriously blocking.

In hindsight, I think this is arguably the wrong approach to dealing with "nonstandard" values whether they're structured or decimal. It would have been better to keep all conversions in Arrow "vector" space instead of dealing with them one-by-one in a "get scalar" function. That way, if I'm a consumer who wants to deal with the results as an array but I don't want to have to handle values one at a time I can say "convert this struct array into a string array" and then it's just a regular Arrow string vector and I can keep going in vector space. For full generality, this might require a change to the C# Arrow implementation to support a common interface between C# arrays and Arrow arrays, but that's probably worth doing or at least thinking about.

(And we can obviously move in those directions over time.)

@@ -76,7 +83,9 @@ public static class IArrowArrayExtensions
case ArrowTypeId.Int64:
return ((Int64Array)arrowArray).GetValue(index);
case ArrowTypeId.String:
- return ((StringArray)arrowArray).GetString(index);
+ StringArray sArray = (StringArray)arrowArray;
+ if (sArray.Length == 0) { return null; }
Contributor

How can we get here? Why is this not an error, and why does it impact only StringArray and not other arrays?

- Assert.True(ctv.ExpectedValue.Equals(value), Utils.FormatMessage($"Expected value [{ctv.ExpectedValue}] does not match actual value [{value}] for {ctv.Name} for query [{query}]", environmentName));
+ bool areEqual = false;

+ if (value is ExpandoObject)
Contributor

Is this missing a case for when the result is an array?

Contributor Author

No. There is a check above, if (type.BaseType?.Name.Contains("PrimitiveArray") == false), that puts us on a different path for comparing arrays.

- return (array, index) => ((StringArray)array).GetString(index);
+ return (array, index) =>
+ {
+ StringArray? sArray = array as StringArray;
Contributor

This gives up some of the performance benefit of the approach. Instead of having to add a check to each invocation, can we have the caller tell us in advance what the expected source type is and return one of two different delegates?

Contributor Author

I couldn't figure out an elegant way to have two separate delegates, so I went with one that checks the DataType of the array that's passed in.
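
For reference, the reviewer's "choose the delegate up front" idea might be sketched as below. This is only an illustration under stated assumptions: SourceKind and GetStringGetter are hypothetical names, and a plain object array stands in for an Arrow array so the example is self-contained.

```csharp
#nullable enable
using System;

static class GetterSelectionSketch
{
    // Hypothetical source kinds; real code would inspect the Arrow data type.
    public enum SourceKind { Text, Number }

    // Decide once which delegate to hand back, so that per-row calls
    // do not repeat the type check.
    public static Func<object?[], int, string?> GetStringGetter(SourceKind kind)
    {
        switch (kind)
        {
            case SourceKind.Text:
                // Values are already strings; a cast is enough.
                return (values, index) => (string?)values[index];
            case SourceKind.Number:
                // Values need converting; nulls pass through as null.
                return (values, index) => values[index]?.ToString();
            default:
                throw new ArgumentOutOfRangeException(nameof(kind));
        }
    }
}
```

The trade-off is that the caller must know the source type before requesting the getter, which is the part the author found hard to arrange cleanly.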

Successfully merging this pull request may close these issues.

csharp: ValueAt extension causes error when StringArray length = 0
2 participants