[SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed #49339

Open
wants to merge 7 commits into
base: master

xupefei
Contributor

@xupefei xupefei commented Dec 31, 2024

What changes were proposed in this pull request?

This PR corrects the isRowEncoderSupported logic of the encoderFor method. When the flag is off, encoderFor no longer returns an UnboundRowEncoder and throws an exception instead. To get the old behaviour back, use one of the following two methods:

  • encoderFor(..., isRowEncoderSupported = true)
  • encoderForWithRowEncoderSupport(...)

To avoid breaking existing use cases, this PR changes some calls of encoderFor to encoderForWithRowEncoderSupport where the input type involves a generic T, since T can be a Row.
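For readers following along, the intended behaviour can be sketched with a small stand-in model. This is not the Spark implementation: the `Tpe` hierarchy, `IntEncoder`, `SeqEncoder`, the `ToyReflection` object, and the choice of `UnsupportedOperationException` are all assumptions made for illustration. Only the names `encoderFor`, `encoderForWithRowEncoderSupport`, `isRowEncoderSupported`, and `UnboundRowEncoder` come from the description above.

```scala
// Toy model of the flag semantics described in this PR. NOT the real Spark code.
// All types here are hypothetical stand-ins for the real encoder classes.
sealed trait Encoder
case object UnboundRowEncoder extends Encoder
case object IntEncoder extends Encoder
final case class SeqEncoder(element: Encoder) extends Encoder

// Hypothetical, simplified description of the type being encoded.
sealed trait Tpe
case object RowTpe extends Tpe
case object IntTpe extends Tpe
final case class SeqTpe(element: Tpe) extends Tpe

object ToyReflection {
  // With the flag off, a Row anywhere in the type is rejected instead of
  // silently mapping to UnboundRowEncoder.
  def encoderFor(tpe: Tpe, isRowEncoderSupported: Boolean = false): Encoder =
    tpe match {
      case IntTpe => IntEncoder
      case SeqTpe(element) => SeqEncoder(encoderFor(element, isRowEncoderSupported))
      case RowTpe if isRowEncoderSupported => UnboundRowEncoder
      case RowTpe =>
        // The exception type is an assumption; the PR only says "throws an exception".
        throw new UnsupportedOperationException("Row encoder is not supported here")
    }

  // Convenience wrapper that restores the old, permissive behaviour.
  def encoderForWithRowEncoderSupport(tpe: Tpe): Encoder =
    encoderFor(tpe, isRowEncoderSupported = true)
}
```

In this sketch, `ToyReflection.encoderFor(SeqTpe(RowTpe))` throws, while `ToyReflection.encoderForWithRowEncoderSupport(SeqTpe(RowTpe))` yields `SeqEncoder(UnboundRowEncoder)`. That mirrors why call sites with a generic T would need the permissive variant if T could really be a Row.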

Why are the changes needed?

Code cleanup.

Does this PR introduce any user-facing change?

No. This method is used internally.

How was this patch tested?

Existing tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the SQL label Dec 31, 2024
@xupefei xupefei changed the title [SPARK-49230] Do not return UnboundRowEncoder when not needed [SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed Dec 31, 2024
@xupefei xupefei marked this pull request as ready for review January 3, 2025 15:45
@xupefei xupefei requested a review from hvanhovell January 3, 2025 15:45
@@ -956,7 +956,7 @@ class Dataset[T] private[sql] (
     val generator = SparkUserDefinedFunction(
       UDFAdaptors.iterableOnceToSeq(f),
       UnboundRowEncoder :: Nil,
-      ScalaReflection.encoderFor[Seq[A]])
+      ScalaReflection.encoderForWithRowEncoderSupport[Seq[A]])
Contributor

Please keep in mind that for encoders that are used to convert data from the external representation to the internal representation we need an encoder with a valid schema. An UnboundRowEncoder does not have a valid/meaningful schema. It should not be used in these cases.

Contributor Author

Thanks for the comment. I evaluated all call sites, and it turns out that none of them needs Row encoder support. The only change I kept is in [[ExpressionEncoder]].

@github-actions github-actions bot removed the CONNECT label Jan 7, 2025
@xupefei xupefei requested a review from hvanhovell January 7, 2025 10:59
2 participants