-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-49230][Connect][SQL] Do not return UnboundRowEncoder when not needed #49339
base: master
Are you sure you want to change the base?
Conversation
sql/api/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
Show resolved
Hide resolved
@@ -956,7 +956,7 @@ class Dataset[T] private[sql] ( | |||
val generator = SparkUserDefinedFunction( | |||
UDFAdaptors.iterableOnceToSeq(f), | |||
UnboundRowEncoder :: Nil, | |||
ScalaReflection.encoderFor[Seq[A]]) | |||
ScalaReflection.encoderForWithRowEncoderSupport[Seq[A]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep in mind that for encoders that are used to convert data from the external representation to the internal representation we need an encoder with a valid schema. An UnboundRowEncoder does not have a valid/meaningful schema. It should not be used in these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comment. I evaluated all call sites, it turns out none of them need to be with Row Encoder support. I only left a change in [[ExpressionEncoder]].
What changes were proposed in this pull request?
This PR corrects the
isRowEncoderSupported
logic ofencoderFor
method. Now when the flag is off,encoderFor
method will not return anUnboundRowEncoder
but instead throws an exception. To get back to the old behaviour, used one of the following two method instead:encoderFor(..., isRowEncoderSupported = true)
encoderForWithRowEncoderSupport(...)
To avoid breaking existing use cases, this PR changes some calls of
encoderFor
toencoderForWithRowEncoderSupport
when the input type involves genericT
, sinceT
can be aRow
.Why are the changes needed?
Code cleanup.
Does this PR introduce any user-facing change?
No. This method is used internally.
How was this patch tested?
Existing tests.
Was this patch authored or co-authored using generative AI tooling?
No.