-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix implication from SPARK-20346. #316
Conversation
Codecov Report
@@ Coverage Diff @@
## master #316 +/- ##
==========================================
+ Coverage 96.89% 96.89% +<.01%
==========================================
Files 60 60
Lines 1029 1032 +3
Branches 9 10 +1
==========================================
+ Hits 997 1000 +3
Misses 32 32
Continue to review full report at Codecov.
|
Hey @OlivierBlanvillain, when you are done with all the world cup celebrations, can you take a look at the solution proposed here? Thanks! :) |
(frameless ?= scala).&&(framelessAggr ?= scalaAggr) | ||
} | ||
|
||
check(forAll(prop[Long, Long] _)) | ||
// This fails due to issue #239 | ||
//check(forAll(prop[Option[Vector[Boolean]], Long] _)) | ||
check(forAll(prop[Option[Boolean], Long] _)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep Option[Vector[Boolean]]
, maybe as a separate test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that Vector tests take like 100x to run. I was able to reproduce the problem with a faster alternative. Now that I don't have to run the tests every minute I can probably set it back to be a vector. Let me change that.
(c, i) <- columns.toList[UntypedExpression[T]].zipWithIndex | ||
} yield new Column(c.expr).as(s"_${i+1}") | ||
|
||
// Workaround to SPARK-20346. The alternative is to allow the result to be Vector(null) for empty DataFrames. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One alternative is to allow the result to be Vector(null) for empty DataFrames. Another one would be to return an Option
).mkString(" or ") | ||
|
||
val selected = dataset.toDF().agg(cols.head, cols.tail:_*).as[Out](TypedExpressionEncoder[Out]) | ||
TypedDataset.create[Out](if (filterStr.isEmpty) selected else selected.filter(filterStr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My only concern here is that this might be expensive. It might be good to benchmark that vs the default behavior to see how much we lose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result of agg would be a single row storing the aggregated results. So this filter only applies to a single row.
Replaced #314.