
Fix implication from SPARK-20346. #316

Merged. imarios merged 3 commits into typelevel:master on Dec 21, 2018.


imarios (contributor) commented Jul 15, 2018

Replaces #314.

codecov-io commented Jul 15, 2018

Codecov Report

Merging #316 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #316      +/-   ##
==========================================
+ Coverage   96.89%   96.89%   +<.01%     
==========================================
  Files          60       60              
  Lines        1029     1032       +3     
  Branches        9       10       +1     
==========================================
+ Hits          997     1000       +3     
  Misses         32       32
Impacted Files Coverage Δ
...ataset/src/main/scala/frameless/TypedDataset.scala 100% <100%> (ø) ⬆️
...ataset/src/main/scala/frameless/TypedEncoder.scala 100% <100%> (ø) ⬆️


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 8e4ce70...571c071.

imarios (contributor, author) commented Jul 16, 2018

Hey @OlivierBlanvillain, when you are done with all the World Cup celebrations, could you take a look at the solution proposed here? Thanks! :)

(frameless ?= scala).&&(framelessAggr ?= scalaAggr)
}

check(forAll(prop[Long, Long] _))
// This fails due to issue #239
//check(forAll(prop[Option[Vector[Boolean]], Long] _))
check(forAll(prop[Option[Boolean], Long] _))

Contributor:

Should we keep Option[Vector[Boolean]], maybe as a separate test?

Contributor (author):

I noticed that the Vector tests take about 100x longer to run, so I reproduced the problem with a faster alternative. Now that I don't have to rerun the tests every minute, I can probably switch it back to a Vector. Let me change that.

(c, i) <- columns.toList[UntypedExpression[T]].zipWithIndex
} yield new Column(c.expr).as(s"_${i+1}")

// Workaround to SPARK-20346. The alternative is to allow the result to be Vector(null) for empty DataFrames.

Contributor:

One alternative is to allow the result to be Vector(null) for empty DataFrames. Another would be to return an Option.
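To make these two alternatives concrete next to the filtering approach in the diff, here is a minimal pure-Scala model with no Spark dependency. It assumes, as Spark SQL does for a global `agg` with no grouping columns, that the aggregation always yields exactly one row and that `sum` over zero rows is `null`; `sum` only stands in for whichever aggregate hits SPARK-20346, and the object and method names are hypothetical.

```scala
// A pure-Scala model (no Spark) of a global aggregation over possibly empty input.
// Assumption for illustration: like Spark SQL's `agg` without grouping, it always
// produces exactly one row, and `sum` over zero rows yields null.
object EmptyAggAlternatives {

  // Raw behaviour: one row, whose value is null when the input is empty.
  def sparkLikeSum(xs: Seq[Long]): Vector[java.lang.Long] =
    Vector(if (xs.isEmpty) null else java.lang.Long.valueOf(xs.sum))

  // Filtering approach: drop the aggregated row when its value is null,
  // so empty input yields no rows at all.
  def filtered(xs: Seq[Long]): Vector[Long] =
    sparkLikeSum(xs).filter(_ != null).map(_.longValue)

  // Option approach: keep the single row but make the value optional.
  def asOption(xs: Seq[Long]): Vector[Option[Long]] =
    sparkLikeSum(xs).map(v => Option(v).map(_.longValue))

  def main(args: Array[String]): Unit = {
    println(sparkLikeSum(Nil))     // Vector(null)
    println(filtered(Nil))         // Vector()
    println(asOption(Nil))         // Vector(None)
    println(filtered(Seq(1L, 2L))) // Vector(3)
  }
}
```

In this model, filtering makes empty input produce zero rows, matching what a plain Scala collection would give, while the `Option` variant keeps the single row at the cost of changing the result type.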

).mkString(" or ")

val selected = dataset.toDF().agg(cols.head, cols.tail:_*).as[Out](TypedExpressionEncoder[Out])
TypedDataset.create[Out](if (filterStr.isEmpty) selected else selected.filter(filterStr))

Contributor:

My only concern here is that this might be expensive. It would be good to benchmark it against the default behavior to see how much we lose.

Contributor (author):

The result of agg is a single row holding the aggregated values, so this filter only ever applies to one row.
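The diff context only shows the tail of the filter construction (`).mkString(" or ")`), so the exact predicate is not visible here. The sketch below assumes one `_N is not null` clause per non-nullable result column, OR-ed together, with an empty string meaning "no filter"; `AggFilterSketch`, `ColumnInfo` and `buildFilter` are hypothetical names, not frameless API.

```scala
// Hypothetical sketch of building a Spark SQL filter string over the aggregated row.
// Assumption: one "_N is not null" clause per non-nullable result column, OR-ed
// together; the real predicate in the PR is only partially visible in the diff.
object AggFilterSketch {

  // Hypothetical stand-in for per-column metadata (only nullability matters here).
  final case class ColumnInfo(nullable: Boolean)

  // Result columns are named _1, _2, ... as in `.as(s"_${i+1}")` from the diff.
  // An empty string means "no filter needed".
  def buildFilter(columns: List[ColumnInfo]): String =
    columns.zipWithIndex
      .collect { case (c, i) if !c.nullable => s"_${i + 1} is not null" }
      .mkString(" or ")

  def main(args: Array[String]): Unit = {
    val cols = List(ColumnInfo(nullable = false), ColumnInfo(nullable = true), ColumnInfo(nullable = false))
    println(buildFilter(cols))                                      // _1 is not null or _3 is not null
    println(buildFilter(List(ColumnInfo(nullable = true))).isEmpty) // true
  }
}
```

A non-empty string like this could be handed to Spark's `Dataset.filter(conditionExpr: String)`, mirroring `if (filterStr.isEmpty) selected else selected.filter(filterStr)` in the diff. Because `agg` without grouping returns a single row, that filter evaluates exactly one row, so its cost does not grow with the size of the input.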

imarios merged commit 09422e1 into typelevel:master on Dec 21, 2018.
Labels
None yet
Projects
None yet

3 participants