[SPARK-17641][SQL] Collect_list/Collect_set should not collect null values.
## What changes were proposed in this pull request?

We added native versions of `collect_set` and `collect_list` in Spark 2.0. These currently also (try to) collect null values, which differs from the original Hive implementation. This PR fixes this by adding a null check to the `Collect.update` method.

## How was this patch tested?

Added a regression test to `DataFrameAggregateSuite`.

Author: Herman van Hovell <hvanhovell@databricks.com>

Closes #15208 from hvanhovell/SPARK-17641.
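The null check described above can be illustrated with a minimal, Spark-free sketch. This is not the code from the patch: `CollectListSketch`, its `buffer` field, and `eval` are hypothetical names that only model the idea of an `update` step that skips null inputs instead of appending them.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a collect_list-style aggregate buffer.
// It is NOT Spark's Collect class; it only models the behaviour the PR describes.
final class CollectListSketch[T >: Null] {
  private val buffer = mutable.ArrayBuffer.empty[T]

  // The idea of the fix: ignore null inputs rather than collecting them.
  def update(value: T): Unit = {
    if (value != null) {
      buffer += value
    }
  }

  // Return the collected values in insertion order.
  def eval(): List[T] = buffer.toList
}

object CollectListSketchDemo {
  def main(args: Array[String]): Unit = {
    val agg = new CollectListSketch[String]
    Seq("a", null, "b", null).foreach(agg.update)
    println(agg.eval()) // prints List(a, b)
  }
}
```

Without the null check in `update`, the result would contain the two null entries as well, which is the behaviour the PR reports as diverging from Hive.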
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala (6 additions, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameAggregateSuite.scala (12 additions, 0 deletions)