[SPARK-13897][SQL] RelationalGroupedDataset and KeyValueGroupedDataset
## What changes were proposed in this pull request?

Previously, Dataset.groupBy returned a GroupedData, and Dataset.groupByKey returned a GroupedDataset. The names are very similar and, unfortunately, do not convey the real difference between the two.

Assume we are grouping by some keys (K). groupByKey is a key-value style group by, in which the schema of the returned dataset is a tuple of just two fields: key and value. groupBy, on the other hand, is a relational style group by, in which the schema of the returned dataset is flattened and contains |K| + |V| fields. This pull request renames GroupedData to RelationalGroupedDataset and GroupedDataset to KeyValueGroupedDataset so that the names reflect this distinction.

This pull request also removes the experimental tag from RelationalGroupedDataset. The class (as GroupedData) has been part of DataFrame since 1.3, and we now have enough confidence to stabilize it.

## How was this patch tested?

This is a rename to improve API understandability, so it should be covered by the existing tests.

Author: Reynold Xin <rxin@databricks.com>

Closes #11841 from rxin/SPARK-13897.
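The difference between the two grouping styles described above can be sketched in Scala as follows. This is an illustrative sketch, not part of the patch: it assumes the Spark 2.x API where these renamed classes shipped (including the `SparkSession` entry point, which post-dates this commit) and a `spark-sql` dependency on the classpath. The `Sale` case class, the sample rows, and the `GroupingExample.totalsByRegion` helper are hypothetical names introduced only for this example. The type ascriptions show which class each call returns.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, KeyValueGroupedDataset, RelationalGroupedDataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record type used only for this illustration.
case class Sale(region: String, product: String, amount: Long)

object GroupingExample {
  // Computes per-region totals both ways and collects each result to a local Map.
  def totalsByRegion(spark: SparkSession): (Map[String, Long], Map[String, Long]) = {
    import spark.implicits._

    // Hypothetical sample data.
    val sales: Dataset[Sale] = Seq(
      Sale("EU", "a", 10L),
      Sale("EU", "b", 5L),
      Sale("US", "a", 7L)
    ).toDS()

    // Relational style: groupBy yields a RelationalGroupedDataset, and agg returns an
    // untyped DataFrame whose flattened columns are the grouping columns followed by
    // the aggregates (here: region, total).
    val relationalGroups: RelationalGroupedDataset = sales.groupBy("region")
    val relational: DataFrame = relationalGroups.agg(sum("amount").as("total"))

    // Key-value style: groupByKey yields a KeyValueGroupedDataset[K, V], and mapGroups
    // builds a typed Dataset from each (key, values) pair.
    val keyValueGroups: KeyValueGroupedDataset[String, Sale] = sales.groupByKey(_.region)
    val keyValue: Dataset[(String, Long)] =
      keyValueGroups.mapGroups((region, rows) => (region, rows.map(_.amount).sum))

    val relationalTotals = relational.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
    val keyValueTotals = keyValue.collect().toMap
    (relationalTotals, keyValueTotals)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("grouping-example").getOrCreate()
    try {
      val (relational, keyValue) = totalsByRegion(spark)
      println(s"relational: $relational")
      println(s"key-value:  $keyValue")
    } finally spark.stop()
  }
}
```

Both paths compute the same totals here. The difference lies in the shape of the API: the relational path works with untyped columns and a flattened schema, while the key-value path hands each group to a typed function as a key plus an iterator of values.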
- project/MimaExcludes.scala: 1 addition, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala: 31 additions, 25 deletions
- sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala: 17 additions, 18 deletions
- sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala: 17 additions, 20 deletions
- sql/core/src/test/java/test/org/apache/spark/sql/JavaDatasetSuite.java: 4 additions, 4 deletions