[SPARK-15114][SQL] Column name generated by typed aggregate is super verbose
## What changes were proposed in this pull request?

Generate a shorter default alias for `AggregateExpression`. In this PR, the aggregate function name together with an index is used to generate the alias name.

```scala
val ds = Seq(1, 3, 2, 5).toDS()
ds.select(typed.sum((i: Int) => i), typed.avg((i: Int) => i)).show()
```

Output before change:

```text
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
|typedsumdouble(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), upcast(value))|typedaverage(unresolveddeserializer(upcast(input[0, int], IntegerType, - root class: "scala.Int"), value#1), newInstance(class scala.Tuple2))|
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
|                                                                                                                         11.0|                                                                                                                                         2.75|
+-----------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------+
```

Output after change:

```text
+-----------------+---------------+
|typedsumdouble_c1|typedaverage_c2|
+-----------------+---------------+
|             11.0|           2.75|
+-----------------+---------------+
```

Note: There is one test in `parquetSuites.scala` which shows that the system-picked alias name is not usable and is rejected: [test](https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/parquetSuites.scala#L672-#L687)

## How was this patch tested?

A new test was added in `DatasetAggregatorSuite`.
Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes #13045 from dilipbiswal/spark-15114.
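The naming scheme described in the PR text (aggregate function name plus an index) can be illustrated with a small standalone sketch. This is not the code from the patch: `AggCall`, `ShortAlias`, and `aliasFor` are hypothetical names invented for this example, and the `_c<position>` format is inferred only from the "after" output shown above.

```scala
// Hypothetical stand-in for an aggregate expression; not a Spark class.
final case class AggCall(functionName: String)

object ShortAlias {
  // Builds "<function name>_c<1-based position>", mirroring the shape of
  // the aliases in the "after" output (typedsumdouble_c1, typedaverage_c2).
  // The format is an assumption read off that output, not the patch itself.
  def aliasFor(call: AggCall, position: Int): String =
    s"${call.functionName}_c$position"

  // Assigns an alias to each aggregate based on its position in the select list.
  def aliases(calls: Seq[AggCall]): Seq[String] =
    calls.zipWithIndex.map { case (call, i) => aliasFor(call, i + 1) }
}

object ShortAliasDemo extends App {
  val selected = Seq(AggCall("typedsumdouble"), AggCall("typedaverage"))
  ShortAlias.aliases(selected).foreach(println)
}
```

Because the index depends on the column's position in the select list, the same aggregate can receive a different generated alias when the list is reordered; code that needs a stable column name would presumably still set one explicitly.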
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala (4 additions, 2 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/unresolved.scala (5 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/Column.scala (15 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala (3 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DatasetAggregatorSuite.scala (12 additions, 0 deletions)