[SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it

## What changes were proposed in this pull request?

Recently, we have also encountered such NPE issues in our production environment, as described in:
https://issues.apache.org/jira/browse/SPARK-19471

This issue can be reproduced by the following examples:

```scala
// rand, sum and collect_list come from the SQL functions object; `spark` is an existing SparkSession.
import org.apache.spark.sql.functions._

val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

// HashAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(), sum("y")).show()

// ObjectHashAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
df.groupBy("x").agg(rand(), collect_list("y")).show()

// SortAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false && SQLConf.USE_OBJECT_HASH_AGG.key=false
df.groupBy("x").agg(rand(), collect_list("y")).show()
```

This PR is based on PR-16820 (https://github.com/apache/spark/pull/16820), with test cases for all aggregation paths. We want to push it forward.

> When AggregationIterator generates the result projection, it does not call the initialize method of the Projection class. This causes a runtime NullPointerException when the projection involves nondeterministic expressions.

## How was this patch tested?

Unit tests, and verified in a production environment.

Author: donnyzone <wellfengzhu@gmail.com>

Closes #18920 from DonnyZone/Branch-spark-19471.
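The failure mode described above can be illustrated without Spark. The following is a minimal sketch, not the actual patch: `RandLike`, `ResultProjectionLike`, and `InitializeBeforeUse` are hypothetical stand-ins invented for illustration, and the assumption is that a nondeterministic expression keeps state that only exists after `initialize` has been called with a partition index.

```scala
import scala.util.Try

// Hypothetical stand-in for a nondeterministic expression such as rand():
// its state is created by initialize(), not by the constructor.
final class RandLike(seed: Long) {
  private var rng: java.util.Random = null // stays null until initialize() runs

  def initialize(partitionIndex: Int): Unit = {
    rng = new java.util.Random(seed + partitionIndex)
  }

  // Throws NullPointerException if initialize() was skipped.
  def eval(): Double = rng.nextDouble()
}

// Hypothetical stand-in for a generated result projection.
final class ResultProjectionLike(exprs: Seq[RandLike]) {
  def initialize(partitionIndex: Int): Unit =
    exprs.foreach(_.initialize(partitionIndex))

  def apply(): Seq[Double] = exprs.map(_.eval())
}

object InitializeBeforeUse {
  // Mirrors the buggy path: the projection is used without initialize().
  def withoutInit(): Try[Seq[Double]] =
    Try(new ResultProjectionLike(Seq(new RandLike(42L))).apply())

  // Mirrors the fixed path: initialize() runs before the first use.
  def withInit(partitionIndex: Int): Try[Seq[Double]] = Try {
    val projection = new ResultProjectionLike(Seq(new RandLike(42L)))
    projection.initialize(partitionIndex)
    projection.apply()
  }

  def main(args: Array[String]): Unit = {
    println(withoutInit().isFailure) // true: NullPointerException was thrown
    println(withInit(0).isSuccess)   // true: state exists before eval()
  }
}
```

In this sketch the only difference between the two paths is the `initialize` call, which is the step the commit message says was missing when the result projection was generated.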
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala (4 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectHashAggregateExec.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortAggregateExec.scala (2 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/SortBasedAggregationIterator.scala (2 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/TungstenAggregationIterator.scala (4 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameFunctionsSuite.scala (45 additions, 0 deletions)