[SPARK-15549][SQL] Disable bucketing when the output doesn't contain all bucketing columns
## What changes were proposed in this pull request?

I create a bucketed table `bucketed_table` with bucket column `i`:

```scala
case class Data(i: Int, j: Int, k: Int)
sc.makeRDD(Array((1, 2, 3))).map(x => Data(x._1, x._2, x._3)).toDF.write.bucketBy(2, "i").saveAsTable("bucketed_table")
```

Then I run the following SQL queries, both of which fail because the bucket column `i` is not part of the output:

```sql
SELECT j FROM bucketed_table;
Error in query: bucket column i not found in existing columns (j);

SELECT j, MAX(k) FROM bucketed_table GROUP BY j;
Error in query: bucket column i not found in existing columns (j, k);
```

I think we should add a check so that bucketing is enabled only when all of the following conditions hold:

1. the conf is enabled
2. the relation is bucketed
3. the output contains all bucketing columns

## How was this patch tested?

Updated test cases to reflect the changes.

Author: Yadong Qi <qiyadong2010@gmail.com>

Closes #13321 from watermen/SPARK-15549.
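The three conditions above can be sketched as a single predicate. This is only an illustration under stated assumptions, not the code from the patch: `BucketSpec` here is a hypothetical stand-in defined locally (not Spark's class), and `BucketingCheck.shouldUseBucketing` with its parameters is a name invented for this example.

```scala
// Hypothetical stand-in for a bucketing description; not Spark's real class.
final case class BucketSpec(numBuckets: Int, bucketColumnNames: Seq[String])

object BucketingCheck {
  // Returns true only when all three conditions from the description hold:
  //   1. the bucketing conf is enabled,
  //   2. the relation is bucketed (a bucket spec is present),
  //   3. every bucketing column appears in the scan output.
  def shouldUseBucketing(
      bucketingEnabled: Boolean,
      bucketSpec: Option[BucketSpec],
      outputColumns: Seq[String]): Boolean =
    bucketingEnabled && bucketSpec.exists { spec =>
      spec.bucketColumnNames.forall(name => outputColumns.contains(name))
    }
}
```

With a table bucketed by `i`, a scan that outputs only `j` would fail condition 3 under this sketch, so the query would fall back to a plain, non-bucketed read instead of raising an error.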
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala (6 additions, 7 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala (11 additions, 0 deletions)