[SPARK-23523][SQL] Fix the incorrect result caused by the rule OptimizeMetadataOnlyQuery
## What changes were proposed in this pull request?

```Scala
val tablePath = new File(s"${path.getCanonicalPath}/cOl3=c/cOl1=a/cOl5=e")
Seq(("a", "b", "c", "d", "e")).toDF("cOl1", "cOl2", "cOl3", "cOl4", "cOl5")
  .write.json(tablePath.getCanonicalPath)

val df = spark.read.json(path.getCanonicalPath).select("CoL1", "CoL5", "CoL3").distinct()
df.show()
```

It generates a wrong result.

```
[c,e,a]
```

We have a bug in the rule `OptimizeMetadataOnlyQuery`. We should respect the attribute order in the original leaf node. This PR is to fix it.

## How was this patch tested?

Added a test case.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #20684 from gatorsmile/optimizeMetadataOnly.
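The order mismatch described in the commit message can be illustrated with a small Spark-free sketch. This is not the code from the patch: the object `AttributeOrderSketch`, the helper `project`, and the assumed attribute order `mismatchedAttrs` are hypothetical names and values chosen only to show how binding partition values to attributes by position, in an order that differs from the partition layout, could yield `[c,e,a]` instead of `[a,e,c]`.

```scala
import java.util.Locale

// Hypothetical, Spark-free model of the attribute-order mismatch.
object AttributeOrderSketch {
  // Partition columns and values in directory order: cOl3=c/cOl1=a/cOl5=e.
  val partitionColumns: Seq[String] = Seq("cOl3", "cOl1", "cOl5")
  val partitionValues: Seq[String] = Seq("c", "a", "e")

  // Assumption: an attribute list whose order differs from the partition order.
  val mismatchedAttrs: Seq[String] = Seq("cOl1", "cOl3", "cOl5")

  // Columns requested by the query, with different casing (case-insensitive lookup).
  val selected: Seq[String] = Seq("CoL1", "CoL5", "CoL3")

  // Bind attribute names to values by position, then project the selected columns.
  def project(attrs: Seq[String], values: Seq[String], cols: Seq[String]): Seq[String] = {
    val byName = attrs.map(_.toLowerCase(Locale.ROOT)).zip(values).toMap
    cols.map(c => byName(c.toLowerCase(Locale.ROOT)))
  }

  // Attributes out of step with the values: wrong row.
  val wrong: Seq[String] = project(mismatchedAttrs, partitionValues, selected)

  // Attributes in the same order as the values: expected row.
  val right: Seq[String] = project(partitionColumns, partitionValues, selected)

  def main(args: Array[String]): Unit = {
    println(wrong.mkString("[", ",", "]"))
    println(right.mkString("[", ",", "]"))
  }
}
```

In this model the row is only correct when the attribute list and the value list share one ordering, which is the property the commit message asks the rule to preserve.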
Changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LocalRelation.scala (5 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuery.scala (10 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFsRelation.scala (3 additions, 0 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/OptimizeMetadataOnlyQuerySuite.scala (22 additions, 0 deletions)