[SPARK-20718][SQL] FileSourceScanExec with different filter orders should be the same after canonicalization

## What changes were proposed in this pull request?

Since `constraints` in `QueryPlan` is a set, the order of filters can differ. Usually this is fine because of canonicalization. However, in `FileSourceScanExec`, the data filters and partition filters are sequences, and their order is not canonicalized. As a result, `def sameResult` returns different results for different orders of data/partition filters. This leads to, e.g., different decisions for `ReuseExchange`, and thus to unstable performance.

## How was this patch tested?

Added a new test for `FileSourceScanExec.sameResult`.

Author: wangzhenhua <wangzhenhua@huawei.com>

Closes #17959 from wzhfy/canonicalizeFileSourceScanExec.
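The order sensitivity described in this commit message can be illustrated without Spark. The following is a minimal sketch, not the patch's actual code: `Filter`, `GreaterThan`, `EqualTo`, and `Scan` are hypothetical stand-ins for Catalyst expressions and `FileSourceScanExec`, and sorting by `toString` is just one assumed way to impose a deterministic order.

```scala
// Hypothetical stand-ins for Catalyst filter expressions; not Spark's real classes.
sealed trait Filter
final case class GreaterThan(column: String, value: Int) extends Filter
final case class EqualTo(column: String, value: Int) extends Filter

// Hypothetical simplified scan node. Filters are held in a Seq,
// so plain equality depends on their order.
final case class Scan(table: String, dataFilters: Seq[Filter]) {
  // Sort by a deterministic key so that the same set of filters
  // always produces the same sequence (assumed strategy for this sketch).
  def canonicalized: Scan = copy(dataFilters = dataFilters.sortBy(_.toString))

  // Compare canonical forms instead of the raw plans.
  def sameResult(other: Scan): Boolean = canonicalized == other.canonicalized
}

object CanonicalizeDemo {
  def main(args: Array[String]): Unit = {
    val a = Scan("t", Seq(GreaterThan("x", 1), EqualTo("y", 2)))
    val b = Scan("t", Seq(EqualTo("y", 2), GreaterThan("x", 1)))

    println(a == b)          // false: same filters, different order
    println(a.sameResult(b)) // true: order is normalized before comparing
  }
}
```

In this sketch, a reuse rule that compared plans with `==` would treat `a` and `b` as different, while one that used `sameResult` would treat them as interchangeable.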
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala (13 additions, 3 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/SameResultSuite.scala (49 additions, 0 deletions)