[SPARK-20678][SQL] Ndv for columns not in filter condition should also be updated
## What changes were proposed in this pull request?

In filter estimation, we update column stats only for the columns that appear in the filter condition. However, if the number of rows decreases after the filter (i.e. the overall selectivity is less than 1), we need to update (scale down) the number of distinct values (NDV) for all columns, regardless of whether they appear in the filter condition. This PR also fixes the inconsistent rounding modes used for NDV and rowCount.

## How was this patch tested?

Added new tests.

Author: wangzhenhua <wangzhenhua@huawei.com>

Closes #17918 from wzhfy/scaleDownNdvAfterFilter.
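The scale-down described above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: the object and method names (`NdvScalingSketch`, `ceilToBigInt`, `scaleDownNdv`) are hypothetical, and it assumes that NDV shrinks proportionally to the row count and is rounded up, so that a non-empty column never ends up with zero distinct values.

```scala
import scala.math.BigDecimal.RoundingMode

// Hypothetical sketch, not the actual Spark implementation.
object NdvScalingSketch {

  // Round up to the nearest whole number (assumed rounding mode for both NDV and row counts).
  def ceilToBigInt(value: BigDecimal): BigInt =
    value.setScale(0, RoundingMode.CEILING).toBigInt

  // Scale NDV by newNumRows / oldNumRows when the filter removed rows;
  // otherwise keep the old NDV unchanged.
  def scaleDownNdv(oldNumRows: BigInt, newNumRows: BigInt, oldNdv: BigInt): BigInt =
    if (oldNumRows > 0 && newNumRows < oldNumRows) {
      ceilToBigInt(BigDecimal(oldNdv) * BigDecimal(newNumRows) / BigDecimal(oldNumRows))
    } else {
      oldNdv
    }

  def main(args: Array[String]): Unit = {
    // Column "a" is in the filter condition, column "b" is not;
    // both get their NDV scaled because the row count dropped from 100 to 25.
    val ndvByColumn = Map("a" -> BigInt(10), "b" -> BigInt(4))
    val updated = ndvByColumn.map { case (name, ndv) =>
      name -> scaleDownNdv(BigInt(100), BigInt(25), ndv)
    }
    updated.foreach { case (name, ndv) => println(s"$name: $ndv") }
  }
}
```

With 100 input rows and 25 output rows, an NDV of 10 becomes 3 (2.5 rounded up) and an NDV of 4 becomes 1. Rounding up rather than to nearest is a design choice of this sketch that keeps the estimate at 1 or more whenever any rows survive.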
Showing 4 changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala (12 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/FilterEstimation.scala (78 additions, 56 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/JoinEstimation.scala (5 additions, 20 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/FilterEstimationSuite.scala (38 additions, 25 deletions)