[SPARK-3537][SPARK-3914][SQL] Refines in-memory columnar table statistics
This PR refines in-memory columnar table statistics:

1. Adds 2 more statistics for in-memory table columns: `count` and `sizeInBytes`.
2. Adds filter pushdown support for `IS NULL` and `IS NOT NULL`.
3. Caches and propagates statistics in `InMemoryRelation` once the underlying cached RDD is materialized.

Statistics are collected on the driver side with an accumulator.

This PR also fixes SPARK-3914 by properly propagating in-memory statistics.

Author: Cheng Lian <lian@databricks.com>

Closes #2860 from liancheng/propagates-in-mem-stats and squashes the following commits:

0cc5271 [Cheng Lian] Restricts visibility of o.a.s.s.c.p.l.Statistics
c5ff904 [Cheng Lian] Fixes test table name conflict
a8c818d [Cheng Lian] Refines tests
1d01074 [Cheng Lian] Bug fix: shouldn't call STRING.actualSize on null string value
7dc6a34 [Cheng Lian] Adds more in-memory table statistics and propagates them properly
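To make the idea concrete, here is a minimal, self-contained Scala sketch of how per-batch `count` / null-count / `sizeInBytes` statistics can drive `IS NULL` / `IS NOT NULL` batch pruning and a relation-level size estimate. It is not Spark's code. `ColumnBatchStats`, `gatherStats`, `mayMatch`, `totalSizeInBytes` and the `IsNull` / `IsNotNull` case classes are hypothetical names chosen for illustration. The sketch assumes an `Int` column where each non-null value costs 4 bytes, and a plain local sum stands in for the driver-side accumulator.

```scala
// Hypothetical, simplified model of per-batch column statistics for an Int column.
// This is NOT Spark's ColumnStats implementation; names are illustrative only.
final case class ColumnBatchStats(nullCount: Int, count: Int, sizeInBytes: Long)

// Gathers statistics for one cached batch of a nullable Int column.
// Assumption: each non-null value costs 4 bytes and nulls cost nothing.
def gatherStats(values: Seq[Option[Int]]): ColumnBatchStats = {
  val nonNull = values.flatten
  ColumnBatchStats(
    nullCount = values.count(_.isEmpty),
    count = values.size,
    sizeInBytes = nonNull.size.toLong * 4L)
}

// Hypothetical predicate types standing in for Catalyst's IsNull / IsNotNull expressions.
sealed trait Predicate
final case class IsNull(column: String) extends Predicate
final case class IsNotNull(column: String) extends Predicate

// Conservative pruning test: false means no row in the batch can satisfy the
// predicate, so the whole batch can be skipped without being scanned.
def mayMatch(batch: Map[String, ColumnBatchStats], p: Predicate): Boolean = p match {
  case IsNull(c)    => batch(c).nullCount > 0
  case IsNotNull(c) => batch(c).count - batch(c).nullCount > 0
}

// Relation-level estimate: sum the per-column sizes over every batch. A plain
// local sum stands in for statistics gathered on the driver via an accumulator.
def totalSizeInBytes(batches: Seq[Map[String, ColumnBatchStats]]): Long =
  batches.flatMap(_.values).map(_.sizeInBytes).sum

// Usage: two batches of column "a"; only the first one contains a null.
val batches = Seq(
  Map("a" -> gatherStats(Seq(Some(1), Some(2), None))),
  Map("a" -> gatherStats(Seq(Some(3), Some(4)))))

val nullBatches    = batches.filter(b => mayMatch(b, IsNull("a")))    // 1 batch survives
val nonNullBatches = batches.filter(b => mayMatch(b, IsNotNull("a"))) // 2 batches survive
val estimatedSize  = totalSizeInBytes(batches)                         // 16 bytes
```

Tracking `count` alongside the null count is what makes `IS NOT NULL` prunable. A batch whose values are all null has `count == nullCount` and can be skipped.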
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/AttributeMap.scala: 4 additions, 6 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala: 15 additions, 16 deletions
- sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnStats.scala: 63 additions, 59 deletions
- sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala: 62 additions, 39 deletions
- sql/core/src/main/scala/org/apache/spark/sql/execution/ExistingRDD.scala: 4 additions, 7 deletions
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetRelation.scala: 1 addition, 2 deletions
- sql/core/src/test/scala/org/apache/spark/sql/CachedTableSuite.scala: 10 additions, 1 deletion
- sql/core/src/test/scala/org/apache/spark/sql/TestData.scala: 8 additions, 8 deletions
- sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnStatsSuite.scala: 6 additions, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/columnar/PartitionBatchPruningSuite.scala: 47 additions, 29 deletions
- sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala: 20 additions, 0 deletions