Skip to content
Snippets Groups Projects
Commit 2838bf8a authored by Cheng Lian's avatar Cheng Lian Committed by Michael Armbrust
Browse files

[SPARK-3537][SPARK-3914][SQL] Refines in-memory columnar table statistics

This PR refines in-memory columnar table statistics:

1. adds 2 more statistics for in-memory table columns: `count` and `sizeInBytes`
1. adds filter pushdown support for `IS NULL` and `IS NOT NULL`.
1. caches and propagates statistics in `InMemoryRelation` once the underlying cached RDD is materialized.

   Statistics are collected to driver side with an accumulator.

This PR also fixes SPARK-3914 by properly propagating in-memory statistics.

Author: Cheng Lian <lian@databricks.com>

Closes #2860 from liancheng/propagates-in-mem-stats and squashes the following commits:

0cc5271 [Cheng Lian] Restricts visibility of o.a.s.s.c.p.l.Statistics
c5ff904 [Cheng Lian] Fixes test table name conflict
a8c818d [Cheng Lian] Refines tests
1d01074 [Cheng Lian] Bug fix: shouldn't call STRING.actualSize on null string value
7dc6a34 [Cheng Lian] Adds more in-memory table statistics and propagates them properly
parent 879a1658
No related branches found
No related tags found
No related merge requests found
Showing
with 240 additions and 167 deletions
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment