-
- Downloads
[SPARK-21079][SQL] Calculate total size of a partition table as a sum of individual partitions
## What changes were proposed in this pull request? Storage URI of a partitioned table may or may not point to a directory under which individual partitions are stored. In fact, individual partitions may be located in totally unrelated directories. Before this change, ANALYZE TABLE table COMPUTE STATISTICS command calculated total size of a table by adding up sizes of files found under table's storage URI. This calculation could produce 0 if partitions are stored elsewhere. This change uses storage URIs of individual partitions to calculate the sizes of all partitions of a table and adds these up to produce the total size of a table. CC: wzhfy ## How was this patch tested? Added unit test. Ran ANALYZE TABLE xxx COMPUTE STATISTICS on a partitioned Hive table and verified that sizeInBytes is calculated correctly. Before this change, the size would be zero. Author: Masha Basmanova <mbasmanova@fb.com> Closes #18309 from mbasmanova/mbasmanova-analyze-part-table.
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeTableCommand.scala 23 additions, 6 deletions...che/spark/sql/execution/command/AnalyzeTableCommand.scala
- sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala 72 additions, 0 deletions...est/scala/org/apache/spark/sql/hive/StatisticsSuite.scala
Loading
Please register or sign in to comment