[SPARK-5309][SQL] Add support for dictionaries in PrimitiveConverter for Strings.

Parquet Converters allow developers to take advantage of dictionary encoding of column data to reduce Column Binary decoding. The Spark PrimitiveConverter was not using that API, so for String columns that used dictionary compression it repeated the Binary to String conversion for every occurrence of the same String. In measurements this could account for over 25% of entire query time. For example, a 500M row table split across 16 blocks was aggregated and summed in a little under 30s before this change and a little under 20s after the change.

Author: Michael Davies <Michael.BellDavies@gmail.com>

Closes #4187 from MickDavies/SPARK-5309-2 and squashes the following commits:

327287e [Michael Davies] SPARK-5309: Add support for dictionaries in PrimitiveConverter for Strings.
33c002c [Michael Davies] SPARK-5309: Add support for dictionaries in PrimitiveConverter for Strings.
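
The idea described in the commit message can be sketched as follows. This is a minimal, self-contained illustration and not the code from this change: `ByteDictionary`, `StringConverterSketch`, and `DictionaryDemo` are hypothetical stand-ins written for this example. The method names `hasDictionarySupport`, `setDictionary`, `addValueFromDictionary`, and `addBinary` mirror the hooks on Parquet's `PrimitiveConverter`, but the types here are simplified (raw byte arrays instead of Parquet's `Binary` and `Dictionary`).

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical stand-in for a Parquet dictionary page: id -> raw UTF-8 bytes.
final class ByteDictionary(entries: Array[Array[Byte]]) {
  def maxId: Int = entries.length - 1
  def decodeToBinary(id: Int): Array[Byte] = entries(id)
}

// Hypothetical converter sketch: decodes each dictionary entry once,
// then serves repeated values by id lookup instead of re-decoding bytes.
final class StringConverterSketch(emit: String => Unit) {
  private var expanded: Array[String] = Array.empty
  var decodeCount = 0 // counts Binary -> String conversions, for illustration

  private def decode(bytes: Array[Byte]): String = {
    decodeCount += 1
    new String(bytes, UTF_8)
  }

  // Tells the reader that dictionary ids may be passed instead of raw values.
  def hasDictionarySupport: Boolean = true

  // Called once per dictionary: eagerly convert every entry to a String.
  def setDictionary(dict: ByteDictionary): Unit = {
    expanded = Array.tabulate(dict.maxId + 1)(i => decode(dict.decodeToBinary(i)))
  }

  // Dictionary-encoded value: a plain array lookup, no decoding.
  def addValueFromDictionary(id: Int): Unit = emit(expanded(id))

  // Fallback for values that are not dictionary-encoded: decode every time.
  def addBinary(bytes: Array[Byte]): Unit = emit(decode(bytes))
}

object DictionaryDemo {
  // Feeds five dictionary ids through the converter and returns
  // the emitted values together with the number of decodes performed.
  def run(): (Seq[String], Int) = {
    val out = scala.collection.mutable.ArrayBuffer.empty[String]
    val conv = new StringConverterSketch(s => { out += s; () })
    val dict = new ByteDictionary(Array("GBP", "USD").map(_.getBytes(UTF_8)))
    conv.setDictionary(dict)
    Seq(0, 1, 0, 0, 1).foreach(id => conv.addValueFromDictionary(id))
    (out.toSeq, conv.decodeCount)
  }
}
```

In this sketch, five values are emitted but only two byte-to-String conversions happen, one per dictionary entry. With a low-cardinality String column the saving grows with the row count, which is the effect the measurements above describe.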
Showing
- sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetConverter.scala (36 additions, 12 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/parquet/ParquetQuerySuite.scala (11 additions, 0 deletions)