[SPARK-11149] [SQL] Improve cache performance for primitive types
This PR improves the performance of the in-memory columnar cache by:

1) Generating an Iterator that takes an Iterator[CachedBatch] as input and calls the column accessors directly (unrolling the loop over columns), which avoids the expensive Iterator.flatMap.
2) Using Unsafe.getInt/getLong/getFloat/getDouble instead of ByteBuffer.getInt/getLong/getFloat/getDouble; the latter actually read byte by byte.
3) Removing the unnecessary copy() in Coalesce(). This is not related to the memory cache but was found during benchmarking.

The following benchmark showed that the columnar cache of int columns is sped up by 2x:

```
import time

path = '/opt/tpcds/store_sales/'
int_cols = ['ss_sold_date_sk', 'ss_sold_time_sk', 'ss_item_sk', 'ss_customer_sk']
df = sqlContext.read.parquet(path).select(int_cols).cache()
df.count()  # materialize the cache before timing

t = time.time()
print(df.select("*")._jdf.queryExecution().toRdd().count())
print(time.time() - t)
```

Author: Davies Liu <davies@databricks.com>

Closes #9145 from davies/byte_buffer.
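Item 2 contrasts reading a primitive value byte by byte with reading it as one whole word. The Python sketch below is only an analogy: the commit changes Scala code to call `Unsafe.getInt` and friends, which Python does not expose. The function names `get_int_bytewise` and `get_int_whole` are hypothetical, and little-endian byte order is an assumption made for the example. Both functions decode the same signed 32-bit value; the first performs four separate byte loads plus shifts, while the second decodes the word in a single call.

```python
import struct


def get_int_bytewise(buf: bytes, offset: int) -> int:
    """Assemble a little-endian signed 32-bit int from four separate byte reads."""
    b0, b1, b2, b3 = buf[offset], buf[offset + 1], buf[offset + 2], buf[offset + 3]
    value = b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
    # Reinterpret the unsigned 32-bit pattern as a two's-complement signed int.
    return value - (1 << 32) if value & 0x80000000 else value


def get_int_whole(buf: bytes, offset: int) -> int:
    """Decode the same four bytes with one call, analogous to a whole-word read."""
    return struct.unpack_from("<i", buf, offset)[0]


data = struct.pack("<3i", 7, -42, 2**31 - 1)
for i in range(3):
    assert get_int_bytewise(data, 4 * i) == get_int_whole(data, 4 * i)
print([get_int_whole(data, 4 * i) for i in range(3)])  # [7, -42, 2147483647]
```

The results are identical. The point of item 2 is the cost of reaching them, which a Python sketch cannot measure meaningfully.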
Changed files (9 files, 265 additions, 122 deletions):
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala (3 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala (4 additions, 6 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatterSuite.scala (28 additions, 38 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala (42 additions, 10 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/GenerateColumnAccessor.scala (149 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala (28 additions, 55 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/NullableColumnAccessor.scala (3 additions, 3 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/columnar/compression/compressionSchemes.scala (7 additions, 9 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala (1 addition, 1 deletion)
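The file list includes a new `GenerateColumnAccessor.scala` (149 additions), which presumably holds the generated iterator described in item 1 of the commit message. The Python sketch below is only a rough analogy. It contrasts a generic flatMap-style row iterator with one whose per-row code is generated as source text for a fixed number of columns, so that the loop over columns is unrolled into one read per column. `CachedBatch` here is a hypothetical stand-in (a row count plus one Python list per column), not Spark's class, and `exec` stands in for compiling the generated code at runtime.

```python
from itertools import chain


class CachedBatch:
    """Hypothetical stand-in for a cached batch: a row count plus one list per column."""

    def __init__(self, num_rows, columns):
        self.num_rows = num_rows
        self.columns = columns


def rows_flatmap(batches):
    """Generic path: flatten each batch into rows, looping over the columns for every row."""

    def batch_rows(batch):
        for i in range(batch.num_rows):
            yield tuple(col[i] for col in batch.columns)

    return chain.from_iterable(map(batch_rows, batches))


def generate_row_iterator(num_columns):
    """Generate source for an iterator specialized to num_columns columns.

    The per-row column loop is unrolled: each column gets its own local
    variable and its own read expression in the generated code.
    """
    if num_columns < 1:
        raise ValueError("num_columns must be at least 1")
    binds = "\n".join(f"        c{j} = batch.columns[{j}]" for j in range(num_columns))
    reads = ", ".join(f"c{j}[i]" for j in range(num_columns))
    src = (
        "def iterate(batches):\n"
        "    for batch in batches:\n"
        f"{binds}\n"
        "        for i in range(batch.num_rows):\n"
        f"            yield ({reads},)\n"
    )
    namespace = {}
    exec(src, namespace)  # compile the generated source at runtime
    return namespace["iterate"]


batches = [CachedBatch(2, [[1, 2], [10, 20]]), CachedBatch(1, [[3], [30]])]
iterate = generate_row_iterator(2)
print(list(iterate(batches)))  # [(1, 10), (2, 20), (3, 30)]
assert list(iterate(batches)) == list(rows_flatmap(batches))
```

Both iterators yield the same rows. The sketch only shows the shape of the transformation; timings taken in Python say little about the JVM speedup reported in the commit message.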