[SPARK-10990] [SPARK-11018] [SQL] improve unrolling of complex types
This PR improves the unrolling and reading of complex types in the columnar cache:

1) Use UnsafeProjection to serialize complex types, so they are no longer serialized three times (twice just for actualSize).
2) Copy the bytes from UnsafeRow/UnsafeArrayData to the ByteBuffer directly, avoiding the intermediate byte[].
3) Use the underlying array of the ByteBuffer to create UTF8String/UnsafeRow/UnsafeArrayData without copying.

Combining these optimizations, we can reduce the unrolling time from 25s to 21s (20% less) and the scanning time from 3.5s to 2.5s (28% less).

```
import time

df = sqlContext.read.parquet(path)
t = time.time()
df.cache()
df.count()
print 'unrolling', time.time() - t

for i in range(10):
    t = time.time()
    print df.select("*")._jdf.queryExecution().toRdd().count()
    print time.time() - t
```

The schema is

```
root
 |-- a: struct (nullable = true)
 |    |-- b: long (nullable = true)
 |    |-- c: string (nullable = true)
 |-- d: array (nullable = true)
 |    |-- element: long (containsNull = true)
 |-- e: map (nullable = true)
 |    |-- key: long
 |    |-- value: string (valueContainsNull = true)
```

The columnar cache now depends on UnsafeProjection supporting all data types (including UDTs); this PR also fixes that.

Author: Davies Liu <davies@databricks.com>

Closes #9016 from davies/complex2.
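Points 2 and 3 above can be illustrated without Spark. The following is a minimal Java sketch that uses only `java.nio.ByteBuffer`; it is not the code from this PR. `ZeroCopySketch`, `BytesView`, `append`, and `extract` are hypothetical names invented for the illustration, and the sketch assumes a heap-backed buffer (one for which `hasArray()` returns true) and a simple 4-byte length prefix per value.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ZeroCopySketch {

    /** Hypothetical view type: a window onto an existing byte[]; holds no copy of the data. */
    static final class BytesView {
        final byte[] base;
        final int offset;
        final int length;

        BytesView(byte[] base, int offset, int length) {
            this.base = base;
            this.offset = offset;
            this.length = length;
        }

        /** Decodes the viewed bytes as UTF-8 (this step does allocate a String). */
        String decode() {
            return new String(base, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Writes a length-prefixed value straight from the source array into the buffer. */
    static void append(byte[] src, int srcOffset, int length, ByteBuffer buffer) {
        buffer.putInt(length);
        // Bulk put from the source range: no temporary byte[] is allocated.
        buffer.put(src, srcOffset, length);
    }

    /** Reads the next length-prefixed value as a view over the buffer's backing array. */
    static BytesView extract(ByteBuffer buffer) {
        int length = buffer.getInt();
        // Translate the buffer position into an index in the backing array.
        int start = buffer.arrayOffset() + buffer.position();
        // Skip over the payload instead of copying it out.
        buffer.position(buffer.position() + length);
        return new BytesView(buffer.array(), start, length);
    }

    public static void main(String[] args) {
        byte[] row = "hello, columnar cache".getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(64);

        append(row, 0, 5, buffer);   // "hello"
        append(row, 7, 8, buffer);   // "columnar"
        buffer.flip();               // switch from writing to reading

        BytesView first = extract(buffer);
        BytesView second = extract(buffer);

        System.out.println(first.decode());                // prints "hello"
        System.out.println(second.decode());               // prints "columnar"
        System.out.println(first.base == buffer.array());  // prints "true": same array, no copy
    }
}
```

The trade-off in this style is that a view stays valid only as long as the backing array is neither reused nor overwritten, so the owner of the buffer has to keep it alive and unchanged for as long as views into it are in use.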
Showing 12 changed files
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java: 12 additions, 0 deletions
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java: 12 additions, 0 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala: 5 additions, 0 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateSafeProjection.scala: 1 addition, 0 deletions
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala: 20 additions, 9 deletions
- sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnAccessor.scala: 4 additions, 5 deletions
- sql/core/src/main/scala/org/apache/spark/sql/columnar/ColumnType.scala: 94 additions, 93 deletions
- sql/core/src/main/scala/org/apache/spark/sql/columnar/InMemoryColumnarTableScan.scala: 4 additions, 2 deletions
- sql/core/src/test/scala/org/apache/spark/sql/columnar/ColumnTypeSuite.scala: 15 additions, 22 deletions
- sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnAccessorSuite.scala: 4 additions, 3 deletions
- sql/core/src/test/scala/org/apache/spark/sql/columnar/NullableColumnBuilderSuite.scala: 7 additions, 6 deletions
- unsafe/src/main/java/org/apache/spark/unsafe/types/UTF8String.java: 10 additions, 0 deletions