[SPARK-12879] [SQL] improve the unsafe row writing framework
As we begin to use the unsafe row writing framework (`BufferHolder` and `UnsafeRowWriter`) in more and more places (`UnsafeProjection`, `UnsafeRowParquetRecordReader`, `GenerateColumnAccessor`, etc.), we should document it better and make it easier to use.

This PR abstracts the technique used in `UnsafeRowParquetRecordReader`: avoid unnecessary operations as much as possible. For example, we do not need to point the row to the buffer at the end of every write; we only need to update the row's size. If all fields are of primitive type, we can even skip the size update. With this abstraction, the technique can easily be applied in more places.

A local benchmark shows `UnsafeProjection` is up to 1.7x faster after this PR. In each table, the Relative Rate column is relative to the "single long" row of the same run.

**old version**

```
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
unsafe projection:          Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
single long                      2616.04           102.61         1.00 X
single nullable long             3032.54            88.52         0.86 X
primitive types                  9121.05            29.43         0.29 X
nullable primitive types        12410.60            21.63         0.21 X
```

**new version**

```
Intel(R) Core(TM) i7-4960HQ CPU 2.60GHz
unsafe projection:          Avg Time(ms)    Avg Rate(M/s)  Relative Rate
-------------------------------------------------------------------------------
single long                      1533.34           175.07         1.00 X
single nullable long             2306.73           116.37         0.66 X
primitive types                  8403.93            31.94         0.18 X
nullable primitive types        12448.39            21.56         0.12 X
```

For a single non-nullable long (the best case), we get about a 1.7x speedup. Even when it is nullable, we still get about a 1.3x speedup. For the other cases the gain is much smaller (about 1.09x for primitive types, and essentially unchanged for nullable primitive types), because the saved operations account for only a small proportion of the whole process.

The benchmark code is included in this PR.

Author: Wenchen Fan <wenchen@databricks.com>

Closes #10809 from cloud-fan/unsafe-projection.
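The core trick in this commit message is to point the row at the buffer once and then, per record, only update its size, or skip even that when every field is fixed-width. The Java sketch below illustrates the idea with simplified, hypothetical classes (`RowView`, `Holder`, `writeLong`, `writeString`, `projectMixed`, `projectFixed`) loosely modeled on `BufferHolder` and `UnsafeRowWriter`. It is not Spark's API: it omits the null bitset, and its layout (one 8-byte slot per field, with variable-length values stored as `offset << 32 | length`) is an assumption for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowWriterSketch {

  /** Hypothetical row view over a byte[] (stands in for a row that points into a buffer). */
  static final class RowView {
    byte[] base;
    int sizeInBytes;
    int pointToCalls = 0; // how many times the view was (re-)pointed at a buffer
    void pointTo(byte[] base, int sizeInBytes) {
      this.base = base;
      this.sizeInBytes = sizeInBytes;
      pointToCalls++;
    }
    void setTotalSize(int sizeInBytes) { this.sizeInBytes = sizeInBytes; } // base unchanged
  }

  /** Hypothetical holder: an 8-byte slot per field, then variable-length data (no null bits). */
  static final class Holder {
    final RowView row;
    final int fixedSize;
    byte[] buffer;
    int cursor;
    Holder(RowView row, int numFields, int initialVarSize) {
      this.row = row;
      this.fixedSize = 8 * numFields;
      this.buffer = new byte[fixedSize + initialVarSize];
      this.cursor = fixedSize;
      row.pointTo(buffer, buffer.length); // point the row at the buffer once, up front
    }
    void grow(int needed) {
      if (cursor + needed > buffer.length) {
        buffer = Arrays.copyOf(buffer, (cursor + needed) * 2);
        row.pointTo(buffer, buffer.length); // re-point only when the array is replaced
      }
    }
    void reset() { cursor = fixedSize; }
    int totalSize() { return cursor; }
  }

  static void writeLong(Holder h, int ordinal, long v) {
    ByteBuffer.wrap(h.buffer).order(ByteOrder.LITTLE_ENDIAN).putLong(8 * ordinal, v);
  }

  /** Appends UTF-8 bytes (padded to 8 bytes) and stores (offset << 32 | length) in the slot. */
  static void writeString(Holder h, int ordinal, String s) {
    byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
    int padded = (bytes.length + 7) & ~7;
    h.grow(padded);
    System.arraycopy(bytes, 0, h.buffer, h.cursor, bytes.length);
    Arrays.fill(h.buffer, h.cursor + bytes.length, h.cursor + padded, (byte) 0);
    writeLong(h, ordinal, ((long) h.cursor << 32) | bytes.length);
    h.cursor += padded;
  }

  static long readLong(RowView r, int ordinal) {
    return ByteBuffer.wrap(r.base).order(ByteOrder.LITTLE_ENDIAN).getLong(8 * ordinal);
  }

  static String readString(RowView r, int ordinal) {
    long offsetAndSize = readLong(r, ordinal);
    return new String(r.base, (int) (offsetAndSize >>> 32), (int) offsetAndSize, StandardCharsets.UTF_8);
  }

  /** (long, string) rows: the size varies, so only the size is updated per record. */
  static RowView projectMixed(String[] names, int initialVarSize) {
    RowView row = new RowView();
    Holder holder = new Holder(row, 2, initialVarSize);
    for (int i = 0; i < names.length; i++) {
      holder.reset();
      writeLong(holder, 0, i);
      writeString(holder, 1, names[i]);
      row.setTotalSize(holder.totalSize()); // no pointTo per record
    }
    return row;
  }

  /** (long, long) rows: the size never changes, so there is nothing to update per record. */
  static RowView projectFixed(int numRecords) {
    RowView row = new RowView();
    Holder holder = new Holder(row, 2, 0); // buffer is exactly the fixed-width region
    for (long i = 0; i < numRecords; i++) {
      writeLong(holder, 0, i);
      writeLong(holder, 1, 2 * i);
    }
    return row;
  }

  public static void main(String[] args) {
    RowView mixed = projectMixed(new String[] {"a", "a longer string value", "xyz"}, 16);
    System.out.println(mixed.sizeInBytes + " bytes, pointTo calls: " + mixed.pointToCalls); // 24 bytes, pointTo calls: 2
    RowView fixed = projectFixed(1000);
    System.out.println(fixed.sizeInBytes + " bytes, pointTo calls: " + fixed.pointToCalls); // 16 bytes, pointTo calls: 1
  }
}
```

Here `pointToCalls` counts how often the row had to be re-pointed. In the mixed case the row is re-pointed only when the buffer grows; in the all-fixed-width case it is pointed exactly once, however many records are written.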
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/BufferHolder.java (25 additions, 19 deletions)
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java (38 additions, 20 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala (49 additions, 17 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/UnsafeProjectionBenchmark.scala (136 additions, 0 deletions)
- sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java (3 additions, 14 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala (4 additions, 4 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala (3 additions, 4 deletions)