-
- Downloads
[SPARK-10993] [SQL] Inital code generated encoder for product types
This PR is a first cut at code generating an encoder that takes a Scala `Product` type and converts it directly into the tungsten binary format. This is done through the addition of a new set of expression that can be used to invoke methods on raw JVM objects, extracting fields and converting the result into the required format. These can then be used directly in an `UnsafeProjection` allowing us to leverage the existing encoding logic. According to some simple benchmarks, this can significantly speed up conversion (~4x). However, replacing CatalystConverters is deferred to a later PR to keep this PR at a reasonable size. ```scala case class SomeInts(a: Int, b: Int, c: Int, d: Int, e: Int) val data = SomeInts(1, 2, 3, 4, 5) val encoder = ProductEncoder[SomeInts] val converter = CatalystTypeConverters.createToCatalystConverter(ScalaReflection.schemaFor[SomeInts].dataType) (1 to 5).foreach {iter => benchmark(s"converter $iter") { var i = 100000000 while (i > 0) { val res = converter(data).asInstanceOf[InternalRow] assert(res.getInt(0) == 1) assert(res.getInt(1) == 2) i -= 1 } } benchmark(s"encoder $iter") { var i = 100000000 while (i > 0) { val res = encoder.toRow(data) assert(res.getInt(0) == 1) assert(res.getInt(1) == 2) i -= 1 } } } ``` Results: ``` [info] converter 1: 7170ms [info] encoder 1: 1888ms [info] converter 2: 6763ms [info] encoder 2: 1824ms [info] converter 3: 6912ms [info] encoder 3: 1802ms [info] converter 4: 7131ms [info] encoder 4: 1798ms [info] converter 5: 7350ms [info] encoder 5: 1912ms ``` Author: Michael Armbrust <michael@databricks.com> Closes #9019 from marmbrus/productEncoder.
Showing
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala 237 additions, 1 deletion...scala/org/apache/spark/sql/catalyst/ScalaReflection.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/Encoder.scala 44 additions, 0 deletions...cala/org/apache/spark/sql/catalyst/encoders/Encoder.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ProductEncoder.scala 67 additions, 0 deletions...g/apache/spark/sql/catalyst/encoders/ProductEncoder.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala 3 additions, 1 deletion...park/sql/catalyst/expressions/codegen/CodeGenerator.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects.scala 334 additions, 0 deletions...a/org/apache/spark/sql/catalyst/expressions/objects.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/types/GenericArrayData.scala 9 additions, 0 deletions...n/scala/org/apache/spark/sql/types/GenericArrayData.scala
- sql/catalyst/src/main/scala/org/apache/spark/sql/types/ObjectType.scala 42 additions, 0 deletions...rc/main/scala/org/apache/spark/sql/types/ObjectType.scala
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/encoders/ProductEncoderSuite.scala 174 additions, 0 deletions...che/spark/sql/catalyst/encoders/ProductEncoderSuite.scala
Please register or sign in to comment