[SPARK-7311] Introduce internal Serializer API for determining if serializers support object relocation

This patch extends the `Serializer` interface with a new `Private` API which allows serializers to indicate whether they support relocation of serialized objects in serializer stream output.

This relocatability property is described in more detail in `Serializer.scala`, but in a nutshell, a serializer supports relocation if reordering the bytes of serialized objects in serialization stream output is equivalent to having re-ordered those elements prior to serializing them. The optimized shuffle paths introduced in #4450 and #5868 both rely on serializers having this property; this patch just centralizes the logic for determining whether a serializer has this property. I also added tests and comments clarifying when this works for KryoSerializer.

This change allows the optimizations in #4450 to be applied for shuffles that use `SqlSerializer2`.

Author: Josh Rosen <joshrosen@databricks.com>

Closes #5924 from JoshRosen/SPARK-7311 and squashes the following commits:

- 50a68ca [Josh Rosen] Address minor nits
- 0a7ebd7 [Josh Rosen] Clarify reason why SqlSerializer2 supports this serializer
- 123b992 [Josh Rosen] Cleanup for submitting as standalone patch.
- 4aa61b2 [Josh Rosen] Add missing newline
- 2c1233a [Josh Rosen] Small refactoring of SerializerPropertiesSuite to enable test re-use:
- 0ba75e6 [Josh Rosen] Add tests for serializer relocation property.
- 450fa21 [Josh Rosen] Back out accidental log4j.properties change
- 86d4dcd [Josh Rosen] Flag that SparkSqlSerializer2 supports relocation
- b9624ee [Josh Rosen] Expand serializer API and use new function to help control when new UnsafeShuffle path is used.
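The relocation property described in the commit message can be illustrated with a small stand-alone sketch. It does not use Spark's `Serializer` API. The class `RelocationSketch` and its methods are hypothetical names, and `DataOutputStream.writeUTF` stands in for a serializer whose records are self-contained (no stream header and no back-references between records). Under those assumptions, rearranging the byte ranges of serialized records gives the same bytes as serializing the records in the new order:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not Spark code.
public class RelocationSketch {

    // Writes every record to one stream. If endOffsets is non-null, the stream
    // position after each record is appended to it.
    public static byte[] serialize(List<String> records, List<Integer> endOffsets)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String record : records) {
            out.writeUTF(record); // length-prefixed, so each record is self-contained
            if (endOffsets != null) {
                endOffsets.add(out.size());
            }
        }
        out.flush();
        return buf.toByteArray();
    }

    // Copies the byte ranges of the serialized records into a new array in the
    // given order, without deserializing anything.
    public static byte[] reorderBytes(byte[] stream, List<Integer> endOffsets, int[] order) {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        for (int i : order) {
            int start = (i == 0) ? 0 : endOffsets.get(i - 1);
            int end = endOffsets.get(i);
            result.write(stream, start, end - start);
        }
        return result.toByteArray();
    }

    // Reads records back until the byte array is exhausted.
    public static List<String> deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            records.add(in.readUTF());
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        List<Integer> ends = new ArrayList<>();
        byte[] stream = serialize(Arrays.asList("alpha", "beta", "gamma"), ends);

        // Reorder the serialized bytes: third record first, then first, then second.
        byte[] relocated = reorderBytes(stream, ends, new int[] {2, 0, 1});

        // Reorder the elements first, then serialize.
        byte[] expected = serialize(Arrays.asList("gamma", "alpha", "beta"), null);

        System.out.println(Arrays.equals(relocated, expected)); // true
        System.out.println(deserialize(relocated));             // [gamma, alpha, beta]
    }
}
```

A format that writes a stream header or encodes later records as references to earlier ones would generally fail this check, which is why a serializer needs to declare the property explicitly.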
Changed files:
- core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala (7 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/serializer/Serializer.scala (34 additions, 1 deletion)
- core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala (1 addition, 2 deletions)
- core/src/test/scala/org/apache/spark/serializer/SerializerPropertiesSuite.scala (119 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlSerializer2.scala (5 additions, 0 deletions)