-
- Downloads
[SPARK-3649] Remove GraphX custom serializers
As [reported][1] on the mailing list, GraphX throws ``` java.lang.ClassCastException: java.lang.Long cannot be cast to scala.Tuple2 at org.apache.spark.graphx.impl.RoutingTableMessageSerializer$$anon$1$$anon$2.writeObject(Serializers.scala:39) at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:195) at org.apache.spark.util.collection.ExternalSorter.spillToMergeableFile(ExternalSorter.scala:329) ``` when sort-based shuffle attempts to spill to disk. This is because GraphX defines custom serializers for shuffling pair RDDs that assume Spark will always serialize the entire pair object rather than breaking it up into its components. However, the spill code path in sort-based shuffle [violates this assumption][2]. GraphX uses the custom serializers to compress vertex ID keys using variable-length integer encoding. However, since the serializer can no longer rely on the key and value being serialized and deserialized together, performing such encoding would either require writing a tag byte (costly) or maintaining state in the serializer and assuming that serialization calls will alternate between key and value (fragile). Instead, this PR simply removes the custom serializers. This causes a **10% slowdown** (494 s to 543 s) and **16% increase in per-iteration communication** (2176 MB to 2518 MB) for PageRank (averages across 3 trials, 10 iterations per trial, uk-2007-05 graph, 16 r3.2xlarge nodes). [1]: http://apache-spark-user-list.1001560.n3.nabble.com/java-lang-ClassCastException-java-lang-Long-cannot-be-cast-to-scala-Tuple2-td13926.html#a14501 [2]: https://github.com/apache/spark/blob/f9d6220c792b779be385f3022d146911a22c2130/core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala#L329 Author: Ankur Dave <ankurdave@gmail.com> Closes #2503 from ankurdave/SPARK-3649 and squashes the following commits: a49c2ad [Ankur Dave] [SPARK-3649] Remove GraphX custom serializers
Showing
- graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala 6 additions, 8 deletions...hx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
- graphx/src/main/scala/org/apache/spark/graphx/impl/MessageToPartition.scala 0 additions, 50 deletions...ala/org/apache/spark/graphx/impl/MessageToPartition.scala
- graphx/src/main/scala/org/apache/spark/graphx/impl/RoutingTablePartition.scala 0 additions, 18 deletions.../org/apache/spark/graphx/impl/RoutingTablePartition.scala
- graphx/src/main/scala/org/apache/spark/graphx/impl/Serializers.scala 0 additions, 369 deletions...main/scala/org/apache/spark/graphx/impl/Serializers.scala
- graphx/src/test/scala/org/apache/spark/graphx/SerializerSuite.scala 0 additions, 122 deletions.../test/scala/org/apache/spark/graphx/SerializerSuite.scala
Loading
Please register or sign in to comment