Do not re-use objects in the EdgePartition/EdgeTriplet iterators.
This avoids a silent data corruption issue (https://spark-project.atlassian.net/browse/SPARK-1188) and has no performance impact by my measurements. It also simplifies the code. As far as I can tell, the object re-use was nothing but premature optimization. I benchmarked all the included changes, and there is no performance difference.

I am not sure where to put the benchmarks. Does Spark not have a benchmark suite? This is an example benchmark I ran:

    test("benchmark") {
      val builder = new EdgePartitionBuilder[Int]
      for (i <- (1 to 10000000)) {
        builder.add(i.toLong, i.toLong, i)
      }
      val p = builder.toEdgePartition
      p.map(_.attr + 1).iterator.toList
    }

It ran for 10 seconds both before and after this change.

Author: Daniel Darabos <darabos.daniel@gmail.com>

Closes #276 from darabos/spark-1188 and squashes the following commits:

574302b [Daniel Darabos] Restore "manual" copying in EdgePartition.map(Iterator). Add comment to discourage novices like myself from trying to simplify the code.
4117a64 [Daniel Darabos] Revert EdgePartitionSuite.
4955697 [Daniel Darabos] Create a copy of the Edge objects in EdgeRDD.compute(). This avoids exposing the object re-use, while still enabling the more efficient behavior for internal code.
4ec77f8 [Daniel Darabos] Add comments about object re-use to the affected functions.
2da5e87 [Daniel Darabos] Restore object re-use in EdgePartition.
0182f2b [Daniel Darabos] Do not re-use objects in the EdgePartition/EdgeTriplet iterators. This avoids a silent data corruption issue (SPARK-1188) and has no performance impact in my measurements. It also simplifies the code.
c55f52f [Daniel Darabos] Tests that reproduce the problems from SPARK-1188.
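To illustrate why re-using one object across iterator elements can corrupt data silently, here is a minimal sketch. It does not use GraphX: `MutableEdge`, `copyEdge` and `ReusingEdgeIterator` are hypothetical stand-ins assumed only for illustration. Consuming each element before the next call to `next()` is safe, but buffering the elements (for example with `toList`) keeps many references to the same object, which ends up holding only the last edge. Copying each element as it leaves the iterator, in the spirit of the "copy in EdgeRDD.compute()" commit above, keeps the cheap re-use internal while handing callers independent objects.

```scala
// Hypothetical, simplified stand-in for an edge record: a mutable
// (srcId, dstId, attr) triple. This is NOT org.apache.spark.graphx.Edge.
final class MutableEdge(var srcId: Long, var dstId: Long, var attr: Int) {
  // Hypothetical helper: returns an independent copy of this edge.
  def copyEdge(): MutableEdge = new MutableEdge(srcId, dstId, attr)
  override def toString: String = s"Edge($srcId,$dstId,$attr)"
}

// Hypothetical iterator over parallel arrays that re-uses ONE MutableEdge,
// overwriting its fields on every call to next(). Every element it returns
// is the same object.
final class ReusingEdgeIterator(src: Array[Long], dst: Array[Long], data: Array[Int])
    extends Iterator[MutableEdge] {
  private val edge = new MutableEdge(0L, 0L, 0)
  private var pos = 0

  override def hasNext: Boolean = pos < src.length

  override def next(): MutableEdge = {
    edge.srcId = src(pos)
    edge.dstId = dst(pos)
    edge.attr = data(pos)
    pos += 1
    edge
  }
}

val src = Array(1L, 2L, 3L)
val dst = Array(10L, 20L, 30L)
val data = Array(100, 200, 300)

// Safe: map reads each attr before the next overwrite happens.
val sums = new ReusingEdgeIterator(src, dst, data).map(_.attr + 1).toList
// List(101, 201, 301)

// Silent corruption: all three list slots alias the same object, which
// now holds the fields of the last edge.
val buffered = new ReusingEdgeIterator(src, dst, data).toList
// List(Edge(3,30,300), Edge(3,30,300), Edge(3,30,300))

// Fix at the boundary: copy each element before it escapes the iterator.
val copied = new ReusingEdgeIterator(src, dst, data).map(_.copyEdge()).toList
// List(Edge(1,10,100), Edge(2,20,200), Edge(3,30,300))
```

The trade-off matches the one discussed in the squashed commits: the re-using iterator allocates a single object per pass, and a copy at the public boundary is only needed where callers might hold on to elements.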
- graphx/src/main/scala/org/apache/spark/graphx/EdgeRDD.scala: 2 additions, 1 deletion
- graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala: 12 additions, 3 deletions
- graphx/src/main/scala/org/apache/spark/graphx/impl/EdgeTripletIterator.scala: 1 addition, 6 deletions
- graphx/src/test/scala/org/apache/spark/graphx/impl/EdgeTripletIteratorSuite.scala: 43 additions, 0 deletions