-
- Downloads
[SPARK-4550] In sort-based shuffle, store map outputs in serialized form
Refer to the JIRA for the design doc and some perf results. I wanted to call out some of the more possibly controversial changes up front: * Map outputs are only stored in serialized form when Kryo is in use. I'm still unsure whether Java-serialized objects can be relocated. At the very least, Java serialization writes out a stream header which causes problems with the current approach, so I decided to leave investigating this to future work. * The shuffle now explicitly operates on key-value pairs instead of any object. Data is written to shuffle files in alternating keys and values instead of key-value tuples. `BlockObjectWriter.write` now accepts a key argument and a value argument instead of any object. * The map output buffer can hold a max of Integer.MAX_VALUE bytes. Though this wouldn't be terribly difficult to change. * When spilling occurs, the objects that still in memory at merge time end up serialized and deserialized an extra time. Author: Sandy Ryza <sandy@cloudera.com> Closes #4450 from sryza/sandy-spark-4550 and squashes the following commits: 8c70dd9 [Sandy Ryza] Fix serialization 9c16fe6 [Sandy Ryza] Fix a couple tests and move getAutoReset to KryoSerializerInstance 6c54e06 [Sandy Ryza] Fix scalastyle d8462d8 [Sandy Ryza] SPARK-4550
Showing
- core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala 10 additions, 0 deletions...in/scala/org/apache/spark/serializer/KryoSerializer.scala
- core/src/main/scala/org/apache/spark/serializer/Serializer.scala 31 additions, 0 deletions...c/main/scala/org/apache/spark/serializer/Serializer.scala
- core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleWriter.scala 1 addition, 1 deletion...ala/org/apache/spark/shuffle/hash/HashShuffleWriter.scala
- core/src/main/scala/org/apache/spark/storage/BlockObjectWriter.scala 31 additions, 6 deletions...in/scala/org/apache/spark/storage/BlockObjectWriter.scala
- core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala 2 additions, 4 deletions...rg/apache/spark/storage/ShuffleBlockFetcherIterator.scala
- core/src/main/scala/org/apache/spark/util/collection/ChainedBuffer.scala 144 additions, 0 deletions...cala/org/apache/spark/util/collection/ChainedBuffer.scala
- core/src/main/scala/org/apache/spark/util/collection/ExternalAppendOnlyMap.scala 4 additions, 2 deletions.../apache/spark/util/collection/ExternalAppendOnlyMap.scala
- core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala 75 additions, 69 deletions...ala/org/apache/spark/util/collection/ExternalSorter.scala
- core/src/main/scala/org/apache/spark/util/collection/PairIterator.scala 3 additions, 13 deletions...scala/org/apache/spark/util/collection/PairIterator.scala
- core/src/main/scala/org/apache/spark/util/collection/PartitionedAppendOnlyMap.scala 44 additions, 0 deletions...ache/spark/util/collection/PartitionedAppendOnlyMap.scala
- core/src/main/scala/org/apache/spark/util/collection/PartitionedPairBuffer.scala 32 additions, 26 deletions.../apache/spark/util/collection/PartitionedPairBuffer.scala
- core/src/main/scala/org/apache/spark/util/collection/PartitionedSerializedPairBuffer.scala 254 additions, 0 deletions...ark/util/collection/PartitionedSerializedPairBuffer.scala
- core/src/main/scala/org/apache/spark/util/collection/SizeTrackingAppendOnlyMap.scala 1 addition, 1 deletion...che/spark/util/collection/SizeTrackingAppendOnlyMap.scala
- core/src/main/scala/org/apache/spark/util/collection/WritablePartitionedPairCollection.scala 113 additions, 0 deletions...k/util/collection/WritablePartitionedPairCollection.scala
- core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala 15 additions, 0 deletions...ala/org/apache/spark/serializer/KryoSerializerSuite.scala
- core/src/test/scala/org/apache/spark/serializer/TestSerializer.scala 2 additions, 2 deletions...st/scala/org/apache/spark/serializer/TestSerializer.scala
- core/src/test/scala/org/apache/spark/shuffle/hash/HashShuffleManagerSuite.scala 6 additions, 6 deletions...g/apache/spark/shuffle/hash/HashShuffleManagerSuite.scala
- core/src/test/scala/org/apache/spark/storage/BlockObjectWriterSuite.scala 4 additions, 4 deletions...ala/org/apache/spark/storage/BlockObjectWriterSuite.scala
- core/src/test/scala/org/apache/spark/util/collection/ChainedBufferSuite.scala 143 additions, 0 deletions...org/apache/spark/util/collection/ChainedBufferSuite.scala
- core/src/test/scala/org/apache/spark/util/collection/ExternalSorterSuite.scala 144 additions, 45 deletions...rg/apache/spark/util/collection/ExternalSorterSuite.scala
Loading
Please register or sign in to comment