[SPARK-6627] Some clean-up in shuffle code.
Before diving into the review of #4450 I looked through the existing shuffle code to learn how it works. Unfortunately, there are some very confusing things in this code. This patch makes a few small changes to simplify things. It is not easy to describe the changes concisely because of how convoluted the issues were, but they are fairly small logically:

1. There is a trait named `ShuffleBlockManager` that deals with only one logical function: retrieving shuffle block data given shuffle block coordinates. This trait has two implementors, `FileShuffleBlockManager` and `IndexShuffleBlockManager`. Confusingly, the vast majority of the code in those implementations has nothing to do with this particular functionality. So I've renamed the trait to `ShuffleBlockResolver` and documented it.
2. The aforementioned trait had two almost identical methods, for no good reason. I removed one method (`getBytes`) and modified callers to use the other one. I think the behavior is preserved in all cases.
3. The sort shuffle code uses an identifier "0" in the reduce slot of a BlockID as a placeholder. I made it into a constant since it needs to be consistent across multiple places.

I think for (3) there is actually a better solution that would avoid the need for this type of workaround/hack in the first place, but it's more complex so I'm punting on it for now.

Author: Patrick Wendell <patrick@databricks.com>

Closes #5286 from pwendell/cleanup and squashes the following commits:

c71fbc7 [Patrick Wendell] Open interface back up for testing
f36edd5 [Patrick Wendell] Code review feedback
d1c0494 [Patrick Wendell] Style fix
a406079 [Patrick Wendell] [HOTFIX] Some clean-up in shuffle code.
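The three changes described in this commit message can be pictured with a minimal, self-contained sketch. It is not the code from the patch. The types `ShuffleBlockId`, `Buffer`, and `InMemoryResolver`, the method name `getBlockData`, and the constant name `NoopReduceId` are all assumptions made for illustration; the message itself only names the trait `ShuffleBlockResolver` and the removed method `getBytes`.

```scala
import java.nio.ByteBuffer

// Hypothetical stand-in for shuffle block coordinates (not Spark's class).
final case class ShuffleBlockId(shuffleId: Int, mapId: Int, reduceId: Int)

// Hypothetical, simplified buffer wrapper used only in this sketch.
final case class Buffer(bytes: Array[Byte]) {
  def nioByteBuffer(): ByteBuffer = ByteBuffer.wrap(bytes)
}

// Change 1: a trait with a single responsibility, which is resolving
// shuffle block coordinates to the block's data.
trait ShuffleBlockResolver {
  // Change 2: only one retrieval method remains. The method name is assumed.
  def getBlockData(blockId: ShuffleBlockId): Buffer
  def stop(): Unit
}

object InMemoryResolver {
  // Change 3: a named constant instead of a literal 0 in the reduce slot.
  // The constant name is hypothetical.
  val NoopReduceId: Int = 0
}

// Toy implementation backed by a map, for illustration only.
final class InMemoryResolver(blocks: Map[ShuffleBlockId, Array[Byte]])
    extends ShuffleBlockResolver {
  override def getBlockData(blockId: ShuffleBlockId): Buffer =
    Buffer(blocks.getOrElse(blockId,
      throw new NoSuchElementException(s"No data for $blockId")))

  override def stop(): Unit = ()
}
```

In this sketch, a caller that previously wanted raw bytes would call `resolver.getBlockData(id).nioByteBuffer()` rather than a separate bytes-returning method, and code that needs the placeholder reduce ID would refer to `InMemoryResolver.NoopReduceId` so that every call site agrees on the value.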
Changed files:
- core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala: 1 addition, 6 deletions
- core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockManager.scala: 13 additions, 14 deletions
- core/src/main/scala/org/apache/spark/shuffle/ShuffleBlockResolver.scala: 9 additions, 5 deletions
- core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala: 4 additions, 1 deletion
- core/src/main/scala/org/apache/spark/shuffle/ShuffleWriter.scala: 1 addition, 1 deletion
- core/src/main/scala/org/apache/spark/shuffle/hash/HashShuffleManager.scala: 4 additions, 4 deletions
- core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala: 5 additions, 4 deletions
- core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala: 3 additions, 3 deletions
- core/src/main/scala/org/apache/spark/storage/BlockManager.scala: 5 additions, 9 deletions
- core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala: 4 additions, 2 deletions
- core/src/test/scala/org/apache/spark/shuffle/hash/HashShuffleManagerSuite.scala: 1 addition, 1 deletion
- tools/src/main/scala/org/apache/spark/tools/StoragePerfTester.scala: 1 addition, 1 deletion