[SPARK-21243][Core] Limit no. of map outputs in a shuffle fetch
## What changes were proposed in this pull request?

For configurations with external shuffle enabled, we have observed that if a very large number of blocks are being fetched from a remote host, it puts the NM under extra pressure and can crash it. This change introduces a configuration `spark.reducer.maxBlocksInFlightPerAddress` to limit the number of map outputs being fetched from a given remote address. The changes applied here are applicable for both scenarios - when external shuffle is enabled as well as disabled.

## How was this patch tested?

Ran the job with the default configuration, which does not change the existing behavior, and ran it with a few lower values - 10, 20, 50, 100. The job ran fine and there is no change in the output. (I will update the metrics related to NM in some time.)

Author: Dhruve Ashar <dhruveashar@gmail.com>

Closes #18487 from dhruve/impr/SPARK-21243.
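As a rough illustration (not part of this patch), a minimal Scala sketch of how a job might set the new limit; the value `50` is arbitrary and `ShuffleFetchLimitExample` is just a placeholder app name. The same property can equally be supplied via `--conf` on `spark-submit` or in `spark-defaults.conf`.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: cap the number of map output blocks fetched at once from any single
// remote address, easing pressure on the external shuffle service / NM.
// Leaving the property unset keeps the existing (effectively unlimited) behavior.
val conf = new SparkConf()
  .setAppName("ShuffleFetchLimitExample") // hypothetical app name
  .set("spark.reducer.maxBlocksInFlightPerAddress", "50") // illustrative value

val sc = new SparkContext(conf)
```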
Showing 5 changed files with 98 additions and 10 deletions
- core/src/main/scala/org/apache/spark/internal/config/package.scala (11 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/shuffle/BlockStoreShuffleReader.scala (1 addition, 0 deletions)
- core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala (71 additions, 10 deletions)
- core/src/test/scala/org/apache/spark/storage/ShuffleBlockFetcherIteratorSuite.scala (6 additions, 0 deletions)
- docs/configuration.md (9 additions, 0 deletions)