# [SPARK-20801] Record accurate size of blocks in MapStatus when it's above threshold.
## What changes were proposed in this pull request?

Currently, when the number of reduce tasks is above 2000, `HighlyCompressedMapStatus` is used to store the sizes of shuffle blocks. In `HighlyCompressedMapStatus`, only the average size is stored for non-empty blocks, which is not good for memory control when we fetch shuffle blocks. It makes sense to store the accurate size of a block when it is above a threshold.

## How was this patch tested?

Added test in MapStatusSuite.

Author: jinxing <jinxing6042@126.com>

Closes #18031 from jinxing64/SPARK-20801.
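The idea above can be sketched in Scala. This is an illustrative simplification, not the actual `HighlyCompressedMapStatus` implementation (which also tracks empty blocks in a bitmap and serializes compactly); the threshold value and all names here are assumptions for the example:

```scala
// Sketch: compress shuffle block sizes by storing one average for
// ordinary non-empty blocks, while keeping the accurate size for any
// block above a threshold (so huge blocks are not under-reported).
object MapStatusSketch {
  // Hypothetical threshold (100 MB) chosen for illustration only.
  val accurateBlockThreshold: Long = 100L * 1024 * 1024

  case class CompressedStatus(avgSize: Long, hugeBlockSizes: Map[Int, Long]) {
    // Huge blocks get their recorded accurate size; others get the average.
    def getSizeForBlock(reduceId: Int): Long =
      hugeBlockSizes.getOrElse(reduceId, avgSize)
  }

  def compress(uncompressedSizes: Array[Long]): CompressedStatus = {
    // Record accurate sizes only for blocks above the threshold.
    val huge = uncompressedSizes.zipWithIndex.collect {
      case (size, i) if size > accurateBlockThreshold => i -> size
    }.toMap
    // Average over the remaining non-empty blocks.
    val smallNonEmpty = uncompressedSizes.zipWithIndex.collect {
      case (size, i) if size > 0 && !huge.contains(i) => size
    }
    val avg =
      if (smallNonEmpty.nonEmpty) smallNonEmpty.sum / smallNonEmpty.length
      else 0L
    CompressedStatus(avg, huge)
  }
}
```

Without the `hugeBlockSizes` map, a single 200 MB block among many small ones would be reported at the small average, so a reducer fetching it could badly underestimate the memory needed.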
Showing 4 changed files:
- core/src/main/scala/org/apache/spark/internal/config/package.scala (9 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala (45 additions, 9 deletions)
- core/src/test/scala/org/apache/spark/scheduler/MapStatusSuite.scala (27 additions, 1 deletion)
- docs/configuration.md (9 additions, 0 deletions)
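The `docs/configuration.md` change presumably documents the new threshold setting introduced by this patch. A hedged example of how such a setting would be supplied (the key name `spark.shuffle.accurateBlockThreshold` and the value shown are assumptions based on this PR's description, not verified against the merged docs):

```
# spark-defaults.conf fragment (illustrative): blocks larger than this
# threshold have their accurate size recorded in MapStatus instead of
# being folded into the average.
spark.shuffle.accurateBlockThreshold  100m
```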