[SPARK-3570] Include time to open files in shuffle write time.
The time to open shuffle files can be very significant when the disk is contended, especially when using ext3. While writing data to a file can avoid hitting disk (and instead hit the buffer cache), opening a file always involves writing some metadata about the file to disk, so the open time can be a very significant portion of the shuffle write time. In one job I ran recently, the time to write shuffle data to the file was only 4ms for each task, but the time to open the file was about 100x as long (~400ms).

When we add metrics about spilled data (#2504), we should ensure that the file open time is also included there.

Author: Kay Ousterhout <kayousterhout@gmail.com>

Closes #4550 from kayousterhout/SPARK-3570 and squashes the following commits:

ea3a4ae [Kay Ousterhout] Added comment about excluded open time
fdc5185 [Kay Ousterhout] Improved comment
42b7e43 [Kay Ousterhout] Fixed parens for nanotime
2423555 [Kay Ousterhout] [SPARK-3570] Include time to open files in shuffle write time.
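
To illustrate the idea in the commit message, here is a minimal sketch of charging file open time to the same counter as write time. It is not the code from this commit: `WriteMetrics`, `TimedOpen`, `openTimed`, and `writeTimed` are hypothetical names invented for the example, and only `System.nanoTime` and `java.io.FileOutputStream` are real APIs.

```scala
import java.io.{File, FileOutputStream}

// Hypothetical stand-in for a per-task metrics object; not Spark's actual class.
final class WriteMetrics {
  private var writeTimeNanos = 0L
  def incWriteTime(nanos: Long): Unit = writeTimeNanos += nanos
  def writeTime: Long = writeTimeNanos
}

object TimedOpen {
  // Opens `file` for writing and adds the time spent opening it to `metrics`.
  def openTimed(file: File, metrics: WriteMetrics): FileOutputStream = {
    val openStart = System.nanoTime()
    val out = new FileOutputStream(file)
    metrics.incWriteTime(System.nanoTime() - openStart)
    out
  }

  // Writes `bytes` and adds the time spent writing to the same counter.
  def writeTimed(out: FileOutputStream, bytes: Array[Byte], metrics: WriteMetrics): Unit = {
    val writeStart = System.nanoTime()
    out.write(bytes)
    metrics.incWriteTime(System.nanoTime() - writeStart)
  }
}
```

With both calls feeding one counter, a task whose open takes ~400ms and whose write takes ~4ms would report the sum rather than only the write portion.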
Showing 3 changed files:
- core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala (4 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala (3 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/util/collection/ExternalSorter.scala (5 additions, 0 deletions)