[SPARK-18294][CORE] Implement commit protocol to support `mapred` package's committer
## What changes were proposed in this pull request?

This PR makes the following changes:

- Implement a new commit protocol, `HadoopMapRedCommitProtocol`, which supports the old `mapred` package's committer.
- Refactor `SparkHadoopWriter` and `SparkHadoopMapReduceWriter` by combining them into a single `SparkHadoopWriter` that can write through both the `mapred` and `mapreduce` APIs. This removes a lot of duplicated code.

After this change, it should be straightforward to support committers from both the new and the old Hadoop APIs at a high level.

## How was this patch tested?

No major behavior change; the existing test cases pass.

Author: Xingbo Jiang <xingbo.jiang@databricks.com>

Closes #18438 from jiangxb1987/SparkHadoopWriter.
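The refactoring idea (one writer that drives the commit protocol, with everything API-specific pushed behind a small config abstraction) can be sketched in plain Scala. This is a minimal, self-contained illustration under stated assumptions, not Spark's actual code: `Committer`, `RecordingCommitter`, `WriteConfig`, `MapRedConfig`, `MapReduceConfig`, and `UnifiedWriter` are all hypothetical names, and the committer only logs its calls instead of touching Hadoop's `OutputCommitter` classes.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a Hadoop output committer. Real code would use
// org.apache.hadoop.mapred.OutputCommitter (old API) or
// org.apache.hadoop.mapreduce.OutputCommitter (new API).
trait Committer {
  def setupJob(): Unit
  def commitTask(taskId: Int): Unit
  def commitJob(): Unit
}

// Records each protocol call, tagged with the API it came from.
final class RecordingCommitter(api: String, log: ListBuffer[String]) extends Committer {
  def setupJob(): Unit = log += s"$api:setupJob"
  def commitTask(taskId: Int): Unit = log += s"$api:commitTask($taskId)"
  def commitJob(): Unit = log += s"$api:commitJob"
}

// Hypothetical analogue of a write-config helper: the only API-specific
// piece that the shared writer depends on.
sealed trait WriteConfig {
  def createCommitter(log: ListBuffer[String]): Committer
}

// Old `mapred` API path.
final class MapRedConfig extends WriteConfig {
  def createCommitter(log: ListBuffer[String]): Committer =
    new RecordingCommitter("mapred", log)
}

// New `mapreduce` API path.
final class MapReduceConfig extends WriteConfig {
  def createCommitter(log: ListBuffer[String]): Committer =
    new RecordingCommitter("mapreduce", log)
}

// A single writer for both APIs, in place of two near-duplicate writers.
object UnifiedWriter {
  def write[K, V](
      partitions: Seq[Seq[(K, V)]],
      config: WriteConfig,
      log: ListBuffer[String]): List[String] = {
    val committer = config.createCommitter(log)
    committer.setupJob()
    val out = ListBuffer.empty[String]
    // Each partition plays the role of one task: write its records, then commit it.
    partitions.zipWithIndex.foreach { case (records, taskId) =>
      records.foreach { case (k, v) => out += s"$k\t$v" }
      committer.commitTask(taskId)
    }
    committer.commitJob()
    out.toList
  }
}
```

Because the driver only talks to `WriteConfig` and `Committer`, supporting another API means adding one config implementation rather than another copy of the write loop. The sketch runs tasks sequentially for clarity, whereas Spark runs them in parallel on executors.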
Changed files:
- `core/src/main/scala/org/apache/spark/internal/io/HadoopMapRedCommitProtocol.scala`: 36 additions, 0 deletions
- `core/src/main/scala/org/apache/spark/internal/io/HadoopWriteConfigUtil.scala`: 79 additions, 0 deletions
- `core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala`: 0 additions, 181 deletions
- `core/src/main/scala/org/apache/spark/internal/io/SparkHadoopWriter.scala`: 312 additions, 81 deletions
- `core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala`: 9 additions, 63 deletions
- `core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala`: 1 addition, 1 deletion
- `core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala`: 24 additions, 11 deletions