[SPARK-18191][CORE] Port RDD API to use commit protocol
## What changes were proposed in this pull request?

This PR ports the RDD API to use the commit protocol. The changes are:

1. Add a new internal helper class, `SparkNewHadoopWriter`, that saves an RDD using a Hadoop OutputFormat. It is similar to `SparkHadoopWriter` but uses the commit protocol, and it supports the newer `mapreduce` API instead of the old `mapred` API supported by `SparkHadoopWriter`.
2. Rewrite the `PairRDDFunctions.saveAsNewAPIHadoopDataset` function so that it now uses the commit protocol.

## How was this patch tested?

Existing test cases.

Author: jiangxingbo <jiangxb1987@gmail.com>

Closes #15769 from jiangxb1987/rdd-commit.
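For readers unfamiliar with the idea behind a commit protocol, the following is a minimal, self-contained sketch of the general pattern: tasks write to a staging location, and output becomes visible only when the job commits. It is an illustration under stated assumptions, not the code in this PR. `ToyCommitProtocol`, `ToyCommitDemo`, and all method names below are hypothetical and do not reproduce Spark's internal API or Hadoop's `OutputCommitter`.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical, simplified commit protocol for illustration only.
final class ToyCommitProtocol(outputDir: Path) {
  private val stagingDir: Path = outputDir.resolve("_temporary")

  // Job setup: create the staging directory that tasks write into.
  def setupJob(): Unit = {
    Files.createDirectories(stagingDir)
    ()
  }

  // Each task writes to its own staged file, never to the final location.
  def newTaskTempFile(taskId: Int): Path =
    stagingDir.resolve(s"part-$taskId.tmp")

  // Task commit: report the staged file as ready for job commit.
  def commitTask(taskId: Int): Path = newTaskTempFile(taskId)

  // Task abort: discard whatever the task staged.
  def abortTask(taskId: Int): Unit = {
    Files.deleteIfExists(newTaskTempFile(taskId))
    ()
  }

  // Job commit: move every committed staged file to its final name.
  def commitJob(committed: Seq[Path]): Unit = {
    committed.foreach { tmp =>
      val finalName = tmp.getFileName.toString.stripSuffix(".tmp")
      Files.move(tmp, outputDir.resolve(finalName), StandardCopyOption.REPLACE_EXISTING)
    }
    Files.deleteIfExists(stagingDir) // staging is empty at this point
    ()
  }
}

object ToyCommitDemo {
  // Runs two successful tasks and one aborted task, then lists the output.
  def run(): Seq[String] = {
    val out = Files.createTempDirectory("toy-commit")
    val protocol = new ToyCommitProtocol(out)
    protocol.setupJob()

    val committed = (0 until 2).map { taskId =>
      val tmp = protocol.newTaskTempFile(taskId)
      Files.write(tmp, s"record-$taskId".getBytes(StandardCharsets.UTF_8))
      protocol.commitTask(taskId)
    }

    // A third task fails midway; its partial output is never published.
    val failed = protocol.newTaskTempFile(2)
    Files.write(failed, "partial".getBytes(StandardCharsets.UTF_8))
    protocol.abortTask(2)

    protocol.commitJob(committed)
    out.toFile.list().toSeq.sorted
  }
}
```

Calling `ToyCommitDemo.run()` leaves only the files of the two committed tasks in the output directory. The aborted task's partial file is deleted from staging and never reaches the final location, which is the property a commit protocol is meant to provide.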
Changed files:
- core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala (2 additions, 23 deletions)
- core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala (4 additions, 2 deletions)
- core/src/main/scala/org/apache/spark/internal/io/SparkHadoopMapReduceWriter.scala (249 additions, 0 deletions)
- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala (13 additions, 126 deletions)
- core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala (2 additions, 18 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala (2 additions, 2 deletions)
- sql/hive/src/main/scala/org/apache/spark/sql/hive/hiveWriterContainers.scala (2 additions, 1 deletion)
- streaming/src/main/scala/org/apache/spark/streaming/dstream/DStream.scala (3 additions, 2 deletions)
- streaming/src/main/scala/org/apache/spark/streaming/scheduler/JobScheduler.scala (3 additions, 2 deletions)