[SPARK-3595] Respect configured OutputCommitters when calling saveAsHadoopFile
Addresses the issue in https://issues.apache.org/jira/browse/SPARK-3595, namely that saveAsHadoopFile hardcodes the OutputCommitter. This is not ideal when running Spark jobs that write to S3, especially when running them from an EMR cluster where the default OutputCommitter is a DirectOutputCommitter.

Author: Ian Hummel <ian@themodernlife.net>

Closes #2450 from themodernlife/spark-3595 and squashes the following commits:

f37a0e5 [Ian Hummel] Update based on comments from pwendell
a11d9f3 [Ian Hummel] Fix formatting
4359664 [Ian Hummel] Add an example showing usage
8b6be94 [Ian Hummel] Add ability to specify OutputCommitter, especially useful when writing to an S3 bucket from an EMR cluster
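As a rough illustration of what "respecting a configured OutputCommitter" means for callers, the sketch below sets a committer on a JobConf and passes that JobConf to saveAsHadoopFile. It is not taken from this commit: the object name, the local master, and the output path /tmp/committer-example are hypothetical, and FileOutputCommitter stands in for whatever committer a deployment actually wants (for example, a direct committer on EMR).

```scala
import org.apache.hadoop.mapred.{FileOutputCommitter, JobConf, TextOutputFormat}
import org.apache.spark.{SparkConf, SparkContext}
// On Spark versions before 1.3, also add: import org.apache.spark.SparkContext._

// Hypothetical example object; not part of the commit.
object CommitterExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("committer-example").setMaster("local[2]"))

    val pairs = sc.parallelize(Seq(("a", 1), ("b", 2)))

    // Configure the committer on the JobConf handed to saveAsHadoopFile.
    // FileOutputCommitter is only a placeholder; substitute the committer
    // class appropriate for the target file system.
    val jobConf = new JobConf(sc.hadoopConfiguration)
    jobConf.setOutputCommitter(classOf[FileOutputCommitter])

    // Hypothetical output path; it must not already exist.
    pairs.saveAsHadoopFile(
      "/tmp/committer-example",
      classOf[String],
      classOf[Int],
      classOf[TextOutputFormat[String, Int]],
      jobConf)

    sc.stop()
  }
}
```

With the behavior described in this commit, the committer set on the JobConf should be the one used for the write, rather than being replaced by a hardcoded default.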
Changed files:
- core/src/main/scala/org/apache/spark/SparkHadoopWriter.scala (1 addition, 1 deletion)
- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala (6 additions, 1 deletion)
- core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala (84 additions, 23 deletions)