[SPARK-10063][SQL] Remove DirectParquetOutputCommitter
## What changes were proposed in this pull request?

This patch removes DirectParquetOutputCommitter. This was initially created by Databricks as a faster way to write Parquet data to S3. However, given how the underlying S3 Hadoop implementation works, this committer only works when there are no failures. If there are multiple attempts of the same task (e.g. speculation or task failures or node failures), the output data can be corrupted. I don't think this performance optimization outweighs the correctness issue.

## How was this patch tested?

Removed the related tests also.

Author: Reynold Xin <rxin@databricks.com>

Closes #12229 from rxin/SPARK-10063.
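To make the failure mode in the description concrete, here is a minimal toy model in Python. It is not Spark or Hadoop code. The dict-backed "store", the file naming scheme, and the function names `direct_write`, `staged_write`, and `simulate` are all assumptions made for illustration. It contrasts writing straight into the final output location with staging each attempt in a temporary location and promoting it only on success.

```python
# Toy model only: a dict stands in for the object store.
# No Spark or Hadoop APIs are used; all names and paths here are hypothetical.

def run_attempt(rows, fail_after=None):
    """Yield rows one by one; raise after `fail_after` rows to mimic a task failure."""
    for i, row in enumerate(rows):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("simulated task failure")
        yield row


def direct_write(store, task_id, attempt_id, rows, fail_after=None):
    """Hypothetical 'direct' strategy: each attempt writes into the final location."""
    path = f"out/part-{task_id}-attempt-{attempt_id}"
    store[path] = []
    for row in run_attempt(rows, fail_after):
        store[path].append(row)  # partial data is visible to readers immediately


def staged_write(store, task_id, attempt_id, rows, fail_after=None):
    """Hypothetical 'staged' strategy: write to a temporary path, promote on success."""
    tmp = f"out/_temporary/{task_id}-{attempt_id}"
    store[tmp] = []
    for row in run_attempt(rows, fail_after):
        store[tmp].append(row)
    # Reached only if the attempt finished; a failed attempt never gets promoted.
    store[f"out/part-{task_id}"] = store.pop(tmp)


def read_output(store):
    """Read every visible part file; paths under _temporary are ignored."""
    rows = []
    for path, data in store.items():
        if path.startswith("out/part-"):
            rows.extend(data)
    return sorted(rows)


def simulate(write):
    """Run one task whose first attempt fails midway and whose retry succeeds."""
    store = {}
    rows = [1, 2, 3, 4]
    try:
        write(store, task_id=0, attempt_id=0, rows=rows, fail_after=2)
    except RuntimeError:
        pass  # the scheduler would retry the task
    write(store, task_id=0, attempt_id=1, rows=rows)
    return read_output(store)


print(simulate(staged_write))  # [1, 2, 3, 4]
print(simulate(direct_write))  # [1, 1, 2, 2, 3, 4]
```

In this sketch the direct strategy leaves the failed attempt's partial rows next to the retry's rows, so a reader sees duplicates. The staged strategy exposes only the output of the attempt that completed. How the removed committer actually named and placed its files is not stated in the commit message, so the naming above should be read purely as an illustration.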
- docs/sql-programming-guide.md (0 additions, 33 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala (5 additions, 13 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/DirectParquetOutputCommitter.scala (0 additions, 88 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRelation.scala (0 additions, 7 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala (0 additions, 49 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/sources/hadoopFsRelationSuites.scala (0 additions, 34 deletions)