[SPARK-19092][SQL] Save() API of DataFrameWriter should not scan all the saved files
### What changes were proposed in this pull request?

`DataFrameWriter`'s [save() API](https://github.com/gatorsmile/spark/blob/5d38f09f47a767a342a0a8219c63efa2943b5d1f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L207) performs an unnecessary full filesystem scan of the saved files. Because `save()` is the most basic, core API in `DataFrameWriter`, this scan should be avoided.

Related PR: https://github.com/apache/spark/pull/16090

### How was this patch tested?

Updated the existing test cases.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #16481 from gatorsmile/saveFileScan.
Showing 3 changed files:
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (98 additions, 74 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala (7 additions, 22 deletions)