gatorsmile
authored
[SPARK-19092][SQL][BACKPORT-2.1] Save() API of DataFrameWriter should not scan all the saved files #16481 ### What changes were proposed in this pull request? #### This PR is to backport https://github.com/apache/spark/pull/16481 to Spark 2.1 --- `DataFrameWriter`'s [save() API](https://github.com/gatorsmile/spark/blob/5d38f09f47a767a342a0a8219c63efa2943b5d1f/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L207) is performing a unnecessary full filesystem scan for the saved files. The save() API is the most basic/core API in `DataFrameWriter`. We should avoid it. ### How was this patch tested? Added and modified the test cases Author: gatorsmile <gatorsmile@gmail.com> Closes #16588 from gatorsmile/backport-19092.
Name | Last commit | Last update |
---|---|---|
.. | ||
compatibility/src/test/scala/org/apache/spark/sql/hive/execution | ||
src | ||
pom.xml |