[SPARK-18960][SQL][SS] Avoid double reading file which is being copied.
## What changes were proposed in this pull request?

In HDFS, when we copy a file into a target directory, a temporary `._COPY_` file exists for a period of time. The duration depends on the file size. If we do not skip this file, we may read the same data twice.

## How was this patch tested?

Updated the unit test.

Author: uncleGen <hustyugm@gmail.com>

Closes #16370 from uncleGen/SPARK-18960.
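The skip rule described in the commit message can be sketched as a simple file-name filter. This is a minimal illustration, not the patched code in `PartitioningAwareFileIndex.scala`: the object `CopyFilter` and its methods are hypothetical names, and the suffix `._COPYING_` is an assumption based on the temporary name the HDFS shell copy commands typically use (the commit message abbreviates it as `._COPY_`).

```scala
// Hypothetical helper for illustration only; not the actual Spark source.
object CopyFilter {
  // Assumed suffix of a file that is still being copied into HDFS.
  val CopyingSuffix = "._COPYING_"

  // True when the file name looks like an in-flight copy and should be ignored.
  def shouldSkip(fileName: String): Boolean =
    fileName.endsWith(CopyingSuffix)

  // Keep only the names that are not in-flight copies.
  def visibleFiles(names: Seq[String]): Seq[String] =
    names.filterNot(shouldSkip)
}
```

With this filter, a listing that contains only the temporary name of a file still being copied yields no readable file, so the data is picked up once, under its final name, after the copy completes.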
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningAwareFileIndex.scala (8 additions, 3 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileIndexSuite.scala (1 addition, 0 deletions)