[SPARK-18661][SQL] Creating a partitioned datasource table should not scan all files for table
## What changes were proposed in this pull request?

Even though in 2.1 creating a partitioned datasource table will not populate the partition data by default (until the user issues MSCK REPAIR TABLE), it seems we still scan the filesystem for no good reason. We should avoid doing this when the user specifies a schema.

## How was this patch tested?

Perf stat tests.

Author: Eric Liang <ekl@databricks.com>

Closes #16090 from ericl/spark-18661.

(cherry picked from commit d9eb4c72)

Signed-off-by: Wenchen Fan <wenchen@databricks.com>
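The following is a minimal sketch (not taken from the patch) of the scenario this fix targets: creating a partitioned datasource table with a user-specified schema, where table creation should not require scanning the files under the table location. The table name, schema, and path below are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object PartitionedTableCreateExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("partitioned-table-create")
      .enableHiveSupport()
      .getOrCreate()

    // Create a partitioned datasource table with an explicit schema.
    // With this change, creation should not trigger a scan of the files
    // under the LOCATION, since the schema does not need to be inferred.
    spark.sql(
      """CREATE TABLE events (id BIGINT, value STRING, ds STRING)
        |USING parquet
        |PARTITIONED BY (ds)
        |LOCATION '/data/events'""".stripMargin)

    // As of 2.1, partition metadata is not populated at creation time;
    // the user issues MSCK REPAIR TABLE to discover partitions on disk.
    spark.sql("MSCK REPAIR TABLE events")

    spark.stop()
  }
}
```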
Showing 4 changed files
- sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala (9 additions, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala (1 addition, 1 deletion)
- sql/core/src/test/scala/org/apache/spark/sql/execution/command/DDLSuite.scala (9 additions, 2 deletions)
- sql/hive/src/test/scala/org/apache/spark/sql/hive/PartitionedTablePerfStatsSuite.scala (47 additions, 4 deletions)