-
- Downloads
[SPARK-9743] [SQL] Fixes JSONRelation refreshing
PR #7696 added two `HadoopFsRelation.refresh()` calls ([this] [1], and [this] [2]) in `DataSourceStrategy` to make test case `InsertSuite.save directly to the path of a JSON table` pass. However, this forces every `HadoopFsRelation` table scan to do a refresh, which can be super expensive for tables with large number of partitions. The reason why the original test case fails without the `refresh()` calls is that, the old JSON relation builds the base RDD with the input paths, while `HadoopFsRelation` provides `FileStatus`es of leaf files. With the old JSON relation, we can create a temporary table based on a path, writing data to that, and then read newly written data without refreshing the table. This is no long true for `HadoopFsRelation`. This PR removes those two expensive refresh calls, and moves the refresh into `JSONRelation` to fix this issue. We might want to update `HadoopFsRelation` interface to provide better support for this use case. [1]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L63 [2]: https://github.com/apache/spark/blob/ebfd91c542aaead343cb154277fcf9114382fee7/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala#L91 Author: Cheng Lian <lian@databricks.com> Closes #8035 from liancheng/spark-9743/fix-json-relation-refreshing and squashes the following commits: ec1957d [Cheng Lian] Fixes JSONRelation refreshing
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSourceStrategy.scala 0 additions, 2 deletions.../spark/sql/execution/datasources/DataSourceStrategy.scala
- sql/core/src/main/scala/org/apache/spark/sql/json/JSONRelation.scala 15 additions, 4 deletions...c/main/scala/org/apache/spark/sql/json/JSONRelation.scala
- sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala 1 addition, 1 deletion.../main/scala/org/apache/spark/sql/sources/interfaces.scala
- sql/core/src/test/scala/org/apache/spark/sql/sources/InsertSuite.scala 5 additions, 5 deletions...test/scala/org/apache/spark/sql/sources/InsertSuite.scala
Loading
Please register or sign in to comment