Commit b3dd569a authored by Yin Huai

[SPARK-10287] [SQL] Fixes JSONRelation refreshing on read path

https://issues.apache.org/jira/browse/SPARK-10287

After porting json to HadoopFsRelation, it seems hard to keep the behavior of picking up new files automatically for JSON. This PR removes this behavior, so JSON is consistent with others (ORC and Parquet).

Author: Yin Huai <yhuai@databricks.com>

Closes #8469 from yhuai/jsonRefresh.
parent 5bfe9e11
...@@ -2057,6 +2057,12 @@ options.
 - The canonical name of SQL/DataFrame functions are now lower case (e.g. sum vs SUM).
 - It has been determined that using the DirectOutputCommitter when speculation is enabled is unsafe
   and thus this output committer will not be used when speculation is on, independent of configuration.
+- JSON data source will not automatically load new files that are created by other applications
+  (i.e. files that are not inserted to the dataset through Spark SQL).
+  For a JSON persistent table (i.e. the metadata of the table is stored in Hive Metastore),
+  users can use `REFRESH TABLE` SQL command or `HiveContext`'s `refreshTable` method
+  to include those new files to the table. For a DataFrame representing a JSON dataset, users need to recreate
+  the DataFrame and the new DataFrame will include new files.
 ## Upgrading from Spark SQL 1.3 to 1.4
...
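The migration note added in the hunk above describes the new contract: the JSON source no longer refreshes its file listing on every scan. Below is a minimal sketch of the three remedies it mentions, assuming Spark 1.5 with Hive support; the SparkContext `sc`, the table name `json_logs`, and the path `/data/json_logs` are placeholder names, not part of this commit.

```scala
import org.apache.spark.sql.hive.HiveContext

// Assumes an existing SparkContext `sc`; all names below are placeholders.
val hiveContext = new HiveContext(sc)

// Persistent JSON table (metadata stored in the Hive metastore): pick up
// files written by other applications with the SQL command ...
hiveContext.sql("REFRESH TABLE json_logs")
// ... or with the programmatic equivalent on HiveContext.
hiveContext.refreshTable("json_logs")

// DataFrame over a JSON path: the file listing is fixed when the DataFrame
// is created, so re-create it to pick up newly added files.
val refreshed = hiveContext.read.json("/data/json_logs")
refreshed.count()
```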
...@@ -111,15 +111,6 @@ private[sql] class JSONRelation(
     jsonSchema
   }
-  override private[sql] def buildScan(
-      requiredColumns: Array[String],
-      filters: Array[Filter],
-      inputPaths: Array[String],
-      broadcastedConf: Broadcast[SerializableConfiguration]): RDD[Row] = {
-    refresh()
-    super.buildScan(requiredColumns, filters, inputPaths, broadcastedConf)
-  }
   override def buildScan(
       requiredColumns: Array[String],
       filters: Array[Filter],
...
...@@ -562,7 +562,7 @@ abstract class HadoopFsRelation private[sql](maybePartitionSpec: Option[Partitio
     })
   }
-  private[sql] def buildScan(
+  final private[sql] def buildScan(
       requiredColumns: Array[String],
       filters: Array[Filter],
       inputPaths: Array[String],
...
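The one-line change above carries the enforcement for this commit: by marking `buildScan` as `final`, `HadoopFsRelation` prevents subclasses such as `JSONRelation` from overriding it to re-introduce a per-scan `refresh()`. A toy sketch of that language-level guarantee follows; the class and method names are invented for illustration and are not Spark code.

```scala
// Invented example: a base class fixes the scan path with `final`.
abstract class CachedFileRelation {
  // Subclasses decide what the cached listing contains ...
  protected def cachedFiles: Seq[String]
  // ... but cannot change when or how often it is consulted.
  final def buildScan(): Seq[String] = cachedFiles
}

class JsonLikeRelation extends CachedFileRelation {
  protected def cachedFiles: Seq[String] = Seq("part-00000.json")
  // Re-adding an override like the one removed in the JSONRelation hunk
  // would no longer compile here, because buildScan is final:
  // override def buildScan(): Seq[String] = { refresh(); super.buildScan() }
}
```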
...@@ -167,21 +167,6 @@ class InsertSuite extends DataSourceTest with SharedSQLContext {
     )
   }
-  test("save directly to the path of a JSON table") {
-    caseInsensitiveContext.table("jt").selectExpr("a * 5 as a", "b")
-      .write.mode(SaveMode.Overwrite).json(path.toString)
-    checkAnswer(
-      sql("SELECT a, b FROM jsonTable"),
-      (1 to 10).map(i => Row(i * 5, s"str$i"))
-    )
-    caseInsensitiveContext.table("jt").write.mode(SaveMode.Overwrite).json(path.toString)
-    checkAnswer(
-      sql("SELECT a, b FROM jsonTable"),
-      (1 to 10).map(i => Row(i, s"str$i"))
-    )
-  }
   test("it is not allowed to write to a table while querying it.") {
     val message = intercept[AnalysisException] {
       sql(
...