[SPARK-19918][SQL] Use TextFileFormat in implementation of TextInputJsonDataSource
## What changes were proposed in this pull request?

This PR proposes to use the text datasource during JSON schema inference.

It follows essentially the same approach as https://github.com/apache/spark/pull/15813. Using a Dataset for the initial load when inferring the schema has several advantages; please refer to SPARK-18362.

It seems the JSON case was supposed to be fixed together with that change but was taken out, according to https://github.com/apache/spark/pull/15813:

> A similar problem also affects the JSON file format and this patch originally fixed that as well, but I've decided to split that change into a separate patch so as not to conflict with changes in another JSON PR.

Also, the current implementation seems to affect some functionality because it does not use `FileScanRDD`. This problem is described in SPARK-19885 (although that issue concerns the CSV case).

## How was this patch tested?

Existing tests should cover this change. It was also tested manually by running `spark.read.json(path)` and checking the UI.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17255 from HyukjinKwon/json-filescanrdd.
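For readers who want to reproduce the manual check, the following is a minimal user-level sketch, not the code in this patch. It assumes Spark 2.2 or later on the classpath and a local master. The object name `JsonViaTextSketch`, the helper `readJsonViaText`, and the default path `people.json` are hypothetical names chosen for illustration. The helper mirrors the idea of the change from the outside: load the file as a `Dataset[String]` through the text datasource, then let the JSON reader infer the schema from those lines.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object JsonViaTextSketch {

  // Hypothetical helper: load the file as lines of text first, then
  // infer the JSON schema from that Dataset[String].
  def readJsonViaText(spark: SparkSession, path: String): DataFrame = {
    val lines: Dataset[String] = spark.read.textFile(path)
    spark.read.json(lines)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("json-via-text-sketch")
      .getOrCreate()

    // Assumed input: a file with one JSON object per line.
    val path = args.headOption.getOrElse("people.json")

    // The manual test mentioned in the PR description.
    val direct = spark.read.json(path)
    direct.printSchema()

    // The text-first variant, for comparison.
    val viaText = readJsonViaText(spark, path)
    viaText.printSchema()

    spark.stop()
  }
}
```

While the application is running, the jobs triggered by schema inference can be inspected in the Spark UI, as the test description suggests.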
Changed files:
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (3 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonDataSource.scala (66 additions, 79 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonFileFormat.scala (1 addition, 1 deletion)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonInferSchema.scala (1 addition, 8 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JsonUtils.scala (51 additions, 0 deletions)