Skip to content
Snippets Groups Projects
Commit 8fb2a02e authored by hyukjinkwon's avatar hyukjinkwon Committed by Wenchen Fan
Browse files

[SPARK-19918][SQL] Use TextFileFormat in implementation of TextInputJsonDataSource

## What changes were proposed in this pull request?

This PR proposes to use text datasource when Json schema inference.

This basically proposes the similar approach in https://github.com/apache/spark/pull/15813 If we use Dataset for initial loading when inferring the schema, there are advantages. Please refer SPARK-18362

It seems JSON one was supposed to be fixed together but taken out according to https://github.com/apache/spark/pull/15813

> A similar problem also affects the JSON file format and this patch originally fixed that as well, but I've decided to split that change into a separate patch so as not to conflict with changes in another JSON PR.

Also, this seems affecting some functionalities because it does not use `FileScanRDD`. This problem is described in SPARK-19885 (but it was CSV's case).

## How was this patch tested?

Existing tests should cover this and manual test by `spark.read.json(path)` and check the UI.

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #17255 from HyukjinKwon/json-filescanrdd.
parent dacc382f
No related branches found
No related tags found
No related merge requests found
Loading
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment