[SPARK-14231] [SQL] JSON data source infers floating-point values as a double when they do not fit in a decimal

## What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-14231

Currently, the JSON data source can infer `DecimalType` for big numbers, and the `floatAsBigDecimal` option reads floating-point values as `DecimalType`.

However, Spark's `DecimalType` has a few restrictions:

1. The precision cannot be greater than 38.
2. The scale cannot be greater than the precision.

Currently, neither restriction is handled. This PR handles these cases by inferring such values as `DoubleType`. In addition, the option name was changed from `floatAsBigDecimal` to `prefersDecimal`, as suggested [here](https://issues.apache.org/jira/browse/SPARK-14231?focusedCommentId=15215579&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15215579).

So, the code below:

```scala
def doubleRecords: RDD[String] =
  sqlContext.sparkContext.parallelize(
    s"""{"a": 1${"0" * 38}, "b": 0.01}""" ::
    s"""{"a": 2${"0" * 38}, "b": 0.02}""" :: Nil)

val jsonDF = sqlContext.read
  .option("prefersDecimal", "true")
  .json(doubleRecords)
jsonDF.printSchema()
```

produces the following:

- **Before**

```scala
org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater than precision (1).;
	at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
	at org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
	at ...
```

- **After**

```scala
root
 |-- a: double (nullable = true)
 |-- b: double (nullable = true)
```

## How was this patch tested?

Unit tests were added, and `./dev/run_tests` was run for coding style checks.
Author: hyukjinkwon <gurwls223@gmail.com>

Closes #12030 from HyukjinKwon/SPARK-14231.
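The two `DecimalType` restrictions described above can be illustrated without Spark. The following is only a minimal sketch of the stated rule (fall back to a double when the precision exceeds 38 or the scale exceeds the precision), not the actual `InferSchema` code from this patch. The names `InferredType`, `DoubleT`, `DecimalT`, `DecimalFit`, and `infer` are hypothetical stand-ins introduced for illustration; only `java.math.BigDecimal` is a real API here.

```scala
import java.math.BigDecimal

// Hypothetical stand-ins for Spark's DoubleType / DecimalType, for illustration only.
sealed trait InferredType
case object DoubleT extends InferredType
final case class DecimalT(precision: Int, scale: Int) extends InferredType

object DecimalFit {
  // Maximum precision named in the restrictions above.
  val MaxPrecision = 38

  // Parses a plain decimal literal and applies the two restrictions literally.
  // Assumption: exponent notation (e.g. "1E+3", which has a negative scale)
  // is not considered in this sketch.
  def infer(literal: String): InferredType = {
    val v = new BigDecimal(literal)
    val fits = v.precision() <= MaxPrecision && v.scale() <= v.precision()
    if (fits) DecimalT(v.precision(), v.scale()) else DoubleT
  }
}
```

Under this rule, `1` followed by 38 zeros has a precision of 39 and `0.01` has a scale (2) larger than its precision (1), so both map to a double, which is consistent with the **After** schema shown above.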
Changed files:

- python/pyspark/sql/readwriter.py (2 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameReader.scala (2 additions, 2 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/InferSchema.scala (11 additions, 6 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JSONOptions.scala (2 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonSuite.scala (46 additions, 2 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/TestJsonData.scala (8 additions, 0 deletions)