[SPARK-21365][PYTHON] Deduplicate logics parsing DDL type/schema definition
## What changes were proposed in this pull request?

This PR deals with four points as below:

- Reuse existing DDL parser APIs rather than reimplementing within PySpark
- Support DDL formatted string, `field type, field type`.
- Support case-insensitivity for parsing.
- Support nested data types as below:

**Before**

```
>>> spark.createDataFrame([[[1]]], "struct<a: struct<b: int>>").show()
...
ValueError: The strcut field string format is: 'field_name:field_type', but got: a: struct<b: int>
```

```
>>> spark.createDataFrame([[[1]]], "a: struct<b: int>").show()
...
ValueError: The strcut field string format is: 'field_name:field_type', but got: a: struct<b: int>
```

```
>>> spark.createDataFrame([[1]], "a int").show()
...
ValueError: Could not parse datatype: a int
```

**After**

```
>>> spark.createDataFrame([[[1]]], "struct<a: struct<b: int>>").show()
+---+
|  a|
+---+
|[1]|
+---+
```

```
>>> spark.createDataFrame([[[1]]], "a: struct<b: int>").show()
+---+
|  a|
+---+
|[1]|
+---+
```

```
>>> spark.createDataFrame([[1]], "a int").show()
+---+
|  a|
+---+
|  1|
+---+
```

## How was this patch tested?

Author: hyukjinkwon <gurwls223@gmail.com>

Closes #18590 from HyukjinKwon/deduplicate-python-ddl.
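
For readers wondering why the **Before** examples failed on nested types, the following is a minimal, pure-Python sketch of what nesting-aware, case-insensitive parsing of these schema strings involves. It is a toy for illustration only and is not the code in this PR, which reuses the existing DDL parser instead of parsing in Python. All function names (`split_top_level`, `parse_type`, `parse_field`, `parse_schema`) and the returned dict/tuple structure are hypothetical.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators nested inside <...>."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def parse_type(text):
    """Parse a toy type string; type names are lower-cased (case-insensitive)."""
    text = text.strip()
    if text.lower().startswith("struct<") and text.endswith(">"):
        inner = text[len("struct<"):-1]
        return {"struct": [parse_field(f) for f in split_top_level(inner)]}
    return text.lower()


def parse_field(text):
    """Parse 'name: type' or 'name type' into (name, parsed_type)."""
    text = text.strip()
    for i, ch in enumerate(text):
        # The first ':' or whitespace ends the field name.
        if ch == ":" or ch.isspace():
            return text[:i], parse_type(text[i + 1:])
    raise ValueError("Could not parse field: %r" % text)


def parse_schema(text):
    """Accept either 'struct<...>' or a bare comma-separated field list."""
    text = text.strip()
    if text.lower().startswith("struct<") and text.endswith(">"):
        return parse_type(text)
    return {"struct": [parse_field(f) for f in split_top_level(text)]}


if __name__ == "__main__":
    print(parse_schema("struct<a: struct<b: int>>"))
    print(parse_schema("a INT, b STRING"))
```

The key point of the sketch is that separators are only honored at bracket depth zero, so `a: struct<b: int>` is treated as one field whose type is itself a struct. Delegating to a single shared parser, as this PR does, avoids keeping such logic in sync in two places.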
Showing 4 changed files:

- python/pyspark/sql/functions.py: 13 additions, 3 deletions
- python/pyspark/sql/tests.py: 25 additions, 0 deletions
- python/pyspark/sql/types.py: 34 additions, 54 deletions
- sql/core/src/main/scala/org/apache/spark/sql/api/python/PythonSQLUtils.scala: 25 additions, 0 deletions