[SPARK-16542][SQL][PYSPARK] Fix bugs about types that result an array of null when creating DataFrame using python

## What changes were proposed in this pull request?

This reopens https://github.com/apache/spark/pull/14198 with merge conflicts resolved.

ueshin, could you please take a look at my code?

Fix type-handling bugs that produce an array of null values when creating a DataFrame from Python.

Python's `array.array` supports a richer set of element types than Python's built-in types, e.g. we can have `array('f',[1,2,3])` and `array('d',[1,2,3])`. The code in spark-sql and pyspark did not take this into account, so a row containing `array('f')` could produce an array of null values.

A simple script that reproduces the bug:

```
from pyspark import SparkContext
from pyspark.sql import SQLContext,Row,DataFrame
from array import array

sc = SparkContext()
sqlContext = SQLContext(sc)

row1 = Row(floatarray=array('f',[1,2,3]), doublearray=array('d',[1,2,3]))
rows = sc.parallelize([ row1 ])
df = sqlContext.createDataFrame(rows)
df.show()
```

which produces the output:

```
+---------------+------------------+
|    doublearray|        floatarray|
+---------------+------------------+
|[1.0, 2.0, 3.0]|[null, null, null]|
+---------------+------------------+
```

## How was this patch tested?

New test case added.

Author: Xiang Gao <qasdfgtyuiop@gmail.com>
Author: Gao, Xiang <qasdfgtyuiop@gmail.com>
Author: Takuya UESHIN <ueshin@databricks.com>

Closes #18444 from zasdfgbnm/fix_array_infer.
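To illustrate why `array.array` needs its own type inference, here is a minimal sketch using only the standard library. It is not the code from this patch: the function `spark_element_type_name` and both lookup tables are hypothetical names introduced for illustration, and the returned strings are simply the short names of Spark SQL types. The idea is that the typecode distinguishes `'f'` from `'d'`, and `itemsize` resolves the platform-dependent width of the signed integer codes.

```python
from array import array

# Hypothetical lookup tables, for illustration only (not taken from the patch).
_FLOAT_TYPECODES = {"f": "float", "d": "double"}
_SIGNED_INT_SIZES = {1: "byte", 2: "short", 4: "int", 8: "long"}


def spark_element_type_name(arr):
    """Return a Spark SQL type name for the elements of an array.array.

    Assumption: only floating-point and signed integer typecodes are handled.
    Unsigned codes would need the next wider signed type and are rejected here.
    """
    code = arr.typecode
    if code in _FLOAT_TYPECODES:
        return _FLOAT_TYPECODES[code]
    if code in "bhilq":
        # The width of 'i' and 'l' varies by platform, so use itemsize.
        return _SIGNED_INT_SIZES[arr.itemsize]
    raise TypeError("unsupported array typecode: %r" % code)


print(spark_element_type_name(array("f", [1, 2, 3])))
print(spark_element_type_name(array("d", [1, 2, 3])))
```

With a mapping like this, the `floatarray` column in the reproduction above would be inferred as an array of floats rather than being treated like an array of doubles.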
Showing changed files:

- core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala: 16 additions, 4 deletions
- python/pyspark/sql/tests.py: 96 additions, 1 deletion
- python/pyspark/sql/types.py: 94 additions, 1 deletion
- sql/core/src/main/scala/org/apache/spark/sql/execution/python/EvaluatePython.scala: 10 additions, 0 deletions