    [SPARK-4192][SQL] Internal API for Python UDT · 04450d11
    Xiangrui Meng authored
    Following #2919, this PR adds Python UDTs (user-defined types, for internal use only), with tests under "pyspark.tests". Before `SQLContext.applySchema` runs, we check whether user-type instances need to be converted into SQL-recognizable data. In the current implementation, a Python UDT must be paired with a Scala UDT for serialization on the JVM side. A follow-up PR will add VectorUDT in MLlib for both Scala and Python.
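
    The check-then-convert step described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: `Point`, `PointUDT`, `UDT_REGISTRY`, `needs_conversion`, `convert_row`, and the `com.example.PointUDT` class name are all hypothetical, and no pyspark API is used.

```python
class Point:
    """Hypothetical user type holding two coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


class PointUDT:
    """Hypothetical UDT that maps a Point to a list of two floats and back."""

    # Assumed name of the paired JVM-side class; purely illustrative.
    scala_udt = "com.example.PointUDT"

    def serialize(self, obj):
        # Turn the user object into plain data that a SQL engine could store.
        return [float(obj.x), float(obj.y)]

    def deserialize(self, datum):
        # Rebuild the user object from the stored plain data.
        return Point(datum[0], datum[1])


# Hypothetical registry from user type to the UDT that handles it.
UDT_REGISTRY = {Point: PointUDT()}


def needs_conversion(row):
    """Return True if any value in the row is an instance of a registered user type."""
    return any(type(value) in UDT_REGISTRY for value in row)


def convert_row(row):
    """Serialize user-type values in the row; leave rows without them untouched."""
    if not needs_conversion(row):
        return row
    return tuple(
        UDT_REGISTRY[type(value)].serialize(value) if type(value) in UDT_REGISTRY else value
        for value in row
    )


if __name__ == "__main__":
    print(convert_row((1, Point(1.0, 2.0))))
```

    Running the conversion only when a row actually contains a user type keeps the common case, rows made of built-in values, free of any extra work.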
    
    marmbrus jkbradley davies
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3068 from mengxr/SPARK-4192-sql and squashes the following commits:
    
    acff637 [Xiangrui Meng] merge master
    dba5ea7 [Xiangrui Meng] only use pyClass for Python UDT output sqlType as well
    2c9d7e4 [Xiangrui Meng] move import to global setup; update needsConversion
    7c4a6a9 [Xiangrui Meng] address comments
    75223db [Xiangrui Meng] minor update
    f740379 [Xiangrui Meng] remove UDT from default imports
    e98d9d0 [Xiangrui Meng] fix py style
    4e84fce [Xiangrui Meng] remove local hive tests and add more tests
    39f19e0 [Xiangrui Meng] add tests
    b7f666d [Xiangrui Meng] add Python UDT
tests.py 68.91 KiB