    1a9c6cdd
    [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD · 1a9c6cdd
    Xiangrui Meng authored
    Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save the result to a Parquet file. Examples in Scala/Python are attached. The Scala code was copied from jkbradley.
    
    ~~This PR contains the changes from #3068. I will rebase after #3068 is merged.~~
    
    marmbrus jkbradley
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:
    
    3a0b6e5 [Xiangrui Meng] organize imports
    236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
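
To illustrate what registering Vector as a UDT involves, here is a minimal pure-Python sketch of the idea: a vector is serialized into a plain struct-like row that a SQL engine can store, and deserialized back on read. This is not the code from this PR and it does not use Spark. The row layout `(kind, size, indices, values)`, the `SPARSE`/`DENSE` tags, and the helpers `serialize`, `deserialize`, `to_rows`, and `select` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple, Union


@dataclass
class DenseVector:
    values: List[float]


@dataclass
class SparseVector:
    size: int
    indices: List[int]
    values: List[float]


Vector = Union[DenseVector, SparseVector]

# Hypothetical storage layout for one vector: (kind, size, indices, values).
# size and indices are only meaningful for sparse vectors.
Row = Tuple[int, Optional[int], Optional[List[int]], List[float]]

SPARSE, DENSE = 0, 1  # assumed type tags, chosen for this sketch


def serialize(v: Vector) -> Row:
    """Turn a vector into a plain tuple of SQL-friendly primitives."""
    if isinstance(v, SparseVector):
        return (SPARSE, v.size, list(v.indices), list(v.values))
    if isinstance(v, DenseVector):
        return (DENSE, None, None, list(v.values))
    raise TypeError(f"not a vector: {type(v).__name__}")


def deserialize(row: Row) -> Vector:
    """Rebuild the vector from its stored tuple."""
    kind, size, indices, values = row
    if kind == SPARSE:
        return SparseVector(size, list(indices), list(values))
    if kind == DENSE:
        return DenseVector(list(values))
    raise ValueError(f"unknown vector kind: {kind}")


def to_rows(points: List[Tuple[float, Vector]]) -> List[Dict[str, object]]:
    """Map (label, features) pairs to rows with named columns."""
    return [{"label": label, "features": serialize(f)} for label, f in points]


def select(rows: List[Dict[str, object]], *cols: str) -> List[Dict[str, object]]:
    """Keep only the requested columns of each row."""
    return [{c: r[c] for c in cols} for r in rows]
```

For example, `to_rows([(1.0, DenseVector([0.5, 2.0]))])` yields one row whose `features` column holds the serialized tuple, and `deserialize` recovers the original vector from it. Once the vector type has a fixed primitive representation like this, generic operations such as column selection or columnar file output no longer need to know anything about vectors.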