[SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD
Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a SchemaRDD, and then select columns or save it to a Parquet file. Examples in Scala and Python are attached. The Scala code was copied from jkbradley.

~~This PR contains the changes from #3068. I will rebase after #3068 is merged.~~

marmbrus jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #3070 from mengxr/SPARK-3573 and squashes the following commits:

3a0b6e5 [Xiangrui Meng] organize imports
236f0a0 [Xiangrui Meng] register vector as UDT and provide dataset examples
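Registering a UDT means giving SQL a way to convert a user class to a built-in SQL type and back. The Python sketch below illustrates that round trip. It does not use the real `pyspark.mllib.linalg` classes. Instead, `DenseVec` and `SparseVec` are hypothetical stand-ins. It also assumes a struct-like layout of `(type, size, indices, values)`, with type `0` for sparse and `1` for dense. Treat that layout as an assumption, not as the exact schema this commit defines.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-ins for MLlib's DenseVector / SparseVector (not the real classes).
@dataclass
class DenseVec:
    values: List[float]

@dataclass
class SparseVec:
    size: int
    indices: List[int]
    values: List[float]

# Assumed SQL-side layout: (type, size, indices, values); type 0 = sparse, 1 = dense.
VecRow = Tuple[int, Optional[int], Optional[List[int]], List[float]]

def serialize(v) -> VecRow:
    """Convert a vector into the flat struct a SQL engine can store."""
    if isinstance(v, SparseVec):
        return (0, v.size, list(v.indices), list(v.values))
    if isinstance(v, DenseVec):
        # Dense vectors need no size or indices; the length is implied by values.
        return (1, None, None, list(v.values))
    raise TypeError(f"cannot serialize {type(v).__name__}")

def deserialize(row: VecRow):
    """Rebuild the user-facing vector object from its stored struct."""
    tpe, size, indices, values = row
    if tpe == 0:
        return SparseVec(size, list(indices), list(values))
    if tpe == 1:
        return DenseVec(list(values))
    raise ValueError(f"unknown vector type tag {tpe}")

# Hypothetical usage: turn (label, vector) pairs into rows, then "select" one column.
points = [(1.0, DenseVec([1.0, 0.0, 3.0])), (0.0, SparseVec(3, [1], [2.0]))]
rows = [{"label": lbl, "features": serialize(vec)} for lbl, vec in points]
labels = [r["label"] for r in rows]
```

Keeping a type tag in the stored struct lets a single column hold both dense and sparse vectors. The sparse fields stay empty (`None`) for dense rows.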
- dev/run-tests: 1 addition, 1 deletion
- examples/src/main/python/mllib/dataset_example.py: 62 additions, 0 deletions
- examples/src/main/scala/org/apache/spark/examples/mllib/DatasetExample.scala: 121 additions, 0 deletions
- mllib/pom.xml: 5 additions, 0 deletions
- mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala: 67 additions, 2 deletions
- mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala: 11 additions, 0 deletions
- python/pyspark/mllib/linalg.py: 50 additions, 0 deletions
- python/pyspark/mllib/tests.py: 36 additions, 3 deletions