Commit dd8514fa authored by Xusen Yin, committed by Yanbo Liang

[SPARK-16558][EXAMPLES][MLLIB] examples/mllib/LDAExample should use MLVector instead of MLlib Vector

## What changes were proposed in this pull request?

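`mllib.LDAExample` combines an ML pipeline with the MLlib LDA algorithm. The pipeline produces feature vectors as `org.apache.spark.ml.linalg.Vector` (imported here as `MLVector`), while MLlib LDA expects `org.apache.spark.mllib.linalg.Vector`. This patch matches on `MLVector` when reading the pipeline output and converts each vector with `Vectors.fromML` before it reaches LDA.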
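
A minimal sketch of the conversion between the two vector types, assuming Spark 2.x with `spark-mllib` on the classpath. The object name `VectorConversionSketch`, the helper `toMllib`, and the sample vector are hypothetical and only serve as illustration; `Vectors.fromML` and `asML` are the existing conversion methods.

```scala
import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object VectorConversionSketch {
  // Hypothetical helper: convert an ML vector into the MLlib type used by mllib LDA.
  def toMllib(v: MLVector): Vector = Vectors.fromML(v)

  def main(args: Array[String]): Unit = {
    // Made-up term-count vector, similar in shape to what a CountVectorizer stage emits.
    val mlFeatures: MLVector = MLVectors.sparse(5, Array(0, 3), Array(2.0, 1.0))

    val mllibFeatures: Vector = toMllib(mlFeatures)

    // The reverse direction is available through asML on the MLlib vector.
    val roundTrip: MLVector = mllibFeatures.asML

    println(mllibFeatures.size)
    println(roundTrip == mlFeatures)
  }
}
```

The two `Vector` traits are unrelated types despite sharing a name, which is why an explicit conversion is needed rather than a cast.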

## How was this patch tested?

Tested manually.

Author: Xusen Yin <yinxusen@gmail.com>

Closes #14212 from yinxusen/SPARK-16558.
parent d9e0919d
@@ -24,8 +24,9 @@ import scopt.OptionParser
 import org.apache.spark.{SparkConf, SparkContext}
 import org.apache.spark.ml.Pipeline
 import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel, RegexTokenizer, StopWordsRemover}
+import org.apache.spark.ml.linalg.{Vector => MLVector}
 import org.apache.spark.mllib.clustering.{DistributedLDAModel, EMLDAOptimizer, LDA, OnlineLDAOptimizer}
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
 import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{Row, SparkSession}
@@ -223,7 +224,7 @@ object LDAExample {
     val documents = model.transform(df)
       .select("features")
       .rdd
-      .map { case Row(features: Vector) => features }
+      .map { case Row(features: MLVector) => Vectors.fromML(features) }
       .zipWithIndex()
       .map(_.swap)
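
The effect of the changed pattern match can be sketched without a running cluster, assuming `spark-sql` and `spark-mllib` are on the classpath. The object `RowMatchSketch` and its two helpers are hypothetical names for illustration. The sketch builds a `Row` holding an ML vector, as the pipeline output does, and compares the old and new patterns.

```scala
import scala.util.Try

import org.apache.spark.ml.linalg.{Vector => MLVector, Vectors => MLVectors}
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.sql.Row

object RowMatchSketch {
  // Old pattern: expects an MLlib vector inside the Row.
  def extractOld(row: Row): Vector = row match {
    case Row(features: Vector) => features
  }

  // New pattern: matches the ML vector and converts it for MLlib LDA.
  def extractNew(row: Row): Vector = row match {
    case Row(features: MLVector) => Vectors.fromML(features)
  }

  def main(args: Array[String]): Unit = {
    // Made-up single-column row carrying an ML vector.
    val row = Row(MLVectors.dense(1.0, 0.0, 2.0))

    // The old pattern does not match an ML vector, so it throws a MatchError.
    println(Try(extractOld(row)).isFailure)

    // The new pattern matches and yields an MLlib vector.
    println(extractNew(row))
  }
}
```

Because the type test in the pattern happens at runtime, the mismatch in the old code compiles cleanly and only surfaces when the job runs.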