Commit a6e53a9c authored by Meihua Wu, committed by Xiangrui Meng

[SPARK-9225] [MLLIB] LDASuite needs unit tests for empty documents

Add unit tests for running LDA with empty documents.
Both EMLDAOptimizer and OnlineLDAOptimizer are tested.

feynmanliang

Author: Meihua Wu <meihuawu@umich.edu>

Closes #7620 from rotationsymmetry/SPARK-9225 and squashes the following commits:

3ed7c88 [Meihua Wu] Incorporate reviewer's further comments
f9432e8 [Meihua Wu] Incorporate reviewer's comments
8e1b9ec [Meihua Wu] Merge remote-tracking branch 'upstream/master' into SPARK-9225
ad55665 [Meihua Wu] Add unit tests for running LDA with empty documents
parent 9c0501c5
@@ -390,6 +390,46 @@ class LDASuite extends SparkFunSuite with MLlibTestSparkContext {
    }
  }

  test("EMLDAOptimizer with empty docs") {
    val vocabSize = 6
    val emptyDocsArray = Array.fill(6)(Vectors.sparse(vocabSize, Array.empty, Array.empty))
    val emptyDocs = emptyDocsArray
      .zipWithIndex.map { case (wordCounts, docId) =>
        (docId.toLong, wordCounts)
      }
    val distributedEmptyDocs = sc.parallelize(emptyDocs, 2)
    val op = new EMLDAOptimizer()
    val lda = new LDA()
      .setK(3)
      .setMaxIterations(5)
      .setSeed(12345)
      .setOptimizer(op)
    val model = lda.run(distributedEmptyDocs)
    assert(model.vocabSize === vocabSize)
  }

  test("OnlineLDAOptimizer with empty docs") {
    val vocabSize = 6
    val emptyDocsArray = Array.fill(6)(Vectors.sparse(vocabSize, Array.empty, Array.empty))
    val emptyDocs = emptyDocsArray
      .zipWithIndex.map { case (wordCounts, docId) =>
        (docId.toLong, wordCounts)
      }
    val distributedEmptyDocs = sc.parallelize(emptyDocs, 2)
    val op = new OnlineLDAOptimizer()
    val lda = new LDA()
      .setK(3)
      .setMaxIterations(5)
      .setSeed(12345)
      .setOptimizer(op)
    val model = lda.run(distributedEmptyDocs)
    assert(model.vocabSize === vocabSize)
  }
}

private[clustering] object LDASuite {
......