[SPARK-19110][ML][MLLIB] DistributedLDAModel returns different logPrior for original and loaded model

## What changes were proposed in this pull request?

While adding the DistributedLDAModel training summary for SparkR, I found that the logPrior of the original model and of the loaded model differ. For example, adding the following check to test("read/write DistributedLDAModel"):

```scala
val logPrior = model.asInstanceOf[DistributedLDAModel].logPrior
val logPrior2 = model2.asInstanceOf[DistributedLDAModel].logPrior
assert(logPrior === logPrior2)
```

makes the test fail: `-4.394180878889078 did not equal -4.294290536919573`.

The reason is that `graph.vertices.aggregate(0.0)(seqOp, _ + _)` only returns the value of a single vertex instead of the aggregation over all vertices. Therefore, when the loaded model performs the aggregation in a different order, it returns a different `logPrior`. Please refer to #16464 for details. A minimal sketch of this aggregation pitfall is shown below.

## How was this patch tested?

Added a new unit test for `logPrior`. A hedged sketch of this kind of save/load round-trip check follows the changed-file list below.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16491 from wangmiao1981/ldabug.
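To make the failure mode concrete, here is a minimal, hypothetical sketch of the general `RDD.aggregate` pitfall the commit message describes. It is not the actual LDAModel code: it uses a plain `RDD[Double]` and a local SparkSession (the object name and toy data are assumptions). A `seqOp` that drops its accumulator leaves only one element's contribution per partition, so the result depends on partitioning and evaluation order rather than being the true aggregate.

```scala
import org.apache.spark.sql.SparkSession

object AggregatePitfallSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AggregatePitfallSketch")
      .master("local[4]")
      .getOrCreate()
    val sc = spark.sparkContext

    val values = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0), numSlices = 2)

    // Buggy seqOp: it ignores the running accumulator `acc`, so each partition
    // contributes only its last element, and the final value depends on how the
    // data happens to be partitioned and ordered.
    val buggy = values.aggregate(0.0)((acc, v) => v, _ + _)

    // Correct seqOp: fold every element into the accumulator, so the result is
    // the true sum regardless of partitioning.
    val correct = values.aggregate(0.0)((acc, v) => acc + v, _ + _)

    // With this partitioning, typically prints: buggy = 6.0, correct = 10.0
    println(s"buggy = $buggy, correct = $correct")
    spark.stop()
  }
}
```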
Showing 2 changed files with 10 additions and 2 deletions:
- mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala (2 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala (8 additions, 0 deletions)
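The 8-line addition to LDASuite.scala is not shown on this page; the following is only a hypothetical, self-contained approximation of that kind of save/load round-trip check for `logPrior`. The suite name, toy corpus, and temp-path handling are assumptions, and it uses plain ScalaTest rather than Spark's internal test helpers.

```scala
import java.nio.file.Files

import org.apache.spark.ml.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class LogPriorRoundTripSketch extends AnyFunSuite {

  test("logPrior survives a save/load round trip") {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("lda-logprior-roundtrip")
      .getOrCreate()
    import spark.implicits._

    // Tiny toy corpus; "features" is LDA's default features column name.
    val dataset = Seq(
      Vectors.dense(1.0, 0.0, 2.0),
      Vectors.dense(0.0, 3.0, 1.0),
      Vectors.dense(2.0, 1.0, 0.0)
    ).map(Tuple1.apply).toDF("features")

    // The "em" optimizer produces a DistributedLDAModel, which exposes logPrior.
    val lda = new LDA().setK(2).setOptimizer("em").setMaxIter(2).setSeed(1L)
    val model = lda.fit(dataset).asInstanceOf[DistributedLDAModel]

    val path = Files.createTempDirectory("lda-model").resolve("m").toString
    model.save(path)
    val loaded = DistributedLDAModel.load(path)

    // Before the fix, this equality could fail because logPrior depended on
    // the order in which vertex contributions were aggregated.
    assert(model.logPrior === loaded.logPrior)

    spark.stop()
  }
}
```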