Commit c48f2a3a authored by Sean Owen

[SPARK-7615][MLLIB] MLLIB Word2Vec wordVectors divided by Euclidean Norm equals to zero

Cosine similarity with 0 vector should be 0

Related to https://github.com/apache/spark/pull/10152

Author: Sean Owen <sowen@cloudera.com>

Closes #10696 from srowen/SPARK-7615.
parent 8cfa218f
@@ -543,7 +543,12 @@ class Word2VecModel private[spark] (
     val cosVec = cosineVec.map(_.toDouble)
     var ind = 0
     while (ind < numWords) {
-      cosVec(ind) /= wordVecNorms(ind)
+      val norm = wordVecNorms(ind)
+      if (norm == 0.0) {
+        cosVec(ind) = 0.0
+      } else {
+        cosVec(ind) /= norm
+      }
       ind += 1
     }
     wordList.zip(cosVec)
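
The guard in the hunk above can be illustrated in isolation. The following is a minimal standalone sketch, not Spark's code: the object ZeroNormCosine and its methods norm and cosine are hypothetical names introduced here for illustration only. It assumes plain Array[Double] vectors and shows why the check matters: dividing a zero dot product by a zero norm yields NaN under IEEE 754 arithmetic, whereas the guarded version returns 0.0.

```scala
// Hypothetical helper for illustration; not part of Spark MLlib.
object ZeroNormCosine {

  /** Euclidean (L2) norm of a vector. */
  def norm(v: Array[Double]): Double =
    math.sqrt(v.map(x => x * x).sum)

  /** Cosine similarity that returns 0.0 when either vector has zero norm. */
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same length")
    val na = norm(a)
    val nb = norm(b)
    if (na == 0.0 || nb == 0.0) {
      // Same idea as the patch: skip the division instead of computing 0.0 / 0.0.
      0.0
    } else {
      a.zip(b).map { case (x, y) => x * y }.sum / (na * nb)
    }
  }

  def main(args: Array[String]): Unit = {
    val query = Array(1.0, 0.0)
    val zero = Array(0.0, 0.0)

    println(cosine(query, zero))            // guarded case
    println(cosine(query, Array(2.0, 0.0))) // parallel vectors
    println((0.0 / 0.0).isNaN)              // what the unguarded division produces
  }
}
```

With the unguarded division, a single all-zero word vector would produce a NaN score, which does not order sensibly against real scores when results are ranked. Treating the similarity as 0.0 keeps every score a finite number.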