[SPARK-4611][MLlib] Implement the efficient vector norm
The vector norm in breeze is implemented with `activeIterator`, which is known to be very slow. This PR implements an efficient vector norm; with this API, `Normalizer` and `k-means` both run faster. Here is the benchmark against the mnist8m dataset.

a) `Normalizer`
Before
  DenseVector: 68.25secs
  SparseVector: 17.01secs
With this PR
  DenseVector: 12.71secs
  SparseVector: 2.73secs

b) `k-means`
Before
  DenseVector: 83.46secs
  SparseVector: 61.60secs
With this PR
  DenseVector: 70.04secs
  SparseVector: 59.05secs

Author: DB Tsai <dbtsai@alpinenow.com>

Closes #3462 from dbtsai/norm and squashes the following commits:

63c7165 [DB Tsai] typo
0c3637f [DB Tsai] add import org.apache.spark.SparkContext._ back
6fa616c [DB Tsai] address feedback
9b7cb56 [DB Tsai] move norm to static method
0b632e6 [DB Tsai] kmeans
dbed124 [DB Tsai] style
c1a877c [DB Tsai] first commit
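To illustrate the idea described in the commit message above, here is a minimal sketch of a p-norm computed with plain `while` loops over the backing array instead of an iterator. It is not the code from this PR: the object name `NormSketch` and the `Array[Double]` signature are assumptions made so the example runs without Spark or breeze. For a sparse vector, passing only the stored non-zero values gives the same result, since zeros contribute nothing to the norm.

```scala
// Hypothetical standalone sketch; names and signature are illustrative only.
object NormSketch {
  /** Returns the p-norm (p >= 1) of `values`, iterating the array directly. */
  def norm(values: Array[Double], p: Double): Double = {
    require(p >= 1.0, s"p must be >= 1.0, got $p")
    val n = values.length
    var i = 0
    if (p == 1.0) {
      // L1 norm: sum of absolute values
      var sum = 0.0
      while (i < n) { sum += math.abs(values(i)); i += 1 }
      sum
    } else if (p == 2.0) {
      // L2 norm: square root of the sum of squares
      var sum = 0.0
      while (i < n) { sum += values(i) * values(i); i += 1 }
      math.sqrt(sum)
    } else if (p == Double.PositiveInfinity) {
      // L-infinity norm: largest absolute value
      var max = 0.0
      while (i < n) {
        val v = math.abs(values(i))
        if (v > max) max = v
        i += 1
      }
      max
    } else {
      // General case: (sum |x_i|^p)^(1/p)
      var sum = 0.0
      while (i < n) { sum += math.pow(math.abs(values(i)), p); i += 1 }
      math.pow(sum, 1.0 / p)
    }
  }
}
```

Special-casing p = 1, 2, and infinity avoids calling `math.pow` per element for the norms most commonly used in practice.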
Showing
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala (3 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/feature/Normalizer.scala (1 addition, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala (51 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/linalg/VectorsSuite.scala (24 additions, 0 deletions)