[SPARK-16696][ML][MLLIB] destroy KMeans bcNewCenters when loop finished and update code where should release unused broadcast/RDD in proper time

## What changes were proposed in this pull request?

Release unused broadcast variables in KMeans and Word2Vec with `destroy(false)` so that their memory is freed promptly. Several existing `destroy()` calls are changed to `destroy(false)` so that they run asynchronously, which is preferable to blocking the caller.

Destroy `bcNewCenters` in KMeans at the correct time. Every `bcNewCenters` generated in a loop iteration is stored in a list, and all of them are released together once the loop has finished.

Fix the TODO "unpersist old indices" in `BisectingKMeans.run` by implementing the pattern "persist the current step's RDD, then unpersist the previous one" inside the loop.

## How was this patch tested?

Existing tests.

Author: WeichenXu <WeichenXu123@outlook.com>

Closes #14333 from WeichenXu123/broadvar_unpersist_to_destroy.
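
The two release patterns described above can be illustrated with a minimal, standalone Scala sketch. This is not the code from the patch: the object name `ReleasePatternSketch`, both method names, and the toy data are hypothetical, and the sketch assumes a local Spark installation on the classpath. It also uses the public no-argument `destroy()` and `unpersist(blocking = false)`, because the `destroy(false)` overload mentioned in the message appears to be restricted to Spark's own packages.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration; names and data are not taken from the patch.
object ReleasePatternSketch {

  // Pattern 1: keep every per-iteration broadcast in a list and destroy
  // them only after the loop, once nothing lazy can still reference them.
  def sumWithDeferredDestroy(sc: SparkContext): Double = {
    val data: RDD[Double] = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    val oldBroadcasts = ArrayBuffer.empty[Broadcast[Double]]
    var shifted: RDD[Double] = data
    var offset = 0.0
    for (_ <- 0 until 3) {
      val bcOffset = sc.broadcast(offset)
      oldBroadcasts += bcOffset
      // map is lazy, so the lineage of `shifted` still needs bcOffset later.
      shifted = shifted.map(_ + bcOffset.value)
      offset += 1.0
    }
    val total = shifted.sum()          // the action runs while broadcasts are alive
    oldBroadcasts.foreach(_.destroy()) // safe to release now
    total
  }

  // Pattern 2: persist the current step's RDD, then unpersist the previous one.
  def rollingPersist(sc: SparkContext): Double = {
    var current: RDD[Int] =
      sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK)
    for (_ <- 0 until 3) {
      val next: RDD[Int] = current.map(_ * 2).persist(StorageLevel.MEMORY_AND_DISK)
      next.count()                         // materialize before dropping the parent
      current.unpersist(blocking = false)  // asynchronous release
      current = next
    }
    val total = current.sum()
    current.unpersist(blocking = false)
    total
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("release-pattern-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(sumWithDeferredDestroy(sc))
      println(rollingPersist(sc))
    } finally {
      sc.stop()
    }
  }
}
```

In the first method, destroying each broadcast inside the loop would break the final action, because the lazily built lineage still reads every `bcOffset`. That is the kind of hazard that deferring the release to the end of the loop avoids.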
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/clustering/BisectingKMeans.scala (6 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala (6 additions, 2 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala (5 additions, 5 deletions)