[SPARK-7316][MLLIB] RDD sliding window with step
Implementation of step capability for the sliding window function in MLlib's RDD. Though one can use the current sliding window with step 1 and then filter every Nth window, it will take more time and space (N*data.count times more than needed). For example, below are the results for various windows and steps on 10M data points:

Window | Step | Time (s) | Windows produced
------------ | ------------- | ---------- | ----------
128 | 1 | 6.38 | 9999873
128 | 10 | 0.9 | 999988
128 | 100 | 0.41 | 99999
1024 | 1 | 44.67 | 9998977
1024 | 10 | 4.74 | 999898
1024 | 100 | 0.78 | 99990

```
import org.apache.spark.mllib.rdd.RDDFunctions._
val rdd = sc.parallelize(1 to 10000000, 10)
rdd.count
val window = 1024
val step = 1
val t = System.nanoTime(); val windows = rdd.sliding(window, step); println(windows.count); println((System.nanoTime() - t) / 1e9)
```

Author: unknown <ulanov@ULANOV3.americas.hpqcorp.net>
Author: Alexander Ulanov <nashb@yandex.ru>
Author: Xiangrui Meng <meng@databricks.com>

Closes #5855 from avulanov/SPARK-7316-sliding.
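The "Windows produced" counts in the benchmark are consistent with keeping only full windows, that is `(n - window) / step + 1` windows for `n` elements (integer division). The sketch below is not the SlidingRDD code from this patch; it is a plain Scala collections illustration, and the names `SlidingWindowCount`, `expectedWindows`, and `slidingLocal` are hypothetical. It assumes partial trailing windows are dropped, which is inferred from the reported counts, and it checks on a small input that a stepped slide matches "slide by 1, then keep every Nth window".

```scala
object SlidingWindowCount {
  // Number of full windows of size `window`, started every `step` elements,
  // over `n` elements. Assumption: partial trailing windows are dropped.
  def expectedWindows(n: Long, window: Int, step: Int): Long =
    if (n < window) 0L else (n - window) / step + 1

  // Local, non-Spark emulation on a Scala Seq. Scala's own `sliding` may emit
  // a shorter last window, so those are filtered out here.
  def slidingLocal[T](data: Seq[T], window: Int, step: Int): Vector[Seq[T]] =
    data.sliding(window, step).filter(_.size == window).toVector

  def main(args: Array[String]): Unit = {
    val n = 10000000L
    val configs = Seq((128, 1), (128, 10), (128, 100), (1024, 1), (1024, 10), (1024, 100))
    for ((w, s) <- configs)
      println(s"window=$w step=$s windows=${expectedWindows(n, w, s)}")

    // The workaround described in the commit message: slide with step 1,
    // then keep every `step`-th window. It yields the same windows but
    // materializes roughly `step` times as many intermediate windows.
    val data = 1 to 20
    val direct = slidingLocal(data, 4, 3)
    val filtered = slidingLocal(data, 4, 1).zipWithIndex.collect {
      case (w, i) if i % 3 == 0 => w
    }
    assert(direct == filtered)
  }
}
```

For the six benchmark configurations, `expectedWindows` reproduces the counts in the table, which suggests the stepped implementation produces exactly the windows the filter-based workaround would keep.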
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/rdd/RDDFunctions.scala (8 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/rdd/SlidingRDD.scala (39 additions, 32 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/rdd/RDDFunctionsSuite.scala (7 additions, 4 deletions)