[SPARK-20265][MLLIB] Improve PrefixSpan pre-processing efficiency
## What changes were proposed in this pull request?

Improve PrefixSpan pre-processing efficiency by preventing sequences of zeros in the cleaned database. The efficiency gain is shown in the following graph: https://postimg.org/image/9x6ireuvn/

## How was this patch tested?

Using MLlib's existing PrefixSpan tests and my own tests on the 8 datasets shown in the graph. All results obtained were strictly the same as those of the original implementation (without this change). dev/run-tests was also run; no errors were found.

Author: Cyril de Vogelaere <cyril.devogelaeregmail.com>
Author: Syrux <pokcyril@hotmail.com>

Closes #17575 from Syrux/SPARK-20265.
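
To illustrate what "sequences of zeros" refers to, here is a minimal sketch, not the code from this patch. It assumes the internal encoding in which a sequence is flattened into an `Array[Int]` with `0` as the itemset delimiter, and that items have already been mapped to positive integers. The object name `PrefixSpanCleaningSketch`, the helper `encode`, and its parameters are hypothetical. When infrequent items are filtered out, an itemset can become empty. Writing a delimiter for it anyway would leave consecutive zeros in the cleaned sequence, so the sketch skips such itemsets.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical illustration only; not Spark's actual implementation.
object PrefixSpanCleaningSketch {

  /** Flattens a sequence of itemsets into a 0-delimited array, keeping only
    * frequent items and skipping itemsets that become empty, so that the
    * result never contains two consecutive zeros.
    *
    * Assumption: items are positive integers, so 0 is free to act as delimiter.
    */
  def encode(sequence: Seq[Seq[Int]], frequent: Set[Int]): Array[Int] = {
    val out = ArrayBuffer(0) // every non-empty encoded sequence starts with a delimiter
    for (itemset <- sequence) {
      val kept = itemset.filter(frequent.contains).distinct.sorted
      if (kept.nonEmpty) { // an emptied itemset would otherwise add a second 0
        out ++= kept
        out += 0
      }
    }
    // A sequence with no frequent item at all collapses to an empty array.
    if (out.length > 1) out.toArray else Array.empty[Int]
  }
}
```

For example, with frequent items `Set(1, 2, 3)`, the sequence `Seq(Seq(1, 2), Seq(9), Seq(3))` is encoded as `0, 1, 2, 0, 3, 0`. Emitting a delimiter for the emptied middle itemset would instead give `0, 1, 2, 0, 0, 3, 0`.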
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala (64 additions, 35 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/fpm/PrefixSpanSuite.scala (51 additions, 0 deletions)