[SPARK-4001][MLlib] adding parallel FP-Growth algorithm for frequent pattern mining in MLlib
Apriori is the classic algorithm for frequent item set mining in a transactional data set. It would be useful to have the Apriori algorithm in MLlib in Spark. This PR adds an implementation of it.

There is one point where I am not sure whether the implementation is the most efficient. To filter out the eligible frequent item sets, I currently use a cartesian operation on two RDDs to calculate the support of each item set. I am not sure whether it would be better to use a broadcast variable to achieve the same result.

I will add an example of how to use this algorithm if required.

Author: Jacky Li <jacky.likun@huawei.com>
Author: Jacky Li <jackylk@users.noreply.github.com>
Author: Xiangrui Meng <meng@databricks.com>

Closes #2847 from jackylk/apriori and squashes the following commits:

bee3093 [Jacky Li] Merge pull request #1 from mengxr/SPARK-4001
7e69725 [Xiangrui Meng] simplify FPTree and update FPGrowth
ec21f7d [Jacky Li] fix scalastyle
93f3280 [Jacky Li] create FPTree class
d110ab2 [Jacky Li] change test case to use MLlibTestSparkContext
a6c5081 [Jacky Li] Add Parallel FPGrowth algorithm
eb3e4ca [Jacky Li] add FPGrowth
03df2b6 [Jacky Li] refactory according to comments
7b77ad7 [Jacky Li] fix scalastyle check
f68a0bd [Jacky Li] add 2 apriori implemenation and fp-growth implementation
889b33f [Jacky Li] modify per scalastyle check
da2cba7 [Jacky Li] adding apriori algorithm for frequent item set mining in Spark
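The support-counting question raised in the description (pairing every candidate item set with every transaction versus keeping the candidates in memory and scanning the transactions once) can be illustrated with a minimal sketch. This is not the code from this commit: it uses plain Scala collections instead of RDDs so that it runs without Spark, and the names `SupportCountSketch`, `supportByCartesian`, `supportByLookup`, and `frequent` are hypothetical. In Spark, the first strategy would roughly correspond to `RDD.cartesian`, and the second to sharing the candidates through a broadcast variable, assuming the candidate set fits in memory on each executor.

```scala
object SupportCountSketch {
  // Hypothetical alias: an item set is modeled as a set of item names.
  type ItemSet = Set[String]

  // Strategy 1: materialize every (candidate, transaction) pair, which is
  // what a cartesian product of the two collections does, then keep the
  // pairs where the candidate is contained in the transaction.
  def supportByCartesian(
      transactions: Seq[ItemSet],
      candidates: Seq[ItemSet]): Map[ItemSet, Int] = {
    val pairs = for (c <- candidates; t <- transactions) yield (c, t)
    pairs
      .filter { case (c, t) => c.subsetOf(t) }
      .groupBy(_._1)
      .map { case (c, hits) => c -> hits.size }
  }

  // Strategy 2: scan the transactions once and test each one against the
  // candidates held in memory (the role a broadcast variable would play).
  def supportByLookup(
      transactions: Seq[ItemSet],
      candidates: Seq[ItemSet]): Map[ItemSet, Int] =
    transactions
      .flatMap(t => candidates.filter(_.subsetOf(t)))
      .groupBy(identity)
      .map { case (c, hits) => c -> hits.size }

  // Keep only the item sets whose support count reaches the threshold.
  def frequent(counts: Map[ItemSet, Int], minCount: Int): Map[ItemSet, Int] =
    counts.filter { case (_, n) => n >= minCount }

  def main(args: Array[String]): Unit = {
    val transactions: Seq[ItemSet] =
      Seq(Set("a", "b", "c"), Set("a", "b"), Set("a", "c"), Set("b"))
    val candidates: Seq[ItemSet] =
      Seq(Set("a", "b"), Set("a", "c"), Set("b", "c"))

    val byCartesian = supportByCartesian(transactions, candidates)
    val byLookup = supportByLookup(transactions, candidates)

    // Both strategies compute the same support counts.
    assert(byCartesian == byLookup)
    println(frequent(byLookup, minCount = 2))
  }
}
```

Both functions return the same counts; they differ in how much intermediate data they create. The cartesian variant builds one pair per candidate and transaction before filtering, while the lookup variant only emits the candidates that actually match, which is the usual motivation for preferring a broadcast when the candidate set is small.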
Changed files:
- mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala (162 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/fpm/FPTree.scala (134 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/fpm/FPGrowthSuite.scala (73 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/fpm/FPTreeSuite.scala (115 additions, 0 deletions)