[SPARK-5900][MLLIB] make PIC and FPGrowth Java-friendly
In the previous version, PIC stores clustering assignments as an `RDD[(Long, Int)]`. This is mapped to `RDD<Tuple2<Object, Object>>` in Java, so Java users have to cast types manually. We should either create a new method called `javaAssignments` that returns `JavaRDD[(java.lang.Long, java.lang.Integer)]` or wrap the result pair in a class. I chose the latter approach in this PR. Now assignments are stored as an `RDD[Assignment]`, where `Assignment` is a class with `id` and `cluster`.

Similarly, in FPGrowth, the frequent itemsets are stored as an `RDD[(Array[Item], Long)]`, which is mapped to `RDD<Tuple2<Object, Object>>`. Although we provide a "Java-friendly" method `javaFreqItemsets` that returns `JavaRDD[(Array[Item], java.lang.Long)]`, it doesn't really work because `Array[Item]` is mapped to `Object` in Java. So in this PR I created a class `FreqItemset` to wrap the results. It has `items` and `freq`, as well as a `javaItems` method that returns `List<Item>` in Java.

I'm not certain that the names I chose are proper: `Assignment`/`id`/`cluster` and `FreqItemset`/`items`/`freq`. Please let me know if there are better suggestions.

CC: jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #4695 from mengxr/SPARK-5900 and squashes the following commits:

865b5ca [Xiangrui Meng] make Assignment serializable
cffa96e [Xiangrui Meng] fix test
9c0e590 [Xiangrui Meng] remove unused Tuple2
1b9db3d [Xiangrui Meng] make PIC and FPGrowth Java-friendly
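To illustrate why wrapping the pair in a class helps Java callers, here is a minimal plain-Java sketch of the pattern the commit message describes. It does not use Spark and is not the code from this PR: the classes below are hypothetical stand-ins that only borrow the names `Assignment`, `FreqItemset`, `id`, `cluster`, `items`, `freq`, and `javaItems` from the description above, and the enclosing class `WrapperSketch` is invented for the example.

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class WrapperSketch {

    // Hypothetical stand-in for the Assignment wrapper: a typed pair of
    // (id, cluster), so callers never see Tuple2<Object, Object>.
    static final class Assignment implements Serializable {
        private final long id;
        private final int cluster;

        Assignment(long id, int cluster) {
            this.id = id;
            this.cluster = cluster;
        }

        long id() { return id; }
        int cluster() { return cluster; }
    }

    // Hypothetical stand-in for the FreqItemset wrapper: keeps the item
    // array and its frequency, and exposes the items as a typed List.
    static final class FreqItemset<Item> implements Serializable {
        private final Item[] items;
        private final long freq;

        FreqItemset(Item[] items, long freq) {
            this.items = items;
            this.freq = freq;
        }

        Item[] items() { return items; }
        long freq() { return freq; }

        // Read-only List view of the items, so Java code can iterate
        // without casting from Object.
        List<Item> javaItems() {
            return Collections.unmodifiableList(Arrays.asList(items));
        }
    }

    public static void main(String[] args) {
        Assignment a = new Assignment(42L, 3);
        // Fields come back with their real types; no casts are needed.
        long id = a.id();
        int cluster = a.cluster();
        System.out.println(id + " -> " + cluster);

        FreqItemset<String> itemset =
            new FreqItemset<>(new String[] {"a", "b"}, 5L);
        for (String item : itemset.javaItems()) {
            System.out.println(item);
        }
        System.out.println(itemset.javaItems() + ", " + itemset.freq());
    }
}
```

With tuple-based results, each element would instead have to be cast from `Object` at every call site, which is the friction the wrapper classes are meant to remove.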
Showing 11 changed files
- docs/mllib-clustering.md (4 additions, 4 deletions)
- docs/mllib-frequent-pattern-mining.md (5 additions, 7 deletions)
- examples/src/main/java/org/apache/spark/examples/mllib/JavaFPGrowthExample.java (3 additions, 5 deletions)
- examples/src/main/java/org/apache/spark/examples/mllib/JavaPowerIterationClusteringExample.java (2 additions, 3 deletions)
- examples/src/main/scala/org/apache/spark/examples/mllib/FPGrowthExample.scala (2 additions, 2 deletions)
- examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala (2 additions, 6 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala (27 additions, 6 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/fpm/FPGrowth.scala (29 additions, 12 deletions)
- mllib/src/test/java/org/apache/spark/mllib/fpm/JavaFPGrowthSuite.java (10 additions, 20 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/clustering/PowerIterationClusteringSuite.scala (4 additions, 4 deletions)
- mllib/src/test/scala/org/apache/spark/mllib/fpm/FPGrowthSuite.scala (5 additions, 5 deletions)