[SPARK-7242][SQL][MLLIB] Frequent items for DataFrames
Find frequent items, possibly with false positives, using the algorithm described in `http://www.cs.umd.edu/~samir/498/karp.pdf`.

Public API:
```
df.stat.freqItems(cols: Array[String], support: Double = 0.001): DataFrame
```

The output is a local DataFrame whose columns are the input column names with `-freqItems` appended. This is a single-pass algorithm that may return false positives, but no false negatives.

cc mengxr rxin

Let's get the implementations in; I can add the Python API in a follow-up PR.

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #5799 from brkyvz/freq-items and squashes the following commits:

a6ec82c [Burak Yavuz] addressed comments v?
39b1bba [Burak Yavuz] removed toSeq
0915e23 [Burak Yavuz] addressed comments v2.1
3a5c177 [Burak Yavuz] addressed comments v2.0
482e741 [Burak Yavuz] removed old import
38e784d [Burak Yavuz] addressed comments v1.0
8279d4d [Burak Yavuz] added default value for support
3d82168 [Burak Yavuz] made base implementation
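For readers unfamiliar with the single-pass counter technique the message refers to, here is a minimal plain-Scala sketch of the textbook variant. It is not the code in `FrequentItems.scala`; the object name `FreqItemsSketch`, the choice of `ceil(1 / support)` counters, and the in-memory `Iterable` input are assumptions made for illustration only.

```scala
import scala.collection.mutable

// Hypothetical illustration; not the implementation added by this commit.
object FreqItemsSketch {

  /** Returns a superset of the items that occur in more than
    * `support * n` of the `n` inputs (false positives are possible,
    * false negatives are not).
    */
  def freqItems[T](items: Iterable[T], support: Double = 0.001): Set[T] = {
    require(support > 0.0 && support <= 1.0, "support must be in (0, 1]")

    // Assumption: keep at most k = ceil(1 / support) counters.
    val k = math.ceil(1.0 / support).toInt
    val counts = mutable.Map.empty[T, Long]

    for (item <- items) {
      if (counts.contains(item)) {
        // Already tracked: bump its counter.
        counts(item) += 1L
      } else if (counts.size < k) {
        // Free slot available: start tracking the item.
        counts(item) = 1L
      } else {
        // No free slot: decrement every counter, drop the ones that
        // reach zero, and discard the new item along with them.
        for (key <- counts.keys.toList) {
          val c = counts(key) - 1L
          if (c == 0L) counts.remove(key) else counts(key) = c
        }
      }
    }
    counts.keySet.toSet
  }
}
```

Each decrement round discards `k + 1` occurrences at once, so there can be at most `n / (k + 1)` such rounds. An item seen more than `support * n` times therefore always keeps a positive counter, while rarer items may survive by chance, which is where the false positives come from. A second pass over the data would be needed to filter them out.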
Showing
- sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala (11 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala (68 additions, 0 deletions)
- sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala (121 additions, 0 deletions)
- sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java (9 additions, 5 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala (47 additions, 0 deletions)