[SPARK-12938][SQL] DataFrame API for Bloom filter
This PR integrates the Bloom filter from spark-sketch into DataFrame. This version relies on RDD.aggregate to build the filter; a more performant UDAF-based version can be added in follow-up PRs. This PR also adds two specialized `put` variants (`putBinary` and `putLong`) to `BloomFilter`, which make it easier to build a Bloom filter over a `DataFrame`.
Author: Wenchen Fan <wenchen@databricks.com>
Closes #10937 from cloud-fan/bloom-filter.
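To illustrate the two ideas in the message (type-specific `put` methods, and building a filter with an aggregate-style fold whose partial results are merged), here is a minimal standalone sketch. It is not Spark's `BloomFilter` or `BloomFilterImpl`: the class `SimpleBloomFilter`, its `build` helper, and the SplitMix64-style mixing function are all hypothetical, chosen only to keep the example self-contained. Each inner `List<Long>` stands in for one partition: a fresh filter plays the role of the zero value, `putLong` plays the role of the per-element step, and `mergeInPlace` (a bitwise OR) plays the role of the combine step.

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified Bloom filter for illustration; not spark-sketch's implementation.
final class SimpleBloomFilter {
  private final BitSet bits;
  private final int numBits;
  private final int numHashFunctions;

  SimpleBloomFilter(int numBits, int numHashFunctions) {
    if (numBits <= 0 || numHashFunctions <= 0) {
      throw new IllegalArgumentException("numBits and numHashFunctions must be positive");
    }
    this.bits = new BitSet(numBits);
    this.numBits = numBits;
    this.numHashFunctions = numHashFunctions;
  }

  // SplitMix64 finalizer used as a 64-bit mixer (an assumption, not Spark's hash function).
  private static long mix64(long z) {
    z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
    z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
    return z ^ (z >>> 31);
  }

  // Folds every byte through the mixer, then mixes in the length.
  private static long hashBytes(byte[] item) {
    long h = 0x9E3779B97F4A7C15L; // arbitrary seed
    for (byte b : item) {
      h = mix64(h ^ (b & 0xFF));
    }
    return mix64(h ^ item.length);
  }

  // Double hashing: the i-th index is h1 + i * h2, taken from the two halves of one 64-bit hash.
  private int index(long hash, int i) {
    int h1 = (int) hash;
    int h2 = (int) (hash >>> 32);
    int combined = h1 + i * h2;
    if (combined < 0) {
      combined = ~combined; // maps negative values into [0, Integer.MAX_VALUE]
    }
    return combined % numBits;
  }

  // Sets all k bits; returns true if at least one bit changed.
  private boolean put(long hash) {
    boolean changed = false;
    for (int i = 1; i <= numHashFunctions; i++) {
      int idx = index(hash, i);
      if (!bits.get(idx)) {
        bits.set(idx);
        changed = true;
      }
    }
    return changed;
  }

  private boolean mightContain(long hash) {
    for (int i = 1; i <= numHashFunctions; i++) {
      if (!bits.get(index(hash, i))) {
        return false; // definitely absent
      }
    }
    return true; // possibly present (false positives are allowed)
  }

  boolean putLong(long item) { return put(mix64(item)); }
  boolean putBinary(byte[] item) { return put(hashBytes(item)); }
  boolean mightContainLong(long item) { return mightContain(mix64(item)); }
  boolean mightContainBinary(byte[] item) { return mightContain(hashBytes(item)); }

  // Bitwise-OR merge; only valid when both filters have the same size and hash count.
  SimpleBloomFilter mergeInPlace(SimpleBloomFilter other) {
    if (other.numBits != numBits || other.numHashFunctions != numHashFunctions) {
      throw new IllegalArgumentException("incompatible Bloom filters");
    }
    bits.or(other.bits);
    return this;
  }

  // Aggregate-style build: one filter per "partition", then merge the partial filters.
  static SimpleBloomFilter build(List<List<Long>> partitions, int numBits, int numHashFunctions) {
    SimpleBloomFilter result = new SimpleBloomFilter(numBits, numHashFunctions);
    for (List<Long> partition : partitions) {
      SimpleBloomFilter partial = new SimpleBloomFilter(numBits, numHashFunctions); // zero value
      for (long value : partition) {
        partial.putLong(value); // per-element step
      }
      result.mergeInPlace(partial); // combine step
    }
    return result;
  }
}
```

Because merging is a plain OR over identically sized bit sets, partial filters can be combined in any order, which is what makes an aggregate-style (and later a UDAF-style) build possible.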
- common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilter.java: 33 additions, 1 deletion
- common/sketch/src/main/java/org/apache/spark/util/sketch/BloomFilterImpl.java: 94 additions, 47 deletions
- common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketchImpl.java: 4 additions, 43 deletions
- common/sketch/src/main/java/org/apache/spark/util/sketch/Utils.java: 48 additions, 0 deletions
- sql/core/src/main/scala/org/apache/spark/sql/DataFrameStatFunctions.scala: 74 additions, 2 deletions
- sql/core/src/test/java/test/org/apache/spark/sql/JavaDataFrameSuite.java: 31 additions, 0 deletions
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala: 22 additions, 0 deletions