[SPARK-12933][SQL] Initial implementation of Count-Min sketch
This PR adds an initial implementation of count min sketch, contained in a new module spark-sketch under `common/sketch`. The implementation is based on the [`CountMinSketch` class in stream-lib][1].

As required by the [design doc][2], spark-sketch should have no external dependency. Two classes, `Murmur3_x86_32` and `Platform` are copied to spark-sketch from spark-unsafe for hashing facilities. They'll also be used in the upcoming bloom filter implementation.

The following features will be added in future follow-up PRs:

- Serialization support
- DataFrame API integration

[1]: https://github.com/addthis/stream-lib/blob/aac6b4d23a8686b000f80baa447e0922ecac3bcb/src/main/java/com/clearspring/analytics/stream/frequency/CountMinSketch.java
[2]: https://issues.apache.org/jira/secure/attachment/12782378/BloomFilterandCount-MinSketchinSpark2.0.pdf

Author: Cheng Lian <lian@databricks.com>

Closes #10851 from liancheng/count-min-sketch.
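For readers unfamiliar with the data structure, the following is a minimal, dependency-free sketch of the count-min idea: a `depth` x `width` table of counters, one seeded hash per row, and a point query that takes the minimum over rows. It is an illustration only, not the code added by this PR. The class name `SimpleCountMinSketch`, its method names, and the FNV-1a style hash (used here in place of `Murmur3_x86_32`) are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

/** Hypothetical, simplified count-min sketch; NOT the class added by this PR. */
class SimpleCountMinSketch {
    private final int depth;      // number of hash rows
    private final int width;      // number of counters per row
    private final long[][] table; // depth x width counter matrix
    private final int[] seeds;    // one hash seed per row
    private long totalCount;      // sum of all counts added so far

    SimpleCountMinSketch(int depth, int width, int seed) {
        if (depth <= 0 || width <= 0) {
            throw new IllegalArgumentException("depth and width must be positive");
        }
        this.depth = depth;
        this.width = width;
        this.table = new long[depth][width];
        this.seeds = new int[depth];
        Random rnd = new Random(seed);
        for (int i = 0; i < depth; i++) {
            seeds[i] = rnd.nextInt();
        }
    }

    // Seeded FNV-1a style hash, an assumed stand-in for Murmur3_x86_32.
    private int bucket(byte[] bytes, int row) {
        int h = 0x811C9DC5 ^ seeds[row];
        for (byte b : bytes) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        h ^= h >>> 16; // fold high bits into the low bits before reducing
        return Math.floorMod(h, width);
    }

    /** Adds `count` occurrences of `item` by bumping one counter in every row. */
    void add(String item, long count) {
        if (count < 0) {
            throw new IllegalArgumentException("negative counts are not supported");
        }
        byte[] bytes = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < depth; i++) {
            table[i][bucket(bytes, i)] += count;
        }
        totalCount += count;
    }

    /** Returns the minimum counter over all rows; never below the true count. */
    long estimateCount(String item) {
        byte[] bytes = item.getBytes(StandardCharsets.UTF_8);
        long min = Long.MAX_VALUE;
        for (int i = 0; i < depth; i++) {
            min = Math.min(min, table[i][bucket(bytes, i)]);
        }
        return min;
    }

    long totalCount() {
        return totalCount;
    }
}
```

Because every row's counter for an item receives at least that item's own increments, `estimateCount` can overestimate (when other items collide in every row) but never underestimates. A larger `width` lowers the expected overestimate and a larger `depth` lowers the probability of a bad estimate, at the cost of memory.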
Showing
- common/sketch/pom.xml (42 additions, 0 deletions)
- common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketch.java (132 additions, 0 deletions)
- common/sketch/src/main/java/org/apache/spark/util/sketch/CountMinSketchImpl.java (268 additions, 0 deletions)
- common/sketch/src/main/java/org/apache/spark/util/sketch/Murmur3_x86_32.java (126 additions, 0 deletions)
- common/sketch/src/main/java/org/apache/spark/util/sketch/Platform.java (172 additions, 0 deletions)
- common/sketch/src/test/scala/org/apache/spark/util/sketch/CountMinSketchSuite.scala (112 additions, 0 deletions)
- dev/sparktestsupport/modules.py (12 additions, 0 deletions)
- pom.xml (1 addition, 0 deletions)
- project/SparkBuild.scala (27 additions, 12 deletions)