Merge pull request #73 from falaki/ApproximateDistinctCount
Approximate distinct count

Added countApproxDistinct() to RDD and countApproxDistinctByKey() to PairRDDFunctions to approximately count the number of distinct elements and the number of distinct values per key, respectively. Both functions use HyperLogLog from stream-lib for counting, and both take a parameter that controls the trade-off between accuracy and memory consumption. Also added Scaladoc and test suites for both methods.
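To illustrate the accuracy/memory trade-off mentioned above, here is a minimal, self-contained HyperLogLog sketch in Java. It is not stream-lib's implementation and not the code added in this PR; the class name `HyperLogLogSketch`, its methods, the SplitMix64 hash mixer, and modeling the accuracy parameter as a target relative standard deviation are all assumptions for illustration. It relies on the textbook HyperLogLog standard error of roughly 1.04 / sqrt(m) for m registers, so halving the error roughly quadruples the memory.

```java
/**
 * Minimal HyperLogLog sketch for illustration only (hypothetical class,
 * not stream-lib's HyperLogLog and not Spark's code).
 */
final class HyperLogLogSketch {
    private final int p;            // number of hash bits used to pick a register
    private final int m;            // number of registers, m = 2^p
    private final byte[] registers; // each register stores the max observed rank

    /** Assumption: the accuracy knob is a target relative standard deviation. */
    HyperLogLogSketch(double relativeSD) {
        if (relativeSD <= 0 || relativeSD >= 1) {
            throw new IllegalArgumentException("relativeSD must be in (0, 1)");
        }
        // Standard error is about 1.04 / sqrt(m), so we need m >= (1.04 / relativeSD)^2.
        double needed = Math.pow(1.04 / relativeSD, 2);
        int bits = (int) Math.ceil(Math.log(needed) / Math.log(2));
        // Clamp to [7, 16]: the alpha constant below assumes m >= 128, and memory stays bounded.
        this.p = Math.max(7, Math.min(bits, 16));
        this.m = 1 << p;
        this.registers = new byte[m];
    }

    // SplitMix64 finalizer: spreads even sequential hashCodes across all 64 bits.
    private static long mix(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    /** Adds one element; offering a duplicate leaves the registers unchanged. */
    void offer(Object element) {
        long h = mix(element.hashCode());
        int index = (int) (h >>> (64 - p));  // top p bits choose the register
        long rest = h << p;                  // remaining 64 - p bits, left-aligned
        // Rank = position of the first 1-bit in the remaining bits (capped if they are all zero).
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1);
        if (rank > registers[index]) {
            registers[index] = (byte) rank;
        }
    }

    /** Combines another sketch of the same size, e.g. one built on a different partition. */
    void merge(HyperLogLogSketch other) {
        if (other.m != m) {
            throw new IllegalArgumentException("sketches must have the same number of registers");
        }
        for (int i = 0; i < m; i++) {
            registers[i] = (byte) Math.max(registers[i], other.registers[i]);
        }
    }

    /** Estimates the number of distinct elements offered so far. */
    long cardinality() {
        double alpha = 0.7213 / (1 + 1.079 / m); // bias correction, valid for m >= 128
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) {
                zeros++;
            }
        }
        double estimate = alpha * m * m / sum;
        // Small-range correction: use linear counting while many registers are still empty.
        if (estimate <= 2.5 * m && zeros > 0) {
            estimate = m * Math.log((double) m / zeros);
        }
        return Math.round(estimate);
    }

    int registerCount() {
        return m;
    }
}
```

With this sizing rule, `new HyperLogLogSketch(0.05)` allocates 512 one-byte registers, while a target of 0.01 needs 16384. Because `merge` takes the element-wise maximum of the registers, sketches built independently on each partition can be combined without revisiting the data, and a per-key count in the spirit of countApproxDistinctByKey() could keep one such sketch per key.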
Changed files:
- core/pom.xml: 1351 additions, 231 deletions
- core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala: 36 additions, 0 deletions
- core/src/main/scala/org/apache/spark/api/java/JavaRDDLike.scala: 11 additions, 0 deletions
- core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala: 42 additions, 0 deletions
- core/src/main/scala/org/apache/spark/rdd/RDD.scala: 15 additions, 1 deletion
- core/src/main/scala/org/apache/spark/util/SerializableHyperLogLog.scala: 50 additions, 0 deletions
- core/src/test/scala/org/apache/spark/JavaAPISuite.java: 32 additions, 0 deletions
- core/src/test/scala/org/apache/spark/rdd/PairRDDFunctionsSuite.scala: 34 additions, 0 deletions
- core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala: 13 additions, 0 deletions
- core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala: 4 additions, 0 deletions
- pom.xml: 5 additions, 0 deletions
- project/SparkBuild.scala: 2 additions, 1 deletion