[SPARK-9741] [SQL] Approximate Count Distinct using the new UDAF interface.
This PR implements a HyperLogLog based Approximate Count Distinct function using the new UDAF interface.

The implementation is inspired by the ClearSpring HyperLogLog implementation and should produce the same results.

There is still some documentation and testing left to do.

cc yhuai

Author: Herman van Hovell <hvanhovell@questtec.nl>

Closes #8362 from hvanhovell/SPARK-9741.
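
For readers unfamiliar with the technique, the following is a minimal sketch of a plain HyperLogLog estimator in Scala. It is not the code added by this commit: the class name `SimpleHll`, the precision parameter `p`, and the SplitMix64-style hash mixer are assumptions chosen for illustration, it only accepts `Long` inputs, and it leaves out the refinements that HyperLogLog++ adds on top of the basic algorithm.

```scala
// Hypothetical, simplified HyperLogLog sketch for illustration only.
// It is NOT the implementation added by this commit.
final class SimpleHll(val p: Int = 12) {
  require(p >= 4 && p <= 16, "precision p must be in [4, 16]")

  private val m: Int = 1 << p                 // number of registers
  private val registers = new Array[Byte](m)  // max rank seen per register

  // Standard HyperLogLog bias-correction constant alpha_m.
  private val alpha: Double = m match {
    case 16 => 0.673
    case 32 => 0.697
    case 64 => 0.709
    case _  => 0.7213 / (1.0 + 1.079 / m)
  }

  // SplitMix64-style finalizer, used here as a stand-in 64-bit hash (assumption).
  private def mix64(x: Long): Long = {
    var z = x + 0x9E3779B97F4A7C15L
    z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L
    z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL
    z ^ (z >>> 31)
  }

  def add(value: Long): Unit = {
    val h = mix64(value)
    val idx = (h >>> (64 - p)).toInt          // top p bits select a register
    val rest = h << p                         // remaining bits, left-aligned
    // Rank = 1-based position of the first 1-bit in the remaining bits,
    // capped for the case where all remaining bits are zero.
    val rank = math.min(java.lang.Long.numberOfLeadingZeros(rest) + 1, 64 - p + 1)
    if (rank > registers(idx)) registers(idx) = rank.toByte
  }

  // Merging two sketches keeps the per-register maximum.
  def merge(other: SimpleHll): Unit = {
    require(other.p == p, "sketches must use the same precision")
    var i = 0
    while (i < m) {
      if (other.registers(i) > registers(i)) registers(i) = other.registers(i)
      i += 1
    }
  }

  def estimate: Double = {
    val sum = registers.map(r => math.pow(2.0, -r.toDouble)).sum
    val raw = alpha * m * m / sum             // harmonic-mean based estimate
    val zeros = registers.count(_ == 0)
    // Small-range correction: fall back to linear counting.
    if (raw <= 2.5 * m && zeros > 0) m * math.log(m.toDouble / zeros) else raw
  }
}
```

With `p = 12` the sketch uses 4096 one-byte registers, and the expected relative standard error of plain HyperLogLog is roughly 1.04 / sqrt(4096), or about 1.6%. Because `merge` only takes per-register maxima, partial sketches can be combined in any order, which is what makes this kind of estimator a natural fit for a partial/final aggregation interface.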
Showing 3 changed files:
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/functions.scala (399 additions, 0 deletions)
- sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/utils.scala (6 additions, 0 deletions)
- sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/aggregate/HyperLogLogPlusPlusSuite.scala (149 additions, 0 deletions)