[SPARK-9615] [SPARK-9616] [SQL] [MLLIB] Bugs related to FrequentItems when merging and with Tungsten

In short:
1- FrequentItems should not use the InternalRow representation, because the keys in the map get messed up. For example, when the elements are strings, every key in the Map corresponds to the very last element observed in the partition.
2- Merging two partitions had a bug:

**Existing behavior with size 3**
Partition A -> Map(1 -> 3, 2 -> 3, 3 -> 4)
Partition B -> Map(4 -> 25)
Result -> Map()

**Correct Behavior:**
Partition A -> Map(1 -> 3, 2 -> 3, 3 -> 4)
Partition B -> Map(4 -> 25)
Result -> Map(3 -> 1, 4 -> 22)

cc mengxr rxin JoshRosen

Author: Burak Yavuz <brkyvz@gmail.com>

Closes #7945 from brkyvz/freq-fix and squashes the following commits:

07fa001 [Burak Yavuz] address 2
1dc61a8 [Burak Yavuz] address 1
506753e [Burak Yavuz] fixed and added reg test
47bfd50 [Burak Yavuz] pushing
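The merge result in the "Correct Behavior" example can be reproduced with a small bounded-counter sketch. This is not the code from the patch: `FreqItemsSketch`, `add`, and `merge` are hypothetical names, and the eviction rule (drop every key at the current minimum count, then subtract that minimum from the survivors) is an assumption chosen because it matches the numbers in the commit message.

```scala
object FreqItemsSketch {
  // Hypothetical helper, not the Spark API: fold one (key, count) pair into
  // a counter map that may hold at most `size` keys.
  def add(counts: Map[Int, Long], size: Int, key: Int, count: Long): Map[Int, Long] =
    if (counts.contains(key)) {
      // Known key: accumulate its count.
      counts.updated(key, counts(key) + count)
    } else if (counts.size < size) {
      // Room left: just insert the new key.
      counts.updated(key, count)
    } else {
      val minCount = counts.values.min
      if (count >= minCount) {
        // Assumed eviction rule: insert the new key, drop every key sitting
        // at the minimum, and subtract the minimum from the survivors.
        counts.updated(key, count).collect {
          case (k, v) if v > minCount => k -> (v - minCount)
        }
      } else {
        // The new key is rarer than everything kept: it is not inserted,
        // and the existing counts are decremented by its count.
        counts.map { case (k, v) => k -> (v - count) }
      }
    }

  // Merge partition `b` into partition `a` one entry at a time.
  def merge(a: Map[Int, Long], b: Map[Int, Long], size: Int): Map[Int, Long] =
    b.foldLeft(a) { case (acc, (k, v)) => add(acc, size, k, v) }
}
```

With size 3, `FreqItemsSketch.merge(Map(1 -> 3L, 2 -> 3L, 3 -> 4L), Map(4 -> 25L), 3)` evaluates to `Map(3 -> 1, 4 -> 22)`: key 4 enters with count 25, keys 1 and 2 sit at the minimum of 3 and are evicted, and 3 is subtracted from the remaining counts.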
Showing
- sql/core/src/main/scala/org/apache/spark/sql/execution/stat/FrequentItems.scala (15 additions, 11 deletions)
- sql/core/src/test/scala/org/apache/spark/sql/DataFrameStatSuite.scala (21 additions, 3 deletions)