[SPARK-10574][ML][MLLIB] HashingTF supports MurmurHash3
## What changes were proposed in this pull request?

As discussed in [SPARK-10574](https://issues.apache.org/jira/browse/SPARK-10574), ```HashingTF``` should support MurmurHash3 and use it as the default hash algorithm. We should also expose set/get APIs for ```hashAlgorithm``` so that users can choose the hash method.

Note: The problem that ```mllib.feature.HashingTF``` behaves differently between Scala/Java and Python will be resolved in follow-up work.

## How was this patch tested?

Unit tests.

cc jkbradley MLnick

Author: Yanbo Liang <ybliang8@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #12498 from yanboliang/spark-10574.
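
For readers unfamiliar with the technique, here is a minimal, self-contained sketch of hashing-based term frequencies using the standard MurmurHash3 x86_32 function. It is not the code from this patch: the names `murmur3_32` and `term_frequencies`, the default seed, and the UTF-8 encoding of terms are all assumptions made for illustration, and the resulting indices are not guaranteed to match what Spark's ```HashingTF``` produces.

```python
from collections import Counter


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Standard MurmurHash3 x86_32; returns an unsigned 32-bit integer."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    n_blocks = len(data) // 4

    # Body: mix each complete 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask  # rotate left by 15
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask  # rotate left by 13
        h = (h * 5 + 0xE6546B64) & mask

    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h


def term_frequencies(terms, num_features, seed=0):
    """Map each term to a bucket in [0, num_features) and count occurrences.

    Hypothetical helper: the seed and the UTF-8 encoding are illustrative
    choices, not taken from the patch.
    """
    counts = Counter()
    for term in terms:
        index = murmur3_32(term.encode("utf-8"), seed) % num_features
        counts[index] += 1
    return dict(counts)


if __name__ == "__main__":
    print(term_frequencies("a b c a b a".split(), num_features=16))
```

Distinct terms can collide in the same bucket, so the choice of `num_features` trades memory for accuracy; a well-distributed hash such as MurmurHash3 keeps those collisions closer to uniform.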
Showing
- mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala (30 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/mllib/feature/HashingTF.scala (71 additions, 2 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala (28 additions, 16 deletions)
- python/pyspark/ml/feature.py (31 additions, 8 deletions)
- python/pyspark/ml/tests.py (2 additions, 1 deletion)