[SPARK-9028] [ML] Add CountVectorizer as an estimator to generate CountVectorizerModel
jira: https://issues.apache.org/jira/browse/SPARK-9028

Add an estimator for CountVectorizerModel. The estimator extracts a vocabulary from a collection of documents according to term frequency.

I changed the meaning of minCount so that it acts as a filter across the whole corpus. This aligns with Word2Vec and with the similar parameter in scikit-learn.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #7388 from hhbyyh/cvEstimator.
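To make the vocabulary extraction described above concrete, here is a minimal plain-Scala sketch. It is not the code added by this commit and does not use Spark. The object `VocabSketch`, its two methods, and the `vocabSize` parameter are hypothetical names chosen for illustration. It also assumes that `minCount` means the total number of occurrences of a term across the corpus, which the message does not spell out.

```scala
// Illustration only: a hypothetical, Spark-free sketch of corpus-wide
// vocabulary extraction. Not the implementation in CountVectorizer.scala.
object VocabSketch {

  // Count every occurrence of each term across the whole corpus,
  // drop terms seen fewer than minCount times (assumed meaning of minCount),
  // and keep the vocabSize most frequent survivors.
  // Ties are broken alphabetically so the result is deterministic.
  def buildVocabulary(
      docs: Seq[Seq[String]],
      vocabSize: Int,
      minCount: Int): Array[String] = {
    val corpusCounts: Map[String, Int] =
      docs.flatten.groupBy(identity).map { case (term, occ) => term -> occ.size }

    corpusCounts.toSeq
      .filter { case (_, count) => count >= minCount }
      .sortBy { case (term, count) => (-count, term) }
      .take(vocabSize)
      .map(_._1)
      .toArray
  }

  // Map one tokenized document to a dense vector of per-term counts,
  // ignoring tokens that are not in the vocabulary.
  def countVector(doc: Seq[String], vocabulary: Array[String]): Array[Int] = {
    val index = vocabulary.zipWithIndex.toMap
    val counts = new Array[Int](vocabulary.length)
    doc.foreach(token => index.get(token).foreach(i => counts(i) += 1))
    counts
  }
}
```

In this sketch the filter is applied once to corpus-wide totals while the vocabulary is built, so a rare term is excluded from every document's vector rather than being judged document by document.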
Showing 3 changed files:
- mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala (235 additions, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizerModel.scala (0 additions, 82 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizerSuite.scala (167 additions, 0 deletions)