Commit 354f4582 authored by Yuhao Yang, committed by Joseph K. Bradley

[SPARK-9028] [ML] Add CountVectorizer as an estimator to generate CountVectorizerModel

jira: https://issues.apache.org/jira/browse/SPARK-9028

Add an estimator for CountVectorizerModel. The estimator extracts a vocabulary from document collections, selecting terms by their frequency across the corpus.

I changed the meaning of minCount to be a filter across the corpus (the minDF parameter). This aligns with Word2Vec and the similar parameter in scikit-learn.

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: Joseph K. Bradley <joseph@databricks.com>

Closes #7388 from hhbyyh/cvEstimator.
parent 1968276a
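For context, a minimal usage sketch of the new estimator API (assuming a Spark 1.5-style sqlContext; the column names and toy corpus are illustrative, not part of this commit):

import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}

// Toy corpus: each row is a pre-tokenized document.
val df = sqlContext.createDataFrame(Seq(
  (0, Seq("a", "b", "c")),
  (1, Seq("a", "b", "b", "c", "a"))
)).toDF("id", "words")

// Fit over the corpus to learn the vocabulary, keeping only terms
// that appear in at least 2 documents (minDF as an absolute count).
val cvModel: CountVectorizerModel = new CountVectorizer()
  .setInputCol("words")
  .setOutputCol("features")
  .setVocabSize(3)
  .setMinDF(2)
  .fit(df)

cvModel.transform(df).show()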
New file: CountVectorizer.scala

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.ml.feature

import org.apache.spark.annotation.Experimental
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.ml.param._
import org.apache.spark.ml.param.shared.{HasInputCol, HasOutputCol}
import org.apache.spark.ml.util.{Identifiable, SchemaUtils}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.mllib.linalg.{VectorUDT, Vectors}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
import org.apache.spark.sql.DataFrame
import org.apache.spark.util.collection.OpenHashMap
/**
 * Params for [[CountVectorizer]] and [[CountVectorizerModel]].
 */
private[feature] trait CountVectorizerParams extends Params with HasInputCol with HasOutputCol {

  /**
   * Max size of the vocabulary.
   * CountVectorizer will build a vocabulary that only considers the top
   * vocabSize terms ordered by term frequency across the corpus.
   *
   * Default: 2^18^
   * @group param
   */
  val vocabSize: IntParam =
    new IntParam(this, "vocabSize", "max size of the vocabulary", ParamValidators.gt(0))

  /** @group getParam */
  def getVocabSize: Int = $(vocabSize)

  /**
   * Specifies the minimum number of different documents a term must appear in to be included
   * in the vocabulary.
   * If this is an integer >= 1, this specifies the number of documents the term must appear in;
   * if this is a double in [0,1), then this specifies the fraction of documents.
   *
   * Default: 1
   * @group param
   */
  val minDF: DoubleParam = new DoubleParam(this, "minDF", "Specifies the minimum number of" +
    " different documents a term must appear in to be included in the vocabulary." +
    " If this is an integer >= 1, this specifies the number of documents the term must" +
    " appear in; if this is a double in [0,1), then this specifies the fraction of documents.",
    ParamValidators.gtEq(0.0))

  /** @group getParam */
  def getMinDF: Double = $(minDF)

  /** Validates and transforms the input schema. */
  protected def validateAndTransformSchema(schema: StructType): StructType = {
    SchemaUtils.checkColumnType(schema, $(inputCol), new ArrayType(StringType, true))
    SchemaUtils.appendColumn(schema, $(outputCol), new VectorUDT)
  }

  /**
   * Filter to ignore rare words in a document. For each document, terms with
   * frequency/count less than the given threshold are ignored.
   * If this is an integer >= 1, then this specifies a count (of times the term must appear
   * in the document);
   * if this is a double in [0,1), then this specifies a fraction (out of the document's token
   * count).
   *
   * Note that the parameter is only used in transform of [[CountVectorizerModel]] and does not
   * affect fitting.
   *
   * Default: 1
   * @group param
   */
  val minTF: DoubleParam = new DoubleParam(this, "minTF", "Filter to ignore rare words in" +
    " a document. For each document, terms with frequency/count less than the given threshold are" +
    " ignored. If this is an integer >= 1, then this specifies a count (of times the term must" +
    " appear in the document); if this is a double in [0,1), then this specifies a fraction (out" +
    " of the document's token count). Note that the parameter is only used in transform of" +
    " CountVectorizerModel and does not affect fitting.", ParamValidators.gtEq(0.0))

  setDefault(minTF -> 1)

  /** @group getParam */
  def getMinTF: Double = $(minTF)
}
/**
 * :: Experimental ::
 * Extracts a vocabulary from document collections and generates a [[CountVectorizerModel]].
 */
@Experimental
class CountVectorizer(override val uid: String)
  extends Estimator[CountVectorizerModel] with CountVectorizerParams {

  def this() = this(Identifiable.randomUID("cntVec"))

  /** @group setParam */
  def setInputCol(value: String): this.type = set(inputCol, value)

  /** @group setParam */
  def setOutputCol(value: String): this.type = set(outputCol, value)

  /** @group setParam */
  def setVocabSize(value: Int): this.type = set(vocabSize, value)

  /** @group setParam */
  def setMinDF(value: Double): this.type = set(minDF, value)

  /** @group setParam */
  def setMinTF(value: Double): this.type = set(minTF, value)

  setDefault(vocabSize -> (1 << 18), minDF -> 1)

  override def fit(dataset: DataFrame): CountVectorizerModel = {
    transformSchema(dataset.schema, logging = true)
    val vocSize = $(vocabSize)
    val input = dataset.select($(inputCol)).map(_.getAs[Seq[String]](0))
    // Resolve minDF to an absolute document count (a fractional minDF is scaled by corpus size).
    val minDf = if ($(minDF) >= 1.0) {
      $(minDF)
    } else {
      $(minDF) * input.cache().count()
    }
    // Aggregate (term frequency, document frequency) per term, then drop terms below minDF.
    val wordCounts: RDD[(String, Long)] = input.flatMap { case (tokens) =>
      val wc = new OpenHashMap[String, Long]
      tokens.foreach { w =>
        wc.changeValue(w, 1L, _ + 1L)
      }
      wc.map { case (word, count) => (word, (count, 1)) }
    }.reduceByKey { case ((wc1, df1), (wc2, df2)) =>
      (wc1 + wc2, df1 + df2)
    }.filter { case (word, (wc, df)) =>
      df >= minDf
    }.map { case (word, (count, dfCount)) =>
      (word, count)
    }.cache()
    val fullVocabSize = wordCounts.count()
    val vocab: Array[String] = {
      val tmpSortedWC: Array[(String, Long)] = if (fullVocabSize <= vocSize) {
        // Use all terms
        wordCounts.collect().sortBy(-_._2)
      } else {
        // Sort terms to select vocab
        wordCounts.sortBy(_._2, ascending = false).take(vocSize)
      }
      tmpSortedWC.map(_._1)
    }
    require(vocab.length > 0, "The vocabulary size should be > 0. Lower minDF as necessary.")
    copyValues(new CountVectorizerModel(uid, vocab).setParent(this))
  }

  override def transformSchema(schema: StructType): StructType = {
    validateAndTransformSchema(schema)
  }

  override def copy(extra: ParamMap): CountVectorizer = defaultCopy(extra)
}
/**
 * :: Experimental ::
 * Converts a text document to a sparse vector of token counts.
 * @param vocabulary An Array over terms. Only the terms in the vocabulary will be counted.
 */
@Experimental
class CountVectorizerModel(override val uid: String, val vocabulary: Array[String])
  extends Model[CountVectorizerModel] with CountVectorizerParams {

  def this(vocabulary: Array[String]) = {
    this(Identifiable.randomUID("cntVecModel"), vocabulary)
    set(vocabSize, vocabulary.length)
  }

  /** @group setParam */
  def setInputCol(value: String): this.type = set(inputCol, value)

  /** @group setParam */
  def setOutputCol(value: String): this.type = set(outputCol, value)

  /** @group setParam */
  def setMinTF(value: Double): this.type = set(minTF, value)

  /** Dictionary created from [[vocabulary]] and its indices, broadcast once for [[transform()]] */
  private var broadcastDict: Option[Broadcast[Map[String, Int]]] = None

  override def transform(dataset: DataFrame): DataFrame = {
    if (broadcastDict.isEmpty) {
      val dict = vocabulary.zipWithIndex.toMap
      broadcastDict = Some(dataset.sqlContext.sparkContext.broadcast(dict))
    }
    val dictBr = broadcastDict.get
    val minTf = $(minTF)
    val vectorizer = udf { (document: Seq[String]) =>
      val termCounts = new OpenHashMap[Int, Double]
      var tokenCount = 0L
      document.foreach { term =>
        dictBr.value.get(term) match {
          case Some(index) => termCounts.changeValue(index, 1.0, _ + 1.0)
          case None => // ignore terms not in the vocabulary
        }
        tokenCount += 1
      }
      // A fractional minTF is interpreted relative to the document's token count.
      val effectiveMinTF = if (minTf >= 1.0) {
        minTf
      } else {
        tokenCount * minTf
      }
      Vectors.sparse(dictBr.value.size, termCounts.filter(_._2 >= effectiveMinTF).toSeq)
    }
    dataset.withColumn($(outputCol), vectorizer(col($(inputCol))))
  }

  override def transformSchema(schema: StructType): StructType = {
    validateAndTransformSchema(schema)
  }

  override def copy(extra: ParamMap): CountVectorizerModel = {
    val copied = new CountVectorizerModel(uid, vocabulary).setParent(parent)
    copyValues(copied, extra)
  }
}
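A CountVectorizerModel can also be constructed directly from an a-priori vocabulary, with minTF applied per document at transform time. A small sketch, assuming the same illustrative sqlContext setup as above:

import org.apache.spark.ml.feature.CountVectorizerModel

val docs = sqlContext.createDataFrame(Seq(
  (0, Seq("a", "a", "b", "c"))
)).toDF("id", "words")

// With minTF = 0.5, a term must cover at least half of the document's
// 4 tokens (i.e. appear at least 2 times) to be counted.
val cvm = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features")
  .setMinTF(0.5)

cvm.transform(docs).show()  // only "a" survives: (3,[0],[2.0])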
Deleted file: the previous UnaryTransformer-based CountVectorizerModel

/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.spark.ml.feature

import scala.collection.mutable

import org.apache.spark.annotation.Experimental
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.{ParamMap, ParamValidators, IntParam}
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.mllib.linalg.{Vectors, VectorUDT, Vector}
import org.apache.spark.sql.types.{StringType, ArrayType, DataType}

/**
 * :: Experimental ::
 * Converts a text document to a sparse vector of token counts.
 * @param vocabulary An Array over terms. Only the terms in the vocabulary will be counted.
 */
@Experimental
class CountVectorizerModel (override val uid: String, val vocabulary: Array[String])
  extends UnaryTransformer[Seq[String], Vector, CountVectorizerModel] {

  def this(vocabulary: Array[String]) =
    this(Identifiable.randomUID("cntVec"), vocabulary)

  /**
   * Corpus-specific filter to ignore scarce words in a document. For each document, terms with
   * frequency (count) less than the given threshold are ignored.
   * Default: 1
   * @group param
   */
  val minTermFreq: IntParam = new IntParam(this, "minTermFreq",
    "minimum frequency (count) filter used to neglect scarce words (>= 1). For each document, " +
    "terms with frequency less than the given threshold are ignored.", ParamValidators.gtEq(1))

  /** @group setParam */
  def setMinTermFreq(value: Int): this.type = set(minTermFreq, value)

  /** @group getParam */
  def getMinTermFreq: Int = $(minTermFreq)

  setDefault(minTermFreq -> 1)

  override protected def createTransformFunc: Seq[String] => Vector = {
    val dict = vocabulary.zipWithIndex.toMap
    document =>
      val termCounts = mutable.HashMap.empty[Int, Double]
      document.foreach { term =>
        dict.get(term) match {
          case Some(index) => termCounts.put(index, termCounts.getOrElse(index, 0.0) + 1.0)
          case None => // ignore terms not in the vocabulary
        }
      }
      Vectors.sparse(dict.size, termCounts.filter(_._2 >= $(minTermFreq)).toSeq)
  }

  override protected def validateInputType(inputType: DataType): Unit = {
    require(inputType.sameType(ArrayType(StringType)),
      s"Input type must be ArrayType(StringType) but got $inputType.")
  }

  override protected def outputDataType: DataType = new VectorUDT()

  override def copy(extra: ParamMap): CountVectorizerModel = {
    val copied = new CountVectorizerModel(uid, vocabulary)
    copyValues(copied, extra)
  }
}
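For reference, migrating from the removed transformer to the new model is mechanical; a before/after sketch under the same illustrative names as above (the old setter accepted only integer counts, while the new minTF also accepts a fraction in [0,1)):

// Before this commit (UnaryTransformer-based model):
val oldModel = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features")
  .setMinTermFreq(3)

// After this commit (Model sharing CountVectorizerParams):
val newModel = new CountVectorizerModel(Array("a", "b", "c"))
  .setInputCol("words")
  .setOutputCol("features")
  .setMinTF(3)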
Updated file: CountVectorizerSuite.scala (reconstructed post-commit excerpt; the original diff interleaved old and new lines)

import org.apache.spark.ml.param.ParamsSuite
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.util.MLlibTestSparkContext
import org.apache.spark.mllib.util.TestingUtils._
import org.apache.spark.sql.Row

class CountVectorizerSuite extends SparkFunSuite with MLlibTestSparkContext {

  // ... (lines unchanged by this commit omitted in the original diff)
    ParamsSuite.checkParams(new CountVectorizerModel(Array("empty")))
  }

  private def split(s: String): Seq[String] = s.split("\\s+")

  test("CountVectorizerModel common cases") {
    val df = sqlContext.createDataFrame(Seq(
      (0, split("a b c d"),
        Vectors.sparse(4, Seq((0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0)))),
      (1, split("a b b c d a"),
        Vectors.sparse(4, Seq((0, 2.0), (1, 2.0), (2, 1.0), (3, 1.0)))),
      (2, split("a"), Vectors.sparse(4, Seq((0, 1.0)))),
      (3, split(""), Vectors.sparse(4, Seq())), // empty string
      (4, split("a notInDict d"),
        Vectors.sparse(4, Seq((0, 1.0), (3, 1.0)))) // with words not in vocabulary
    )).toDF("id", "words", "expected")
    val cv = new CountVectorizerModel(Array("a", "b", "c", "d"))
      .setInputCol("words")
      .setOutputCol("features")
    cv.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }
  }

  test("CountVectorizer common cases") {
    val df = sqlContext.createDataFrame(Seq(
      (0, split("a b c d e"),
        Vectors.sparse(5, Seq((0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0), (4, 1.0)))),
      (1, split("a a a a a a"), Vectors.sparse(5, Seq((0, 6.0)))),
      (2, split("c"), Vectors.sparse(5, Seq((2, 1.0)))),
      (3, split("b b b b b"), Vectors.sparse(5, Seq((1, 5.0)))))
    ).toDF("id", "words", "expected")
    val cv = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .fit(df)
    assert(cv.vocabulary === Array("a", "b", "c", "d", "e"))
    cv.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }
  }

  test("CountVectorizer vocabSize and minDF") {
    val df = sqlContext.createDataFrame(Seq(
      (0, split("a b c d"), Vectors.sparse(3, Seq((0, 1.0), (1, 1.0)))),
      (1, split("a b c"), Vectors.sparse(3, Seq((0, 1.0), (1, 1.0)))),
      (2, split("a b"), Vectors.sparse(3, Seq((0, 1.0), (1, 1.0)))),
      (3, split("a"), Vectors.sparse(3, Seq((0, 1.0)))))
    ).toDF("id", "words", "expected")
    val cvModel = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .setVocabSize(3) // limit vocab size to 3
      .fit(df)
    assert(cvModel.vocabulary === Array("a", "b", "c"))

    // minDF: ignore terms present in fewer than 3 documents
    val cvModel2 = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .setMinDF(3)
      .fit(df)
    assert(cvModel2.vocabulary === Array("a", "b"))
    cvModel2.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }

    // minDF: ignore terms with document frequency below 0.75
    val cvModel3 = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")
      .setMinDF(3.0 / df.count())
      .fit(df)
    assert(cvModel3.vocabulary === Array("a", "b"))
    cvModel3.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }
  }

  test("CountVectorizer throws exception when vocab is empty") {
    intercept[IllegalArgumentException] {
      val df = sqlContext.createDataFrame(Seq(
        (0, split("a a b b c c")),
        (1, split("aa bb cc")))
      ).toDF("id", "words")
      val cvModel = new CountVectorizer()
        .setInputCol("words")
        .setOutputCol("features")
        .setVocabSize(3) // limit vocab size to 3
        .setMinDF(3)
        .fit(df)
    }
  }

  test("CountVectorizerModel with minTF count") {
    val df = sqlContext.createDataFrame(Seq(
      (0, split("a a a b b c c c d "), Vectors.sparse(4, Seq((0, 3.0), (2, 3.0)))),
      (1, split("c c c c c c"), Vectors.sparse(4, Seq((2, 6.0)))),
      (2, split("a"), Vectors.sparse(4, Seq())),
      (3, split("e e e e e"), Vectors.sparse(4, Seq())))
    ).toDF("id", "words", "expected")

    // minTF: count
    val cv = new CountVectorizerModel(Array("a", "b", "c", "d"))
      .setInputCol("words")
      .setOutputCol("features")
      .setMinTF(3)
    cv.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }
  }

  test("CountVectorizerModel with minTF freq") {
    val df = sqlContext.createDataFrame(Seq(
      (0, split("a a a b b c c c d "), Vectors.sparse(4, Seq((0, 3.0), (2, 3.0)))),
      (1, split("c c c c c c"), Vectors.sparse(4, Seq((2, 6.0)))),
      (2, split("a"), Vectors.sparse(4, Seq((0, 1.0)))),
      (3, split("e e e e e"), Vectors.sparse(4, Seq())))
    ).toDF("id", "words", "expected")

    // minTF: fraction of the document's token count
    val cv = new CountVectorizerModel(Array("a", "b", "c", "d"))
      .setInputCol("words")
      .setOutputCol("features")
      .setMinTF(0.3)
    cv.transform(df).select("features", "expected").collect().foreach {
      case Row(features: Vector, expected: Vector) =>
        assert(features ~== expected absTol 1e-14)
    }
  }
}