[SPARK-8703] [ML] Add CountVectorizer as a ml transformer to convert document to words count vector
jira: https://issues.apache.org/jira/browse/SPARK-8703

Converts a text document to a sparse vector of token counts.

I can further add an estimator to extract vocabulary from corpus if that's appropriate.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7084 from hhbyyh/countVectorization and squashes the following commits:

5f3f655 [Yuhao Yang] text change
24728e4 [Yuhao Yang] style improvement
576728a [Yuhao Yang] rename to model and some fix
1deca28 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
99b0c14 [Yuhao Yang] undo extension from HashingTF
12c2dc8 [Yuhao Yang] Merge remote-tracking branch 'upstream/master' into countVectorization
7ee1c31 [Yuhao Yang] extends HashingTF
809fb59 [Yuhao Yang] minor fix for ut
7c61fb3 [Yuhao Yang] add countVectorizer
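The commit message describes converting a tokenized document into a sparse vector of token counts. As a rough illustration of that idea only, the following plain-Scala sketch counts tokens against a fixed vocabulary. It is not the code added by this commit: `SparseCounts`, `CountVectorSketch`, and `countVector` are hypothetical names, the sketch does not use Spark's vector types, and it assumes that tokens missing from the vocabulary are simply ignored.

```scala
// Illustrative sketch only; not the implementation from this commit.
// SparseCounts is a hypothetical stand-in for a sparse vector:
// total size, sorted indices of non-zero entries, and their counts.
final case class SparseCounts(size: Int, indices: Array[Int], values: Array[Double])

object CountVectorSketch {
  // Hypothetical helper: counts how often each vocabulary term occurs in `tokens`.
  def countVector(tokens: Seq[String], vocabulary: Array[String]): SparseCounts = {
    // Map each vocabulary term to its position in the output vector.
    val index: Map[String, Int] = vocabulary.zipWithIndex.toMap

    // Assumption: tokens that are not in the vocabulary are dropped.
    val counts: Map[Int, Double] = tokens
      .flatMap(token => index.get(token))
      .groupBy(identity)
      .map { case (i, occurrences) => i -> occurrences.size.toDouble }

    // Keep only non-zero entries, ordered by index.
    val sorted = counts.toSeq.sortBy(_._1)
    SparseCounts(vocabulary.length, sorted.map(_._1).toArray, sorted.map(_._2).toArray)
  }

  def main(args: Array[String]): Unit = {
    val vocabulary = Array("a", "b", "c")
    val v = countVector(Seq("a", "b", "b", "x"), vocabulary)
    println(s"size=${v.size} indices=${v.indices.mkString(",")} values=${v.values.mkString(",")}")
  }
}
```

Storing only the non-zero entries keeps the representation small when the vocabulary is much larger than any single document, which is the usual motivation for a sparse layout.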
Showing 2 changed files:
- mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizerModel.scala (82 additions, 0 deletions)
- mllib/src/test/scala/org/apache/spark/ml/feature/CountVectorizorSuite.scala (73 additions, 0 deletions)