Commit 806c579f authored 9 years ago by Yuhao Yang Committed by Joseph K. Bradley 9 years ago

[SPARK-9062] [ML] Change output type of Tokenizer to Array(String, true)

jira: https://issues.apache.org/jira/browse/SPARK-9062

Currently output type of Tokenizer is Array(String, false), which is not compatible with Word2Vec and Other transformers since their input type is Array(String, true). Seq[String] in udf will be treated as Array(String, true) by default.

I'm not sure what's the recommended way for Tokenizer to handle the null value in the input. Any suggestion will be welcome.

Author: Yuhao Yang <hhbyyh@gmail.com>

Closes #7414 from hhbyyh/tokenizer and squashes the following commits:

c01bd7a [Yuhao Yang] change output type of tokenizer

parent f9a82a88

No related branches found

No related tags found

No related merge requests found

Hide whitespace changes

Inline Side-by-side

Showing with 2 additions and 2 deletions

Please register or to comment