[SPARK-7219] [MLLIB] Output feature attributes in HashingTF
This PR updates `HashingTF` to output ML attributes that specify the number of features in the output column. We need to expand `UnaryTransformer` to support output metadata. A `def outputMetadata: Metadata` is not sufficient, because the metadata may also depend on the input data. Though this is not true for `HashingTF`, I think it is reasonable to update `UnaryTransformer` in a separate PR.

`checkParams` is added to verify common requirements for params. I will send a separate PR to use it in other test suites.

jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #6308 from mengxr/SPARK-7219 and squashes the following commits:

- 9bd2922 [Xiangrui Meng] address comments
- e82a68a [Xiangrui Meng] remove sqlContext from test suite
- 995535b [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-7219
- 2194703 [Xiangrui Meng] add test for attributes
- 178ae23 [Xiangrui Meng] update HashingTF with tests
- 91a6106 [Xiangrui Meng] WIP
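As a rough, Spark-free illustration of why the feature count can be attached to the output column up front: with feature hashing, every term is mapped into a vector of fixed size, so the size is known from the `numFeatures` setting alone (Spark's `HashingTF` has a param of that name) without inspecting any input rows. This is a sketch under stated assumptions, not the Spark implementation: `TermHasher`, `FeatureMeta`, `hashTerms`, and the choice of `##` with a non-negative modulo as the hash are all hypothetical names and choices made for the example, and `FeatureMeta` merely stands in for ML attribute metadata.

```scala
// Hypothetical, Spark-free sketch: hash terms into a fixed-size term-frequency
// vector and record the vector size as the output column's "metadata".
object HashingSketch {

  // Stand-in for ML attribute metadata: only the number of features is kept.
  final case class FeatureMeta(columnName: String, numFeatures: Int)

  final case class TermHasher(numFeatures: Int, outputCol: String) {
    require(numFeatures > 0, "numFeatures must be positive")

    // Map a term to an index in [0, numFeatures) using a non-negative modulo
    // of its hash code (an assumption made for this sketch).
    def indexOf(term: Any): Int = {
      val raw = term.## % numFeatures
      if (raw < 0) raw + numFeatures else raw
    }

    // Sparse term frequencies as index -> count; colliding terms share a slot.
    def hashTerms(terms: Seq[Any]): Map[Int, Double] =
      terms.foldLeft(Map.empty[Int, Double]) { (acc, t) =>
        val i = indexOf(t)
        acc.updated(i, acc.getOrElse(i, 0.0) + 1.0)
      }

    // The metadata depends only on numFeatures, never on the input data.
    def outputMeta: FeatureMeta = FeatureMeta(outputCol, numFeatures)
  }

  def main(args: Array[String]): Unit = {
    val tf = TermHasher(numFeatures = 16, outputCol = "features")
    println(tf.hashTerms(Seq("a", "b", "a")))
    println(tf.outputMeta)
  }
}
```

Because `outputMeta` takes no input, it corresponds to the case the commit message calls out as true for `HashingTF`; a transformer whose output size depends on the data would instead need metadata computed from the input.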
- mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala: 26 additions, 8 deletions
- mllib/src/test/scala/org/apache/spark/ml/feature/HashingTFSuite.scala: 55 additions, 0 deletions
- mllib/src/test/scala/org/apache/spark/ml/param/ParamsSuite.scala: 20 additions, 0 deletions
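The `checkParams` helper added in `ParamsSuite.scala` is described only as verifying "common requirements for params"; the exact checks are not listed here. The dependency-free sketch below is a hypothetical analogue that assumes three such requirements: params are ordered by name, each param belongs to the object's `uid`, and looking a param up by name returns that same param. `Param`, `HasParams`, and `ToyHasher` are illustrative stand-ins, not Spark's classes.

```scala
// Hypothetical, Spark-free sketch of a params sanity check. The requirements
// checked below are assumptions, not a copy of ParamsSuite.checkParams.
object ParamsCheckSketch {

  final case class Param(parentUid: String, name: String)

  trait HasParams {
    def uid: String
    def params: Seq[Param]
    def getParam(name: String): Param =
      params.find(_.name == name).getOrElse(
        throw new NoSuchElementException(s"no param named $name"))
  }

  // Assumed requirements: names sorted, correct parent uid, consistent lookup.
  def checkParams(obj: HasParams): Unit = {
    val names = obj.params.map(_.name)
    require(names == names.sorted, "params must be ordered by name")
    obj.params.foreach { p =>
      require(p.parentUid == obj.uid, s"param ${p.name} has the wrong parent")
      require(obj.getParam(p.name) == p, s"lookup of ${p.name} is inconsistent")
    }
  }

  // A toy object whose params satisfy all three assumed requirements.
  final class ToyHasher(val uid: String) extends HasParams {
    val inputCol = Param(uid, "inputCol")
    val numFeatures = Param(uid, "numFeatures")
    val outputCol = Param(uid, "outputCol")
    def params: Seq[Param] = Seq(inputCol, numFeatures, outputCol)
  }

  def main(args: Array[String]): Unit = {
    checkParams(new ToyHasher("toyHasher_1"))
    println("params ok")
  }
}
```

Centralizing such checks in one helper means each transformer's test suite can call it with a single line instead of repeating the same assertions.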