[SPARK-17017][MLLIB][ML] add a chiSquare Selector based on False Positive Rate (FPR) test
## What changes were proposed in this pull request?

Univariate feature selection works by selecting the best features based on univariate statistical tests. Selection by False Positive Rate (FPR) is a popular univariate approach: a feature is kept when the p-value of its test falls below a chosen threshold. This PR adds a chi-square selector (ChiSqSelector) based on the FPR test, as implemented in scikit-learn.

http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection

## How was this patch tested?

Added Scala unit tests.

Author: Peng, Meng <peng.meng@intel.com>

Closes #14597 from mpjlu/fprChiSquare.
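To make the FPR idea in the description concrete, here is a minimal, self-contained sketch in plain Python. It is not the code from this patch and does not use Spark: the function names (`chi2_sf`, `chi_square_p_value`, `select_fpr`), the `alpha` parameter name, and the toy data are all assumptions made for illustration. It runs a chi-square independence test of each categorical feature against the label and keeps the features whose p-value is below `alpha`, which is how scikit-learn documents its FPR-based selector.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """P(X >= x) for a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: Q(1, y) for even df, Q(1/2, y) for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Recurrence Q(a + 1, y) = Q(a, y) + y^a * e^(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_p_value(feature, labels):
    """p-value of Pearson's chi-square independence test (categorical data)."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    row_totals = Counter(feature)
    col_totals = Counter(labels)
    stat = 0.0
    for f, row_total in row_totals.items():
        for l, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(f, l)] - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    # A constant feature or label carries no information to test.
    return chi2_sf(stat, df) if df > 0 else 1.0


def select_fpr(columns, labels, alpha=0.05):
    """Indices of the feature columns whose p-value is below alpha."""
    return [i for i, col in enumerate(columns)
            if chi_square_p_value(col, labels) < alpha]


# Toy data (hypothetical): one feature that mirrors the label, one unrelated.
labels = [0] * 10 + [1] * 10
informative = list(labels)
noise = [0, 1] * 10
selected = select_fpr([informative, noise], labels, alpha=0.05)
print(selected)
```

Unlike a top-k selector, the number of features kept here is not fixed in advance; it depends on how many tests fall below the threshold.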
- mllib/src/main/scala/org/apache/spark/ml/feature/ChiSqSelector.scala: 66 additions, 3 deletions
- mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala: 25 additions, 3 deletions
- mllib/src/main/scala/org/apache/spark/mllib/feature/ChiSqSelector.scala: 75 additions, 28 deletions
- mllib/src/test/scala/org/apache/spark/ml/feature/ChiSqSelectorSuite.scala: 9 additions, 2 deletions
- mllib/src/test/scala/org/apache/spark/mllib/feature/ChiSqSelectorSuite.scala: 18 additions, 0 deletions
- project/MimaExcludes.scala: 3 additions, 0 deletions
- python/pyspark/mllib/feature.py: 66 additions, 5 deletions