[SPARK-14931][ML][PYTHON] Mismatched default values between pipelines in Spark and PySpark - update
## What changes were proposed in this pull request?

This PR is an update for [https://github.com/apache/spark/pull/12738] which:
* Adds a generic unit test for JavaParams wrappers in pyspark.ml for checking default Param values vs. the defaults in the Scala side
* Various fixes for bugs found
  * This includes changing classes taking weightCol to treat unset and empty String Param values the same way.

Defaults changed:
* Scala
  * LogisticRegression: weightCol defaults to not set (instead of empty string)
  * StringIndexer: labels default to not set (instead of empty array)
  * GeneralizedLinearRegression:
    * maxIter always defaults to 25 (simpler than defaulting to 25 for a particular solver)
    * weightCol defaults to not set (instead of empty string)
  * LinearRegression: weightCol defaults to not set (instead of empty string)
* Python
  * MultilayerPerceptron: layers default to not set (instead of [1,1])
  * ChiSqSelector: numTopFeatures defaults to 50 (instead of not set)

## How was this patch tested?

Generic unit test. Manually tested that unit test by changing defaults and verifying that broke the test.

Author: Joseph K. Bradley <joseph@databricks.com>
Author: yinxusen <yinxusen@gmail.com>

Closes #12816 from jkbradley/yinxusen-SPARK-14931.
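
The idea behind the generic default-value check, and behind treating an unset weightCol like an empty one, can be sketched in plain Python. This is an illustrative sketch only, not the test added to python/pyspark/ml/tests.py: the helper names `find_default_mismatches` and `uses_weights`, and the dictionaries standing in for the Python and Scala default maps, are assumptions made for the example and do not call any Spark API.

```python
from typing import Any, Dict, List, Optional, Tuple

# Sentinel marking a Param that has no default on one side (hypothetical helper).
_UNSET = object()


def find_default_mismatches(
    python_defaults: Dict[str, Any],
    scala_defaults: Dict[str, Any],
) -> List[Tuple[str, Any, Any]]:
    """Return (param name, Python default, Scala default) for every Param
    whose default differs between the two sides, including the case where
    only one side defines a default at all."""
    mismatches = []
    for name in sorted(set(python_defaults) | set(scala_defaults)):
        py_value = python_defaults.get(name, _UNSET)
        scala_value = scala_defaults.get(name, _UNSET)
        if py_value != scala_value:
            mismatches.append((
                name,
                "not set" if py_value is _UNSET else py_value,
                "not set" if scala_value is _UNSET else scala_value,
            ))
    return mismatches


def uses_weights(weight_col: Optional[str]) -> bool:
    """Treat an unset (None) weight column and an empty string the same way:
    in both cases no instance weights are applied."""
    return weight_col is not None and weight_col != ""


# Stand-in default maps mirroring two of the mismatches listed above.
mlp_python_before = {"layers": [1, 1]}   # old Python default
mlp_scala = {}                           # Scala side: layers not set
print(find_default_mismatches(mlp_python_before, mlp_scala))

chisq_python_before = {}                 # old Python side: not set
chisq_scala = {"numTopFeatures": 50}     # Scala default
print(find_default_mismatches(chisq_python_before, chisq_scala))

print(uses_weights(None), uses_weights(""), uses_weights("w"))
```

A check of this shape fails as soon as either side changes a default without the other, which matches the manual verification described in the testing section: changing a default should break the test.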
Showing 11 changed files:
- mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala (3 additions, 4 deletions)
- mllib/src/main/scala/org/apache/spark/ml/feature/StringIndexer.scala (2 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/ml/regression/GeneralizedLinearRegression.scala (18 additions, 13 deletions)
- mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala (9 additions, 6 deletions)
- mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala (1 addition, 1 deletion)
- mllib/src/test/scala/org/apache/spark/ml/regression/GeneralizedLinearRegressionSuite.scala (1 addition, 1 deletion)
- python/pyspark/ml/classification.py (5 additions, 8 deletions)
- python/pyspark/ml/feature.py (1 addition, 0 deletions)
- python/pyspark/ml/regression.py (6 additions, 3 deletions)
- python/pyspark/ml/tests.py (48 additions, 0 deletions)
- python/pyspark/ml/wrapper.py (2 additions, 1 deletion)