[SPARK-18412][SPARKR][ML] Fix exception for some SparkR ML algorithms training on libsvm data
## What changes were proposed in this pull request?

* Fix the following exception, which was thrown when ```spark.randomForest``` (classification), ```spark.gbt``` (classification), ```spark.naiveBayes``` and ```spark.glm``` (binomial family) were fitted on libsvm data:
  ```
  java.lang.IllegalArgumentException: requirement failed: If label column already exists, forceIndexLabel can not be set with true.
  ```
  See [SPARK-18412](https://issues.apache.org/jira/browse/SPARK-18412) for more detail about how to reproduce this bug.
* Move ```getFeaturesAndLabels``` into RWrapperUtils, since many ML algorithm wrappers use this function.
* Drop some unwanted columns when making predictions.

## How was this patch tested?

Add unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #15851 from yanboliang/spark-18412.
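To illustrate the failure mode described above, here is a minimal, Spark-free sketch in plain Scala. It assumes only what the exception message states: the requirement fails when the label column already exists and `forceIndexLabel` is true. Data loaded through Spark's libsvm source already has `label` and `features` columns, which is how the collision arises. `FormulaConfig`, `validate`, `resolve` and `dropHelperColumns` are hypothetical names made up for this illustration, not Spark APIs, and renaming the output label column is just one plausible way to avoid the collision; the patch itself may handle it differently.

```scala
// Hypothetical stand-in for the relevant formula settings; NOT Spark's API.
final case class FormulaConfig(labelCol: String, forceIndexLabel: Boolean)

object LabelCollisionSketch {
  // Mirrors the requirement quoted in the exception message.
  def validate(columns: Seq[String], cfg: FormulaConfig): Unit =
    require(
      !(columns.contains(cfg.labelCol) && cfg.forceIndexLabel),
      "If label column already exists, forceIndexLabel can not be set with true.")

  // Assumed workaround: if the label name is taken, pick an unused one.
  def resolve(columns: Seq[String], cfg: FormulaConfig): FormulaConfig =
    if (cfg.forceIndexLabel && columns.contains(cfg.labelCol)) {
      val fresh = Iterator.from(0)
        .map(i => s"${cfg.labelCol}_$i")
        .find(name => !columns.contains(name))
        .get
      cfg.copy(labelCol = fresh)
    } else cfg

  // Assumed clean-up step: remove helper columns from prediction output.
  def dropHelperColumns(columns: Seq[String], helpers: Set[String]): Seq[String] =
    columns.filterNot(helpers.contains)

  def main(args: Array[String]): Unit = {
    val libsvmColumns = Seq("label", "features") // columns of libsvm-loaded data
    val cfg = FormulaConfig("label", forceIndexLabel = true)

    // The original configuration trips the requirement.
    val failed = scala.util.Try(validate(libsvmColumns, cfg)).isFailure
    println(s"validation fails before renaming: $failed")

    // After renaming, the same check passes.
    val fixed = resolve(libsvmColumns, cfg)
    validate(libsvmColumns, fixed)
    println(s"label column after renaming: ${fixed.labelCol}")
  }
}
```

In this sketch the renamed label column (`label_0`) is exactly the kind of helper column that would then need to be dropped from prediction output, which is what `dropHelperColumns` models.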
Showing 6 changed files:
- R/pkg/inst/tests/testthat/test_mllib.R (15 additions, 3 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/GBTClassificationWrapper.scala (6 additions, 12 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala (4 additions, 1 deletion)
- mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala (5 additions, 9 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/RWrapperUtils.scala (32 additions, 4 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/RandomForestClassificationWrapper.scala (6 additions, 12 deletions)