[SPARK-15509][FOLLOW-UP][ML][SPARKR] R MLlib algorithms should support input columns "features" and "label"

## What changes were proposed in this pull request?

#13584 resolved the conflict between the features and label columns and the `RFormula` defaults when loading libsvm data, but it left some issues to be resolved:

1. It is not necessary to check and rename the label column. `RFormula` is designed to handle the case where the label column already exists (with the restriction that the existing label column must be of numeric/boolean type), so there is no need to change the column name to avoid a conflict. If the label column is not of numeric/boolean type, `RFormula` will throw an exception.
2. We should rename the features column if there is a conflict, but appending a random value is enough since the column is used only internally. We did similar work when implementing `SQLTransformer`.
3. We should set the correct new features column on the estimators. Take `GLM` as an example: the `GLM` estimator should set its features column to the changed one (`rFormula.getFeaturesCol`) rather than the default "features". This makes no difference when training the model, but it causes problems when predicting.

The following is the prediction result of GLM before this PR: (screenshot not preserved)

We should drop the internally used features column; otherwise it appears in the prediction DataFrame, which will confuse users. This behavior is the same as in scenarios where no column name conflict exists.

After this PR: (screenshot not preserved)

## How was this patch tested?

Existing unit tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #14993 from yanboliang/spark-15509.
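The renaming and dropping logic described above can be illustrated with a small, Spark-free sketch. This is not the code from the patch: `FeaturesColSketch`, `resolveFeaturesCol` and `predictionCols` are hypothetical names, columns are modeled as plain strings, and the random suffix only imitates the style of a generated unique ID.

```scala
import java.util.UUID

// Illustrative sketch only; all names here are hypothetical, not Spark APIs.
object FeaturesColSketch {

  // Pick the internal features column name. Rename only when the default
  // name is already taken by a column in the input data.
  def resolveFeaturesCol(existingCols: Seq[String],
                         default: String = "features"): String =
    if (existingCols.contains(default)) {
      // A random suffix is enough because the column is internal only.
      s"${default}_${UUID.randomUUID().toString.takeRight(12)}"
    } else {
      default
    }

  // Columns visible to the user after prediction: the model appends the
  // internal features column and the prediction column, then the wrapper
  // drops the internal features column again.
  def predictionCols(inputCols: Seq[String],
                     featuresCol: String,
                     predictionCol: String = "prediction"): Seq[String] =
    (inputCols :+ featuresCol :+ predictionCol).filterNot(_ == featuresCol)
}
```

With input columns `label` and `features`, the sketch picks a renamed internal column, and the user-supplied `features` column survives in the prediction output while the internal one does not. The estimator would have to be configured with the resolved name rather than the default for prediction to work.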
Changed files:
- mllib/src/main/scala/org/apache/spark/ml/r/AFTSurvivalRegressionWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/GaussianMixtureWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/GeneralizedLinearRegressionWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/IsotonicRegressionWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/KMeansWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/NaiveBayesWrapper.scala (1 addition, 0 deletions)
- mllib/src/main/scala/org/apache/spark/ml/r/RWrapperUtils.scala (4 additions, 30 deletions)
- mllib/src/test/scala/org/apache/spark/ml/r/RWrapperUtilsSuite.scala (4 additions, 12 deletions)