    [SPARK-18686][SPARKR][ML] Several cleanup and improvements for spark.logit. · 90b59d1b
    Yanbo Liang authored
    ## What changes were proposed in this pull request?
    Several cleanups and improvements for ```spark.logit```:
    * ```summary``` should return the coefficients matrix, and should output the label of each class when the model is a multinomial logistic regression model.
    * ```summary``` should not return ```areaUnderROC, roc, pr, ...```, since most of them are DataFrames, which are less important for R users. Moreover, these metrics currently ignore instance weights (treating them all as 1.0), which will change in a later Spark version. To avoid introducing breaking changes at that point, we do not expose them for now.
    * SparkR test improvement: compare the training result with native R glmnet.
    * Remove the ```aggregationDepth``` argument from ```spark.logit```, since it is an expert Param (related to Spark architecture and job execution) that R users would rarely use.
    
    ## How was this patch tested?
    Unit tests.
    
    The ```summary``` output after this change:
    multinomial logistic regression:
    ```
    > df <- suppressWarnings(createDataFrame(iris))
    > model <- spark.logit(df, Species ~ ., regParam = 0.5)
    > summary(model)
    $coefficients
                 versicolor  virginica   setosa
    (Intercept)  1.514031    -2.609108   1.095077
    Sepal_Length 0.02511006  0.2649821   -0.2900921
    Sepal_Width  -0.5291215  -0.02016446 0.549286
    Petal_Length 0.03647411  0.1544119   -0.190886
    Petal_Width  0.000236092 0.4195804   -0.4198165
    ```
    binomial logistic regression:
    ```
    > df <- suppressWarnings(createDataFrame(iris))
    > training <- df[df$Species %in% c("versicolor", "virginica"), ]
    > model <- spark.logit(training, Species ~ ., regParam = 0.5)
    > summary(model)
    $coefficients
                 Estimate
    (Intercept)  -6.053815
    Sepal_Length 0.2449379
    Sepal_Width  0.1648321
    Petal_Length 0.4730718
    Petal_Width  1.031947
    ```
    
    Author: Yanbo Liang <ybliang8@gmail.com>
    
    Closes #16117 from yanboliang/spark-18686.