  1. Mar 21, 2017
    • [SPARK-19237][SPARKR][CORE] On Windows spark-submit should handle when java is not installed · a8877bdb
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      When SparkR is installed as an R package, there might not be a Java runtime available.
      If Java is missing, SparkR's `sparkR.session()` blocks until the connection times out, hanging the R IDE/shell without any notification or message.
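
      A minimal sketch of the kind of pre-launch check involved, assuming detection via `JAVA_HOME` and the `PATH`; the actual fix lives in SparkR's Windows launch path:

      ```R
      # Hypothetical illustration, not the actual SparkR implementation:
      # fail fast with a message instead of hanging until the connection times out.
      javaHome <- Sys.getenv("JAVA_HOME")
      if (!nzchar(javaHome) && !nzchar(Sys.which("java"))) {
        stop("Java is not installed or cannot be found; SparkR requires a Java runtime.")
      }
      ```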
      
      ## How was this patch tested?
      
      manually
      
      - [x] need to test on Windows
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16596 from felixcheung/rcheckjava.
    • [SPARK-20041][DOC] Update docs for NaN handling in approxQuantile · 63f077fb
      Zheng RuiFeng authored
      ## What changes were proposed in this pull request?
      Update docs for NaN handling in approxQuantile.
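
      For reference, a small SparkR example of the behavior being documented, assuming the documented behavior that NaN values are removed from the column before the calculation:

      ```R
      df <- createDataFrame(data.frame(x = c(1, 2, 3, 4, NaN)))
      # The NaN entry in the column is ignored when computing the quantiles
      approxQuantile(df, "x", c(0.25, 0.5, 0.75), 0.0)
      ```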
      
      ## How was this patch tested?
      existing tests.
      
      Author: Zheng RuiFeng <ruifengz@foxmail.com>
      
      Closes #17369 from zhengruifeng/doc_quantiles_nan.
  2. Mar 20, 2017
    • [SPARK-19949][SQL] unify bad record handling in CSV and JSON · 68d65fae
      Wenchen Fan authored
      ## What changes were proposed in this pull request?
      
      Currently, JSON and CSV have exactly the same logic for handling bad records. This PR abstracts that logic into an upper level to reduce code duplication.
      
      The overall idea: the JSON and CSV parsers throw a `BadRecordException`, and the upper level, `FailureSafeParser`, handles bad records according to the parse mode.
      
      Behavior changes:
      1. In PERMISSIVE mode, if the number of tokens doesn't match the schema, the CSV parser previously treated the record as legal and parsed as many tokens as possible. After this PR, we treat it as an illegal record and put the raw record string in a special column, while still parsing as many tokens as possible. (A sketch follows this list.)
      2. All logging is removed, as it is not very useful in practice.
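
      A hedged SparkR sketch of the PERMISSIVE behavior described in item 1; the file path and column names here are illustrative:

      ```R
      schema <- structType(structField("a", "integer"),
                           structField("b", "integer"),
                           structField("_corrupt_record", "string"))
      # Rows whose token count doesn't match the schema are treated as illegal:
      # the raw record string lands in the corrupt-record column, while the
      # tokens that could be parsed are still kept.
      df <- read.df("/path/to/data.csv", source = "csv", schema = schema,
                    mode = "PERMISSIVE", columnNameOfCorruptRecord = "_corrupt_record")
      ```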
      
      ## How was this patch tested?
      
      existing tests
      
      Author: Wenchen Fan <wenchen@databricks.com>
      Author: hyukjinkwon <gurwls223@gmail.com>
      Author: Wenchen Fan <cloud0fan@gmail.com>
      
      Closes #17315 from cloud-fan/bad-record2.
    • [SPARK-20020][SPARKR][FOLLOWUP] DataFrame checkpoint API fix version tag · f14f81e9
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      doc-only change
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17356 from felixcheung/rdfcheckpoint2.
    • [SPARK-20020][SPARKR] DataFrame checkpoint API · c4059772
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add `checkpoint` and `setCheckpointDir` APIs to R.
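
      A brief usage sketch of the two new functions (the path here is illustrative):

      ```R
      setCheckpointDir("/tmp/spark-checkpoints")
      df <- createDataFrame(mtcars)
      df2 <- checkpoint(df)  # truncates the lineage by persisting to the checkpoint dir
      ```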
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17351 from felixcheung/rdfcheckpoint.
    • [SPARK-19849][SQL] Support ArrayType in to_json to produce JSON array · 0cdcf911
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      This PR proposes to support an array of struct type in `to_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      
      val df = Seq(Tuple1(Tuple1(1) :: Nil)).toDF("a")
      df.select(to_json($"a").as("json")).show()
      ```
      
      ```
      +----------+
      |      json|
      +----------+
      |[{"_1":1}]|
      +----------+
      ```
      
      Currently, it throws an exception as below (a newline manually inserted for readability):
      
      ```
      org.apache.spark.sql.AnalysisException: cannot resolve 'structtojson(`array`)' due to data type
      mismatch: structtojson requires that the expression is a struct expression.;;
      ```
      
      This allows the roundtrip with `from_json` as below:
      
      ```scala
      import org.apache.spark.sql.functions._
      import org.apache.spark.sql.types._
      
      val schema = ArrayType(StructType(StructField("a", IntegerType) :: Nil))
      val df = Seq("""[{"a":1}, {"a":2}]""").toDF("json").select(from_json($"json", schema).as("array"))
      df.show()
      
      // Read back.
      df.select(to_json($"array").as("json")).show()
      ```
      
      ```
      +----------+
      |     array|
      +----------+
      |[[1], [2]]|
      +----------+
      
      +-----------------+
      |             json|
      +-----------------+
      |[{"a":1},{"a":2}]|
      +-----------------+
      ```
      
      Also, this PR proposes to rename `StructToJson` to `StructsToJson` and `JsonToStruct` to `JsonToStructs`.
      
      ## How was this patch tested?
      
      Unit tests in `JsonFunctionsSuite` and `JsonExpressionsSuite` for Scala, doctest for Python and test in `test_sparkSQL.R` for R.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17192 from HyukjinKwon/SPARK-19849.
  3. Mar 19, 2017
    • [SPARK-18817][SPARKR][SQL] change derby log output to temp dir · 422aa67d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Passes R's `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM, and sets the system property for the Derby home directory so that `derby.log` is moved there.
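
      A hypothetical sketch of the mechanism using SparkR's JVM bridge; the actual change wires this into session startup rather than user code:

      ```R
      # Assumption: derby.system.home controls where derby.log is written.
      sparkR.callJStatic("java.lang.System", "setProperty",
                         "derby.system.home", tempdir())
      ```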
      
      ## How was this patch tested?
      
      Manually, unit tests
      
      With this change, the files are relocated under `/tmp`:
      ```
      # ls /tmp/RtmpG2M0cB/
      derby.log
      ```
      And they are removed automatically when the R session is ended.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16330 from felixcheung/rderby.
    • [MINOR][R] Reorder `Collate` fields in DESCRIPTION file · 60262bc9
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      It seems the CRAN check script corrects `R/pkg/DESCRIPTION` so that it follows the order of the `Collate` fields.

      This PR fixes the ordering so that running the script does not produce a diff in this file.
      
      ## How was this patch tested?
      
      Manually via `./R/check-cran.sh`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17349 from HyukjinKwon/minor-cran.
  4. Mar 14, 2017
    • [SPARK-19828][R] Support array type in from_json in R · d1f6c64c
      hyukjinkwon authored
      ## What changes were proposed in this pull request?
      
      Since array types cannot be defined directly in R, this PR proposes supporting them via the string type notation used in `structField`, as below:
      
      ```R
      jsonArr <- "[{\"name\":\"Bob\"}, {\"name\":\"Alice\"}]"
      df <- as.DataFrame(list(list("people" = jsonArr)))
      collect(select(df, alias(from_json(df$people, "array<struct<name:string>>"), "arrcol")))
      ```
      
      prints
      
      ```R
            arrcol
      1 Bob, Alice
      ```
      
      ## How was this patch tested?
      
      Unit tests in `test_sparkSQL.R`.
      
      Author: hyukjinkwon <gurwls223@gmail.com>
      
      Closes #17178 from HyukjinKwon/SPARK-19828.
    • [SPARK-19391][SPARKR][ML] Tweedie GLM API for SparkR · f6314eab
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      Port Tweedie GLM (#16344) to SparkR.
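
      A hedged example of the ported API; `var.power` and `link.power` follow the Tweedie parameterization described in #16344:

      ```R
      df <- createDataFrame(iris)
      # Tweedie family with variance power 1.2 and link power 0 (log link)
      model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "tweedie",
                         var.power = 1.2, link.power = 0.0)
      summary(model)
      ```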
      
      felixcheung yanboliang
      
      ## How was this patch tested?
      new test in SparkR
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16729 from actuaryzhang/sparkRTweedie.
  5. Mar 12, 2017
    • [SPARK-19282][ML][SPARKR] RandomForest Wrapper and GBT Wrapper return param "maxDepth" to R models · 9f8ce482
      Xin Ren authored
      ## What changes were proposed in this pull request?
      
      RandomForest R Wrapper and GBT R Wrapper return param `maxDepth` to R models.
      
      The following four R wrappers are changed:
      * `RandomForestClassificationWrapper`
      * `RandomForestRegressionWrapper`
      * `GBTClassificationWrapper`
      * `GBTRegressionWrapper`
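
      A hedged sketch of reading the surfaced parameter back, assuming it is exposed through `summary()`:

      ```R
      df <- createDataFrame(iris)
      model <- spark.randomForest(df, Species ~ ., "classification", maxDepth = 5)
      # Assumption: the maxDepth used on the JVM side is now surfaced on the R model
      summary(model)$maxDepth
      ```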
      
      ## How was this patch tested?
      
      Tested manually on my local machine.
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #17207 from keypointt/SPARK-19282.
  6. Mar 08, 2017
    • [SPARK-19601][SQL] Fix CollapseRepartition rule to preserve shuffle-enabled Repartition · 9a6ac722
      Xiao Li authored
      ### What changes were proposed in this pull request?
      
      As observed by felixcheung in https://github.com/apache/spark/pull/16739, when users call the shuffle-enabled `repartition` API, they expect to get exactly the number of partitions they specified, even if they later call the shuffle-disabled `coalesce`.

      Currently, the `CollapseRepartition` rule does not consider whether shuffle is enabled. Thus, we get the following unexpected results:
      
      ```scala
          val df = spark.range(0, 10000, 1, 5)
          val df2 = df.repartition(10)
          assert(df2.coalesce(13).rdd.getNumPartitions == 5)
          assert(df2.coalesce(7).rdd.getNumPartitions == 5)
          assert(df2.coalesce(3).rdd.getNumPartitions == 3)
      ```
      
      This PR fixes the issue by preserving the shuffle-enabled `Repartition`.
      
      ### How was this patch tested?
      Added a test case
      
      Author: Xiao Li <gatorsmile@gmail.com>
      
      Closes #16933 from gatorsmile/CollapseRepartition.
  7. Mar 05, 2017
    • [SPARK-19795][SPARKR] add column functions to_json, from_json · 80d5338b
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add column functions: to_json, from_json, and tests covering error cases.
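
      A small usage sketch of the new column functions; the data here is illustrative:

      ```R
      df <- sql("SELECT named_struct('name', 'Bob') AS s")
      collect(select(df, to_json(df$s)))               # struct column -> JSON string

      df2 <- as.DataFrame(list(list(js = "{\"name\":\"Bob\"}")))
      schema <- structType(structField("name", "string"))
      collect(select(df2, from_json(df2$js, schema)))  # JSON string -> struct column
      ```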
      
      ## How was this patch tested?
      
      unit tests, manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17134 from felixcheung/rtojson.
  8. Mar 01, 2017
    • [DOC][MINOR][SPARKR] Update SparkR doc for names, columns and colnames · 2ff1467d
      actuaryzhang authored
      Update R doc:
      1. `columns`, `names`, and `colnames` return a vector of strings, not a **list** as stated in the current doc.
      2. `colnames<-` does allow subset assignment, so the length of `value` can be less than the number of columns, e.g., `colnames(df)[1] <- "a"`.
      
      felixcheung
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17115 from actuaryzhang/sparkRMinorDoc.
    • [SPARK-19460][SPARKR] Update dataset used in R documentation, examples to reduce warning noise and confusions · 89cd3845
      wm624@hotmail.com authored
      
      ## What changes were proposed in this pull request?
      
      Replace the `iris` dataset with `Titanic` or other datasets in the examples and documentation.
      
      ## How was this patch tested?
      
      Manual and existing test
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #17032 from wangmiao1981/example.
  9. Feb 23, 2017
    • [SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" takes vector index · 7bf09433
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `[[` method is supposed to take a single index and return a column. This differs from base R, which takes a vector index. We should check for this and issue a warning or error when a vector index is supplied (which is likely, given the behavior in base R).

      Currently I'm issuing a warning message and just taking the first element of the vector index. We could change this to an error if that's better.
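
      An illustration of the new behavior:

      ```R
      df <- createDataFrame(iris)
      # base R would accept a vector index; SparkR now warns and uses only the first
      col <- df[[c(1, 2)]]   # warning issued; equivalent to df[[1]]
      ```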
      
      ## How was this patch tested?
      new tests
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17017 from actuaryzhang/sparkRSubsetter.
  10. Feb 15, 2017
    • [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column · 671bc08e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add `coalesce` on DataFrame, for reducing the number of partitions without a shuffle, and `coalesce` on Column.
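
      A brief sketch of both additions:

      ```R
      df <- createDataFrame(mtcars)
      df2 <- coalesce(df, 1L)                 # narrow dependency, no shuffle
      select(df, coalesce(df$mpg, df$qsec))   # column coalesce: first non-null value
      ```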
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16739 from felixcheung/rcoalesce.
    • [SPARK-19456][SPARKR] Add LinearSVC R API · 3973403d
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      The linear SVM classifier was recently added to ML, and a Python API has been added. This JIRA adds the R-side API.
      
      Marked as WIP, as I am designing unit tests.
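
      A hedged usage sketch, assuming the wrapper is exposed as `spark.svmLinear`:

      ```R
      df <- createDataFrame(iris[iris$Species != "setosa", ])  # binary subset
      model <- spark.svmLinear(df, Species ~ ., regParam = 0.01)
      summary(model)
      head(predict(model, df))
      ```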
      
      ## How was this patch tested?
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16800 from wangmiao1981/svc.
  11. Feb 14, 2017
    • [SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check · a3626ca3
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      - This is caused by changes in SPARK-18444 and SPARK-18643, whereby we no longer install Spark when `master = ""` (the default); it is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` runs (`master` cannot default to "local" since it could be overridden by the spark-submit command line or Spark config).
      - As a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, since it launches via a non-interactive script.
      - The fix is to add a check to the beginning of each test and vignette (see the sketch after this list); the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but being more explicit is better.
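
      A hypothetical sketch of such a guard; the exact condition used in the actual tests may differ:

      ```R
      # Skip Spark-dependent tests when no Spark installation is available
      # (e.g., during a non-interactive CRAN check).
      canRunSpark <- nzchar(Sys.getenv("SPARK_HOME")) || interactive()
      if (!canRunSpark) {
        testthat::skip("No SPARK_HOME and session is non-interactive; skipping test")
      }
      ```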
      
      ## How was this patch tested?
      
      Tested by reverting the version to 2.1, since the check needs to download the release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.
      
      manually as:
      ```
      # modify DESCRIPTION to revert version to 2.1.0
      SPARK_HOME=/usr/spark R CMD build pkg
      # run cran check without SPARK_HOME
      R CMD check --as-cran SparkR_2.1.0.tar.gz
      ```
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16720 from felixcheung/rcranchecktest.
  12. Feb 12, 2017
    • [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column · bc0a0e63
      titicaca authored
      ## What changes were proposed in this pull request?
      
      Fix a bug in the `collect` method for timestamp columns. The bug can be reproduced with the following code and output:
      
      ```
      library(SparkR)
      sparkR.session(master = "local")
      df <- data.frame(col1 = c(0, 1, 2),
                       col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
      
      sdf1 <- createDataFrame(df)
      print(dtypes(sdf1))
      df1 <- collect(sdf1)
      print(lapply(df1, class))
      
      sdf2 <- filter(sdf1, "col1 > 0")
      print(dtypes(sdf2))
      df2 <- collect(sdf2)
      print(lapply(df2, class))
      ```
      
      As the printed output shows, the column type of `col2` in `df2` is unexpectedly converted to numeric when an NA is at the top of the column.
      
      This is caused by `do.call(c, list)`: for example, the result of `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))` has class numeric instead of POSIXct.
      
      Therefore, we need to cast the data type of the vector explicitly.
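
      A base-R illustration of the root cause and of the explicit cast the fix relies on:

      ```R
      x <- do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))
      class(x)  # "numeric": c() drops the POSIXct class when NA comes first

      # Explicitly restoring the class keeps the timestamps intact:
      y <- as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
      class(y)  # "POSIXct" "POSIXt"
      ```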
      
      ## How was this patch tested?
      
      The patch can be tested manually with the same code above.
      
      Author: titicaca <fangzhou.yang@hotmail.com>
      
      Closes #16689 from titicaca/sparkr-dev.
  13. Feb 08, 2017
    • [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6 · c618ccdb
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-19464, **SparkPullRequestBuilder** fails because it still tries to use hadoop2.3.
      
      **BEFORE**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [error] Could not find hadoop2.3 in the list. Valid options  are ['hadoop2.6', 'hadoop2.7']
      Attempting to post to Github...
       > Post successful.
      ```
      
      **AFTER**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
      Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
      Note, this will be overridden by -java-home if it is set.
      ```
      
      ## How was this patch tested?
      
      Pass the existing test.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16858 from dongjoon-hyun/hotfix_run-tests.
  14. Feb 07, 2017
    • [SPARK-16609] Add to_date/to_timestamp with format functions · 7a7ce272
      anabranch authored
      ## What changes were proposed in this pull request?
      
      This pull request adds two new user facing functions:
      - `to_date` which accepts an expression and a format and returns a date.
      - `to_timestamp` which accepts an expression and a format and returns a timestamp.
      
      For example, given a date in the format `2016-21-05` (yyyy-dd-MM):
      
      ### Date Function
      *Previously*
      ```
      to_date(unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp"))
      ```
      *Current*
      ```
      to_date(lit("2016-21-05"), "yyyy-dd-MM")
      ```
      
      ### Timestamp Function
      *Previously*
      ```
      unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp")
      ```
      *Current*
      ```
      to_timestamp(lit("2016-21-05"), "yyyy-dd-MM")
      ```
      ### Tasks
      
      - [X] Add `to_date` to Scala Functions
      - [x] Add `to_date` to Python Functions
      - [x] Add `to_date` to SQL Functions
      - [X] Add `to_timestamp` to Scala Functions
      - [x] Add `to_timestamp` to Python Functions
      - [x] Add `to_timestamp` to SQL Functions
      - [x] Add function to R
      
      ## How was this patch tested?
      
      - [x] Add Functions to `DateFunctionsSuite`
      - Test new `ParseToTimestamp` Expression (*not necessary*)
      - Test new `ParseToDate` Expression (*not necessary*)
      - [x] Add test for R
      - [x] Add test for Python in test.py
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: anabranch <wac.chambers@gmail.com>
      Author: Bill Chambers <bill@databricks.com>
      Author: anabranch <bill@databricks.com>
      
      Closes #16138 from anabranch/SPARK-16609.
  15. Feb 05, 2017
    • [SPARK-19452][SPARKR] Fix bug in the name assignment method · b94f4b6f
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `names` method fails to check the validity of the assignment values. This can be fixed by calling `colnames` within `names`.
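
      A hypothetical sketch of the delegation described above; the signature is illustrative rather than the exact SparkR source:

      ```R
      setMethod("names<-", signature(x = "SparkDataFrame"),
                function(x, value) {
                  # `colnames<-` performs the validity checks, so reuse it here
                  colnames(x) <- value
                  x
                })
      ```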
      
      ## How was this patch tested?
      new tests.
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16794 from actuaryzhang/sparkRNames.
  16. Feb 03, 2017
    • [SPARK-19386][SPARKR][FOLLOWUP] fix error in vignettes · 050c20cc
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      
      The current version has an error in the vignettes:
      ```
      model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
      summary(kmeansModel)
      ```
      
      `kmeansModel` does not exist; the object created above is `model`, so the summary call should reference it.
      
      felixcheung wangmiao1981
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16799 from actuaryzhang/sparkRVignettes.
    • [SPARK-19386][SPARKR][DOC] Bisecting k-means in SparkR documentation · 48aafeda
      krishnakalyan3 authored
      ## What changes were proposed in this pull request?
      Update the programming guide, example, and vignette with bisecting k-means.
      
      Author: krishnakalyan3 <krishnakalyan3@gmail.com>
      
      Closes #16767 from krishnakalyan3/bisecting-kmeans.
  17. Jan 31, 2017
    • [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal k · 9ac05225
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      When k-means runs with `initMode = "random"` and certain random seeds, it is possible that the actual number of clusters doesn't equal the configured `k`.
      
      In this case, `summary(model)` returns an error because the number of columns of the coefficient matrix doesn't equal `k`.
      
      Example:

      ```R
      col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      cols <- as.data.frame(cbind(col1, col2, col3))
      df <- createDataFrame(cols)

      model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10, initMode = "random", seed = 22222, tol = 1E-5)

      summary(model2)
      ```

      ```
      Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3")) :
        length of 'dimnames' [2] not equal to array extent
      In addition: Warning message:
      In matrix(coefficients, ncol = k) :
        data length [9] is not a sub-multiple or multiple of the number of rows [2]
      ```
      
      Fix: get the actual cluster size in the summary and use it to build the coefficient matrix.

      ## How was this patch tested?
      
      Add unit tests.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16666 from wangmiao1981/kmeans.
    • [SPARK-19395][SPARKR] Convert coefficients in summary to matrix · ce112cec
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `coefficients` component in the model summary should be a matrix, but the underlying structure is in fact a list. This affects several models, except `AFTSurvivalRegressionModel`, which has the correct implementation. The fix is to `unlist` the coefficients returned from `callJMethod` before converting to a matrix. An example illustrates the issue:
      
      ```
      data(iris)
      df <- createDataFrame(iris)
      model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
      s <- summary(model)
      
      > str(s$coefficients)
      List of 8
       $ : num 6.53
       $ : num -0.223
       $ : num 0.479
       $ : num 0.155
       $ : num 13.6
       $ : num -1.44
       $ : num 0
       $ : num 0.152
       - attr(*, "dim")= int [1:2] 2 4
       - attr(*, "dimnames")=List of 2
        ..$ : chr [1:2] "(Intercept)" "Sepal_Width"
        ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
      > s$coefficients[, 2]
      $`(Intercept)`
      [1] 0.4788963
      
      $Sepal_Width
      [1] 0.1550809
      ```
      
      This shows that the underlying structure of `coefficients` is still a `list`.
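
      A base-R illustration of why the structure prints like a matrix yet still behaves as a list, and why `unlist` is needed first:

      ```R
      l <- as.list(1:4)
      dim(l) <- c(2, 2)
      is.matrix(l)            # TRUE, but the elements are still list entries
      l[, 2]                  # subsetting returns a list, as in the output above
      m <- matrix(unlist(l), nrow = 2)  # unlist first to obtain a numeric matrix
      ```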
      
      felixcheung wangmiao1981
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16730 from actuaryzhang/sparkRCoef.
  18. Jan 30, 2017
    • [SPARKR][DOCS] update R API doc for subset/extract · be7425e2
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      With extract `[[` or replace `[[<-`, the parameter `i` is a column index; this needs to be corrected in the doc. Also a few minor updates: examples, links.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16721 from felixcheung/rsubsetdoc.
  19. Jan 27, 2017
    • [SPARK-19324][SPARKR] Spark JVM stdout output is getting dropped in SparkR · a7ab6f9a
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      This mostly affects running jobs from the driver in client mode, when results are expected through stdout (which should be somewhat rare, but possible).
      
      Before:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      NULL
      ```
      
      After:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      count#11L
      NULL
      ```
      
      Now, `column.explain()` doesn't seem very useful (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples with `println` calls on the Scala/JVM side that are getting dropped.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16670 from felixcheung/rjvmstdout.
    • [SPARK-19333][SPARKR] Add Apache License headers to R files · 385d7384
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      add header
      
      ## How was this patch tested?
      
      Manual run to check that the vignettes HTML is created properly.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16709 from felixcheung/rfilelicense.
  20. Jan 26, 2017
    • [SPARK-18788][SPARKR] Add API for getNumPartitions · 90817a6c
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      With doc noting that this converts the DataFrame into an RDD.
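
      A minimal usage example, assuming the `numPartitions` argument to `createDataFrame`:

      ```R
      df <- createDataFrame(mtcars, numPartitions = 4)
      # Reading the partition count converts the DataFrame to an RDD internally
      getNumPartitions(df)
      ```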
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16668 from felixcheung/rgetnumpartitions.
    • [SPARK-18821][SPARKR] Bisecting k-means wrapper in SparkR · c0ba2843
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Add an R wrapper for bisecting k-means.

      As JIRA is down, I will update the title to link to the corresponding JIRA later.
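
      A usage example consistent with the vignette call shown earlier in this log:

      ```R
      df <- createDataFrame(iris)
      model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
      summary(model)
      ```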
      
      ## How was this patch tested?
      
      Add new unit tests.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16566 from wangmiao1981/bk.