  1. Feb 17, 2017
  2. Feb 15, 2017
    • Felix Cheung's avatar
      [SPARK-19399][SPARKR] Add R coalesce API for DataFrame and Column · 671bc08e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add `coalesce` on DataFrame, for reducing the number of partitions without a shuffle, and `coalesce` on Column.
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16739 from felixcheung/rcoalesce.
      671bc08e
    • wm624@hotmail.com's avatar
      [SPARK-19456][SPARKR] Add LinearSVC R API · 3973403d
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      The linear SVM classifier was recently added to ML, and a Python API has already been added. This JIRA adds the R-side API.
      
      Marked as WIP, as I am designing unit tests.
      
      ## How was this patch tested?
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16800 from wangmiao1981/svc.
      3973403d
  3. Feb 14, 2017
    • Felix Cheung's avatar
      [SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check · a3626ca3
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      - This is caused by the changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (the default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` runs. (`master` cannot default to "local" since it could be overridden by the spark-submit command line or Spark config.)
      - As a result, running SparkR as a package in an IDE works fine, but the CRAN check does not, because it launches SparkR via a non-interactive script.
      - The fix is to add a check to the beginning of each test and vignette. Changing `sparkR.session()` to `sparkR.session(master = "local")` in tests would also work, but I think being more explicit is better.
      
      ## How was this patch tested?
      
      Tested by reverting the version to 2.1, since the check needs to download the release jar with a matching version. Because there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.
      
      Run manually as:
      ```
      # modify DESCRIPTION to revert version to 2.1.0
      SPARK_HOME=/usr/spark R CMD build pkg
      # run cran check without SPARK_HOME
      R CMD check --as-cran SparkR_2.1.0.tar.gz
      ```
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16720 from felixcheung/rcranchecktest.
      a3626ca3
  4. Feb 12, 2017
    • titicaca's avatar
      [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column · bc0a0e63
      titicaca authored
      ## What changes were proposed in this pull request?
      
      Fix a bug in the `collect` method when collecting a timestamp column. The bug can be reproduced with the following code:
      
      ```
      library(SparkR)
      sparkR.session(master = "local")
      df <- data.frame(col1 = c(0, 1, 2),
                       col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
      
      sdf1 <- createDataFrame(df)
      print(dtypes(sdf1))
      df1 <- collect(sdf1)
      print(lapply(df1, class))
      
      sdf2 <- filter(sdf1, "col1 > 0")
      print(dtypes(sdf2))
      df2 <- collect(sdf2)
      print(lapply(df2, class))
      ```
      
      As the printed output shows, `col2` in `df2` is unexpectedly converted to numeric when an NA is at the top of the column.
      
      This is caused by `do.call(c, list)`: when converting a list such as `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct.
      
      Therefore, we need to cast the data type of the vector explicitly.
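
      The behavior described above can be reproduced in base R without Spark. The following is a minimal sketch, not the code from the patch; the variable names and the use of `as.POSIXct` with an explicit `origin` are assumptions chosen for illustration.

      ```r
      # c() dispatches on its first argument. NA is a plain logical,
      # so the POSIXct class of the second element is dropped.
      ts <- as.POSIXct("2017-01-01 12:00:01", tz = "UTC")
      vec <- do.call(c, list(NA, ts))
      print(class(vec))

      # Casting explicitly restores the timestamp class (illustrative fix,
      # assuming the values are seconds since the Unix epoch).
      fixed <- as.POSIXct(vec, origin = "1970-01-01", tz = "UTC")
      print(class(fixed))
      ```

      The first `print` shows a numeric vector and the second shows a POSIXct vector, with the leading NA preserved.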
      
      ## How was this patch tested?
      
      The patch can be tested manually with the same code above.
      
      Author: titicaca <fangzhou.yang@hotmail.com>
      
      Closes #16689 from titicaca/sparkr-dev.
      bc0a0e63
  5. Feb 08, 2017
    • Dongjoon Hyun's avatar
      [SPARK-19464][BUILD][HOTFIX] run-tests should use hadoop2.6 · c618ccdb
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      After SPARK-19464, **SparkPullRequestBuilder** fails because it still tries to use hadoop2.3.
      
      **BEFORE**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [error] Could not find hadoop2.3 in the list. Valid options  are ['hadoop2.6', 'hadoop2.7']
      Attempting to post to Github...
       > Post successful.
      ```
      
      **AFTER**
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
      ```
      ========================================================================
      Building Spark
      ========================================================================
      [info] Building Spark (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
      Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
      Note, this will be overridden by -java-home if it is set.
      ```
      
      ## How was this patch tested?
      
      Pass the existing test.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16858 from dongjoon-hyun/hotfix_run-tests.
      c618ccdb
  6. Feb 07, 2017
    • anabranch's avatar
      [SPARK-16609] Add to_date/to_timestamp with format functions · 7a7ce272
      anabranch authored
      ## What changes were proposed in this pull request?
      
      This pull request adds two new user-facing functions:
      - `to_date` which accepts an expression and a format and returns a date.
      - `to_timestamp` which accepts an expression and a format and returns a timestamp.
      
      For example, given a date in the format `yyyy-dd-MM`, such as `2016-21-05`:
      
      ### Date Function
      *Previously*
      ```
      to_date(unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp"))
      ```
      *Current*
      ```
      to_date(lit("2016-21-05"), "yyyy-dd-MM")
      ```
      
      ### Timestamp Function
      *Previously*
      ```
      unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp")
      ```
      *Current*
      ```
      to_timestamp(lit("2016-21-05"), "yyyy-dd-MM")
      ```
      ### Tasks
      
      - [X] Add `to_date` to Scala Functions
      - [x] Add `to_date` to Python Functions
      - [x] Add `to_date` to SQL Functions
      - [X] Add `to_timestamp` to Scala Functions
      - [x] Add `to_timestamp` to Python Functions
      - [x] Add `to_timestamp` to SQL Functions
      - [x] Add function to R
      
      ## How was this patch tested?
      
      - [x] Add Functions to `DateFunctionsSuite`
      - Test new `ParseToTimestamp` Expression (*not necessary*)
      - Test new `ParseToDate` Expression (*not necessary*)
      - [x] Add test for R
      - [x] Add test for Python in test.py
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: anabranch <wac.chambers@gmail.com>
      Author: Bill Chambers <bill@databricks.com>
      Author: anabranch <bill@databricks.com>
      
      Closes #16138 from anabranch/SPARK-16609.
      7a7ce272
  7. Feb 05, 2017
    • actuaryzhang's avatar
      [SPARK-19452][SPARKR] Fix bug in the name assignment method · b94f4b6f
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `names<-` method fails to check the validity of the assigned values. This can be fixed by calling `colnames<-` within `names<-`.
      
      ## How was this patch tested?
      new tests.
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16794 from actuaryzhang/sparkRNames.
      b94f4b6f
  8. Feb 03, 2017
    • actuaryzhang's avatar
      [SPARK-19386][SPARKR][FOLLOWUP] fix error in vignettes · 050c20cc
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      
      The current version has an error in the vignettes:
      ```
      model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
      summary(kmeansModel)
      ```
      
      `kmeansModel` does not exist...
      
      felixcheung wangmiao1981
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16799 from actuaryzhang/sparkRVignettes.
      050c20cc
    • krishnakalyan3's avatar
      [SPARK-19386][SPARKR][DOC] Bisecting k-means in SparkR documentation · 48aafeda
      krishnakalyan3 authored
      ## What changes were proposed in this pull request?
      Update programming guide, example and vignette with Bisecting k-means.
      
      Author: krishnakalyan3 <krishnakalyan3@gmail.com>
      
      Closes #16767 from krishnakalyan3/bisecting-kmeans.
      48aafeda
  9. Jan 31, 2017
    • wm624@hotmail.com's avatar
      [SPARK-19319][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal to k · 9ac05225
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      When Kmeans is run with initMode = "random" and certain random seeds, the actual number of clusters may not equal the configured `k`.
      
      In this case, summary(model) returns an error because the number of columns of the coefficient matrix does not equal `k`.
      
      Example:
      > col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      > col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      > col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
      > cols <- as.data.frame(cbind(col1, col2, col3))
      > df <- createDataFrame(cols)
      >
      > model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10, initMode = "random", seed = 22222, tol = 1E-5)
      >
      > summary(model2)
      Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3")) :
        length of 'dimnames' [2] not equal to array extent
      In addition: Warning message:
      In matrix(coefficients, ncol = k) :
        data length [9] is not a sub-multiple or multiple of the number of rows [2]
      
      Fix: Get the actual cluster size in the summary and use it to build the coefficient matrix.
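
      The idea of the fix can be sketched in base R. This is an illustration only, not the patched SparkR code: the center values are made up, and deriving the cluster count from the vector length is an assumption (the description says only that the actual cluster size is obtained in the summary).

      ```r
      # Hypothetical data: k = 5 was requested but only 3 clusters were found,
      # so 3 centers x 3 features = 9 flattened values come back.
      features <- c("col1", "col2", "col3")
      k <- 5
      flat_centers <- c(0, 0, 0, 1.5, 1.5, 1.5, 3.5, 3.5, 3.5)

      # Build the matrix from the actual cluster count rather than from k.
      cluster_size <- length(flat_centers) / length(features)
      coefficients <- t(matrix(flat_centers, ncol = cluster_size))
      colnames(coefficients) <- features
      rownames(coefficients) <- seq_len(cluster_size)
      print(coefficients)
      ```

      Using `ncol = k` here would produce a matrix whose shape does not match the three feature names, which is the `colnames<-` error shown above.
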
      ## How was this patch tested?
      
      Add unit tests.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16666 from wangmiao1981/kmeans.
      9ac05225
    • actuaryzhang's avatar
      [SPARK-19395][SPARKR] Convert coefficients in summary to matrix · ce112cec
      actuaryzhang authored
      ## What changes were proposed in this pull request?
      The `coefficients` component in the model summary should be a matrix, but the underlying structure is actually a list. This affects several models; 'AFTSurvivalRegressionModel' is the exception and has the correct implementation. The fix is to `unlist` the coefficients returned from `callJMethod` before converting them to a matrix. The following example illustrates the issue:
      
      ```
      data(iris)
      df <- createDataFrame(iris)
      model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
      s <- summary(model)
      
      > str(s$coefficients)
      List of 8
       $ : num 6.53
       $ : num -0.223
       $ : num 0.479
       $ : num 0.155
       $ : num 13.6
       $ : num -1.44
       $ : num 0
       $ : num 0.152
       - attr(*, "dim")= int [1:2] 2 4
       - attr(*, "dimnames")=List of 2
        ..$ : chr [1:2] "(Intercept)" "Sepal_Width"
        ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
      > s$coefficients[, 2]
      $`(Intercept)`
      [1] 0.4788963
      
      $Sepal_Width
      [1] 0.1550809
      ```
      
      This shows that the underlying structure of the coefficients is still a `list`.
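
      The difference that `unlist` makes can be seen in base R alone. This is a minimal sketch with made-up values standing in for what the JVM call returns; it is not the patched code.

      ```r
      # Hypothetical stand-in for the backend result: a list, not a numeric vector.
      raw <- list(6.53, -0.223, 0.479, 0.155)

      # matrix() on a list keeps the list type and only adds a dim attribute.
      as_list <- matrix(raw, nrow = 2)
      print(typeof(as_list))

      # Unlisting first yields an ordinary numeric matrix.
      as_num <- matrix(unlist(raw), nrow = 2)
      print(typeof(as_num))
      print(as_num[, 2])
      ```

      With the numeric matrix, column extraction returns a plain numeric vector instead of a list of length-one elements.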
      
      felixcheung wangmiao1981
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #16730 from actuaryzhang/sparkRCoef.
      ce112cec
  10. Jan 30, 2017
    • Felix Cheung's avatar
      [SPARKR][DOCS] update R API doc for subset/extract · be7425e2
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      With extract `[[` or replace `[[<-`, the parameter `i` is a column index; the doc needs to be corrected to say so. Also includes a few minor updates to examples and links.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16721 from felixcheung/rsubsetdoc.
      be7425e2
  11. Jan 27, 2017
    • Felix Cheung's avatar
      [SPARK-19324][SPARKR] Spark JVM stdout output is getting dropped in SparkR · a7ab6f9a
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      This mostly affects running a job from the driver in client mode when results are expected to come through stdout (which should be somewhat rare, but is possible).
      
      Before:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      NULL
      ```
      
      After:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      count#11L
      NULL
      ```
      
      Now, `column.explain()` doesn't seem very useful (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples where `println` calls on the Scala/JVM side are getting dropped.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16670 from felixcheung/rjvmstdout.
      a7ab6f9a
    • Felix Cheung's avatar
      [SPARK-19333][SPARKR] Add Apache License headers to R files · 385d7384
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      add header
      
      ## How was this patch tested?
      
      Manual run to check vignettes html is created properly
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16709 from felixcheung/rfilelicense.
      385d7384
  12. Jan 26, 2017
    • Felix Cheung's avatar
      [SPARK-18788][SPARKR] Add API for getNumPartitions · 90817a6c
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Includes documentation noting that this converts the DataFrame into an RDD.
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16668 from felixcheung/rgetnumpartitions.
      90817a6c
    • wm624@hotmail.com's avatar
      [SPARK-18821][SPARKR] Bisecting k-means wrapper in SparkR · c0ba2843
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      Add R wrapper for bisecting Kmeans.
      
      As JIRA is down, I will update title to link with corresponding JIRA later.
      
      ## How was this patch tested?
      
      Add new unit tests.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16566 from wangmiao1981/bk.
      c0ba2843
  13. Jan 24, 2017
    • Felix Cheung's avatar
      [SPARK-18823][SPARKR] add support for assigning to column · f27e0247
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Support for
      ```
      df[[myname]] <- 1
      df[[2]] <- df$eruptions
      ```
      
      ## How was this patch tested?
      
      manual tests, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16663 from felixcheung/rcolset.
      f27e0247
  14. Jan 21, 2017
  15. Jan 18, 2017
    • Felix Cheung's avatar
      [SPARK-19231][SPARKR] add error handling for download and untar for Spark release · 278fa1eb
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      When SparkR starts as a package and needs to download the Spark release distribution, we need to handle errors from the download and untar steps and clean up afterwards; otherwise it will get stuck.
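
      One way to structure this kind of handling in base R is sketched below. The helper name `fetch_and_unpack` and its arguments are hypothetical and do not come from SparkR; the sketch only shows the pattern of checking both steps and removing partial results on failure.

      ```r
      # Hypothetical helper, not the SparkR implementation.
      fetch_and_unpack <- function(url, tarball, dest_dir) {
        ok <- tryCatch({
          # download.file() returns 0 on success, or raises an error.
          status <- download.file(url, tarball, quiet = TRUE)
          if (status != 0) stop("download failed with status ", status)
          # untar() returns the exit status of the extraction.
          status <- untar(tarball, exdir = dest_dir)
          if (status != 0) stop("untar failed with status ", status)
          TRUE
        }, error = function(e) {
          message("Fetch failed: ", conditionMessage(e))
          FALSE
        })
        if (!ok) {
          # Clean up so a later attempt does not see a partial download.
          unlink(tarball)
          unlink(dest_dir, recursive = TRUE)
        }
        ok
      }
      ```

      The function returns `TRUE` only when both steps succeed, so a caller can decide whether to retry or stop.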
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16589 from felixcheung/rtarreturncode.
      278fa1eb
  16. Jan 16, 2017
    • Felix Cheung's avatar
      [SPARK-18828][SPARKR] Refactor scripts for R · c84f7d3e
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Refactored the scripts to remove duplication and give each script a clearer purpose.
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16249 from felixcheung/rscripts.
      c84f7d3e
    • Felix Cheung's avatar
      [SPARK-19232][SPARKR] Update Spark distribution download cache location on Windows · a115a543
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Windows seems to be the only platform with an appauthor component in the cache path, and for it we should use "Apache" (case-sensitive).
      The current path, `AppData\Local\spark\spark\Cache`, is a bit odd.
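
      A path of the intended shape could be built as in the sketch below. The helper name is hypothetical, and the exact directory names after the author component ("Spark", "Cache") are assumptions for illustration; only the "Apache" author component comes from the description above.

      ```r
      # Hypothetical helper; directory names other than "Apache" are assumed.
      windows_cache_dir <- function(local_app_data = Sys.getenv("LOCALAPPDATA")) {
        # file.path() joins components with "/" on every platform.
        file.path(local_app_data, "Apache", "Spark", "Cache")
      }

      print(windows_cache_dir("C:/Users/me/AppData/Local"))
      ```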
      
      ## How was this patch tested?
      
      manual.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16590 from felixcheung/rcachedir.
      a115a543
    • wm624@hotmail.com's avatar
      [SPARK-19066][SPARKR] SparkR LDA doesn't set optimizer correctly · 12c8c216
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      `spark.lda` passes the optimizer, "em" or "online", as a string to the backend. However, LDAWrapper doesn't set the optimizer based on the value from R. Therefore, for optimizer "em", the `isDistributed` field is FALSE, although it should be TRUE according to the Scala code.
      
      In addition, the `summary` method should bring back the results related to `DistributedLDAModel`.
      
      ## How was this patch tested?
      Manual tests by comparing with scala example.
      Modified the current unit test: fix the incorrect unit test and add necessary tests for `summary` method.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16464 from wangmiao1981/new.
      12c8c216
  17. Jan 13, 2017
    • Felix Cheung's avatar
      [SPARK-18335][SPARKR] createDataFrame to support numPartitions parameter · b0e8eb6d
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Allow specifying the number of partitions when the DataFrame is created.
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16512 from felixcheung/rnumpart.
      b0e8eb6d
    • wm624@hotmail.com's avatar
      [SPARK-19142][SPARKR] spark.kmeans should take seed, initSteps, and tol as parameters · 7f24a0b6
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      `spark.kmeans` has no interface for setting initSteps, seed and tol. Since the Spark Kmeans algorithm doesn't take the same set of parameters as R's kmeans, we should maintain a different interface in `spark.kmeans`.
      
      Add the missing parameters and the corresponding documentation.
      
      Modified existing unit tests to take additional parameters.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16523 from wangmiao1981/kmeans.
      7f24a0b6
  18. Jan 11, 2017
  19. Jan 10, 2017
  20. Jan 08, 2017
    • anabranch's avatar
      [SPARK-19126][DOCS] Update Join Documentation Across Languages · 19d9d4c8
      anabranch authored
      ## What changes were proposed in this pull request?
      
      - [X] Make sure all join types are clearly mentioned
      - [X] Make join labeling/style consistent
      - [X] Make join label ordering docs the same
      - [X] Improve join documentation according to above for Scala
      - [X] Improve join documentation according to above for Python
      - [X] Improve join documentation according to above for R
      
      ## How was this patch tested?
      No tests b/c docs.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: anabranch <wac.chambers@gmail.com>
      
      Closes #16504 from anabranch/SPARK-19126.
      19d9d4c8
    • anabranch's avatar
      [SPARK-19127][DOCS] Update Rank Function Documentation · 1f6ded64
      anabranch authored
      ## What changes were proposed in this pull request?
      
      - [X] Fix inconsistencies in function reference for dense rank and rank
      - [X] Make all languages equivalent in their reference to `dense_rank` and `rank`.
      
      ## How was this patch tested?
      
      N/A for docs.
      
      Please review http://spark.apache.org/contributing.html before opening a pull request.
      
      Author: anabranch <wac.chambers@gmail.com>
      
      Closes #16505 from anabranch/SPARK-19127.
      1f6ded64
    • Yanbo Liang's avatar
      [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files · 6b6b555a
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      SparkR `mllib.R` is getting bigger as we add more ML wrappers. I'd like to split it into multiple files to make it easier to maintain:
      * mllib_classification.R
      * mllib_clustering.R
      * mllib_recommendation.R
      * mllib_regression.R
      * mllib_stat.R
      * mllib_tree.R
      * mllib_utils.R
      
      Note: Only reorg, no actual code change.
      
      ## How was this patch tested?
      Existing tests.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16312 from yanboliang/spark-18862.
      6b6b555a
  21. Jan 07, 2017
    • Yanbo Liang's avatar
      [MINOR] Bump R version to 2.2.0. · cdda3372
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      #16126 bumped the master branch version to 2.2.0-SNAPSHOT, but it seems the R version was omitted.
      
      ## How was this patch tested?
      N/A
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16488 from yanboliang/r-version.
      cdda3372
  22. Dec 22, 2016
    • Felix Cheung's avatar
      [SPARK-18958][SPARKR] R API toJSON on DataFrame · 17579bda
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      This makes it easier to integrate with other components that expect a row-based JSON format.
      This replaces the non-public toJSON RDD API.
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16368 from felixcheung/rJSON.
      17579bda
  23. Dec 21, 2016
    • Felix Cheung's avatar
      [SPARK-18903][SPARKR] Add API to get SparkUI URL · 7e8994ff
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      API for SparkUI URL from SparkContext
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16367 from felixcheung/rwebui.
      7e8994ff
  24. Dec 17, 2016
  25. Dec 16, 2016
    • Dongjoon Hyun's avatar
      [SPARK-18897][SPARKR] Fix SparkR SQL Test to drop test table · 1169db44
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      The SparkR tests (`R/run-tests.sh`) succeed only once because `test_sparkSQL.R` does not clean up the test table `people`.
      
      As a result, rows accumulate in the `people` table on every run and the test cases fail.
      
      The following is the failure result for the second run.
      
      ```r
      Failed -------------------------------------------------------------------------
      1. Failure: create DataFrame from RDD (test_sparkSQL.R#204) -------------------
      collect(sql("SELECT age from people WHERE name = 'Bob'"))$age not equal to c(16).
      Lengths differ: 2 vs 1
      
      2. Failure: create DataFrame from RDD (test_sparkSQL.R#206) -------------------
      collect(sql("SELECT height from people WHERE name ='Bob'"))$height not equal to c(176.5).
      Lengths differ: 2 vs 1
      ```
      
      ## How was this patch tested?
      
      Manual. Run `run-tests.sh` twice and check if it passes without failures.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16310 from dongjoon-hyun/SPARK-18897.
      1169db44
  26. Dec 14, 2016
  27. Dec 13, 2016
    • wm624@hotmail.com's avatar
      [MINOR][SPARKR] fix kstest example error and add unit test · f2ddabfa
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      While adding vignettes for kstest, I found some errors in the example:
      1. There is a typo of kstest;
      2. print.summary.KStest doesn't work with the example;
      
      Fix the example errors;
      Add a new unit test for print.summary.KStest;
      
      ## How was this patch tested?
      Manual test;
      Add new unit test;
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16259 from wangmiao1981/ks.
      f2ddabfa