  2. Apr 02, 2017
    • [SPARK-20197][SPARKR][BRANCH-2.1] CRAN check fail with package installation · ca144106
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      The test failed because SPARK_HOME is not set before Spark is installed.
      Also, the current directory is not equal to SPARK_HOME when the tests are run with `R CMD check`, unlike in Jenkins, so disable that test for now. (That would also disable the test in Jenkins, so this change should not be ported to master as-is.)
      
      ## How was this patch tested?
      
      Manual run of `R CMD check`
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17515 from felixcheung/rcrancheck.
  6. Mar 19, 2017
    • [SPARK-18817][SPARKR][SQL] change derby log output to temp dir · b60f6902
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      Passes R's `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM and sets the system property for the Derby home dir, so that `derby.log` is moved there.
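      Conceptually, the same relocation can be expressed through the SparkR API (a hedged sketch: the actual patch sets the property in SparkR's JVM-launch code rather than via user-supplied config, and this requires a Spark installation, so it is not runnable standalone):
      
      ```r
      # Sketch only (assumes a Spark installation is available).
      # Pointing derby.system.home at the R session temp dir moves derby.log
      # there; tempdir() is cleaned up automatically when the R session ends.
      sparkR.session(sparkConfig = list(
        spark.driver.extraJavaOptions = paste0("-Dderby.system.home=", tempdir())
      ))
      ```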
      
      ## How was this patch tested?
      
      Manually, unit tests
      
      With this, the Derby files are relocated under the R session temp dir (itself under `/tmp`):
      ```
      # ls /tmp/RtmpG2M0cB/
      derby.log
      ```
      And they are removed automatically when the R session ends.
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16330 from felixcheung/rderby.
      
      (cherry picked from commit 422aa67d)
      Signed-off-by: Felix Cheung <felixcheung@apache.org>
  7. Feb 23, 2017
    • [SPARK-19682][SPARKR] Issue warning (or error) when subset method "[[" takes vector index · d30238f1
      actuaryzhang authored
      
      ## What changes were proposed in this pull request?
      The `[[` method is supposed to take a single index and return a column. This differs from base R, where `[[` takes a vector index. We should check for this and issue a warning or error when a vector index is supplied (a likely mistake, given the base R behavior).
      
      Currently I'm issuing a warning message and taking only the first element of the vector index. We could change this to an error if that's better.
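      The current behavior can be sketched in plain R (an illustrative sketch; `normalizeIndex` is a hypothetical name, not the actual SparkR helper):
      
      ```r
      # Illustrative sketch (hypothetical helper, not the actual SparkR code):
      # warn and keep only the first element when a vector index is supplied.
      normalizeIndex <- function(i) {
        if (length(i) > 1) {
          warning("Subset index has length > 1. Only the first index is used.")
          i <- i[[1]]
        }
        i
      }
      ```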
      
      ## How was this patch tested?
      new tests
      
      Author: actuaryzhang <actuaryzhang10@gmail.com>
      
      Closes #17017 from actuaryzhang/sparkRSubsetter.
      
      (cherry picked from commit 7bf09433)
      Signed-off-by: Felix Cheung <felixcheung@apache.org>
  9. Feb 14, 2017
    • [SPARK-19387][SPARKR] Tests do not run with SparkR source package in CRAN check · 7763b0b8
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      - this is caused by the changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (the default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` runs (`master` cannot default to "local", since it could be overridden by the spark-submit command line or Spark config)
      - as a result, running SparkR as a package in an IDE works fine, but the CRAN check does not, because it launches SparkR via a non-interactive script
      - the fix is to add a check to the beginning of each test and of the vignettes; the same could be achieved by changing `sparkR.session()` to `sparkR.session(master = "local")` in the tests, but I think being more explicit is better.
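      The per-file guard can be sketched as follows (a minimal sketch; `canRunSparkTests` is an illustrative name, not the actual SparkR helper):
      
      ```r
      # Hypothetical helper (illustrative, not the actual SparkR code): Spark-
      # dependent tests should only run when a Spark installation can be found,
      # which is not the case under a non-interactive CRAN check.
      canRunSparkTests <- function(sparkHome = Sys.getenv("SPARK_HOME")) {
        nzchar(sparkHome) && dir.exists(sparkHome)
      }
      
      # At the top of each test file and vignette:
      # if (canRunSparkTests()) { sparkR.session(); ... }
      ```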
      
      ## How was this patch tested?
      
      Tested this by reverting the version to 2.1, since the check needs to download a release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.
      
      manually as:
      ```
      # modify DESCRIPTION to revert version to 2.1.0
      SPARK_HOME=/usr/spark R CMD build pkg
      # run cran check without SPARK_HOME
      R CMD check --as-cran SparkR_2.1.0.tar.gz
      ```
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16720 from felixcheung/rcranchecktest.
      
      (cherry picked from commit a3626ca3)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  10. Feb 12, 2017
    • [SPARK-19319][BACKPORT-2.1][SPARKR] SparkR Kmeans summary returns error when the cluster size doesn't equal k · 06e77e00
      wm624@hotmail.com authored
      
      ## What changes were proposed in this pull request?
      
      Backport fix of #16666
      
      ## How was this patch tested?
      
      Backport unit tests
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16761 from wangmiao1981/kmeansport.
    • [SPARK-19342][SPARKR] bug fixed in collect method for collecting timestamp column · 173c2387
      titicaca authored
      
      ## What changes were proposed in this pull request?
      
      Fix a bug in the `collect` method for collecting a timestamp column. The bug can be reproduced with the following code and output:
      
      ```
      library(SparkR)
      sparkR.session(master = "local")
      df <- data.frame(col1 = c(0, 1, 2),
                       col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))
      
      sdf1 <- createDataFrame(df)
      print(dtypes(sdf1))
      df1 <- collect(sdf1)
      print(lapply(df1, class))
      
      sdf2 <- filter(sdf1, "col1 > 0")
      print(dtypes(sdf2))
      df2 <- collect(sdf2)
      print(lapply(df2, class))
      ```
      
      As we can see from the printed output, the column type of `col2` in `df2` is unexpectedly converted to numeric when an NA exists at the top of the column.
      
      This is caused by `do.call(c, list)`: if we concatenate a list this way, e.g. `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct.
      
      Therefore, we need to cast the data type of the vector explicitly.
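      The coercion can be reproduced in base R without Spark, and the explicit cast below mirrors the idea of the fix (a sketch, not the actual patch code):
      
      ```r
      # c() dispatches on its first element, so a leading (logical) NA strips
      # the POSIXct class and the result decays to a plain numeric vector:
      x <- do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01", tz = "UTC")))
      inherits(x, "POSIXct")  # FALSE -- the timestamp class is lost
      
      # Casting the vector explicitly restores the class (and the NA survives):
      y <- x
      class(y) <- c("POSIXct", "POSIXt")
      attr(y, "tzone") <- "UTC"
      inherits(y, "POSIXct")  # TRUE
      ```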
      
      ## How was this patch tested?
      
      The patch can be tested manually with the same code above.
      
      Author: titicaca <fangzhou.yang@hotmail.com>
      
      Closes #16689 from titicaca/sparkr-dev.
      
      (cherry picked from commit bc0a0e63)
      Signed-off-by: Felix Cheung <felixcheung@apache.org>
  12. Jan 27, 2017
    • [SPARK-19324][SPARKR] Spark JVM stdout output is getting dropped in SparkR · 9a49f9af
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      This mostly affects running a job from the driver in client mode when results are expected on stdout (which should be somewhat rare, but possible)
      
      Before:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      NULL
      ```
      
      After:
      ```
      > a <- as.DataFrame(cars)
      > b <- group_by(a, "dist")
      > c <- count(b)
      > sparkR.callJMethod(c$countjc, "explain", TRUE)
      count#11L
      NULL
      ```
      
      Now, `column.explain()` doesn't seem very useful by itself (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples with calls to `println` on the Scala/JVM side whose output was getting dropped.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16670 from felixcheung/rjvmstdout.
      
      (cherry picked from commit a7ab6f9a)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
    • [SPARK-19333][SPARKR] Add Apache License headers to R files · 4002ee97
      Felix Cheung authored
      
      ## What changes were proposed in this pull request?
      
      add header
      
      ## How was this patch tested?
      
      Manual run to check that the vignettes HTML is created properly
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16709 from felixcheung/rfilelicense.
      
      (cherry picked from commit 385d7384)
      Signed-off-by: Felix Cheung <felixcheung@apache.org>
  23. Dec 16, 2016
    • [SPARK-18897][SPARKR] Fix SparkR SQL Test to drop test table · df589be5
      Dongjoon Hyun authored
      
      ## What changes were proposed in this pull request?
      
      The SparkR tests (`R/run-tests.sh`) succeed only once, because `test_sparkSQL.R` does not clean up the test table, `people`.
      
      As a result, rows accumulate in the `people` table on every run, and the test cases fail.
      
      The following is the failure result for the second run.
      
      ```r
      Failed -------------------------------------------------------------------------
      1. Failure: create DataFrame from RDD (test_sparkSQL.R#204) -------------------
      collect(sql("SELECT age from people WHERE name = 'Bob'"))$age not equal to c(16).
      Lengths differ: 2 vs 1
      
      2. Failure: create DataFrame from RDD (test_sparkSQL.R#206) -------------------
      collect(sql("SELECT height from people WHERE name ='Bob'"))$height not equal to c(176.5).
      Lengths differ: 2 vs 1
      ```
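      The cleanup amounts to dropping the test table after the assertions (a sketch; `sql` is SparkR's SQL entry point, and this requires an active SparkR session, so it is not runnable standalone):
      
      ```r
      # Sketch only (assumes an active SparkR session).
      # Dropping the test table at the end keeps reruns idempotent.
      sql("DROP TABLE IF EXISTS people")
      ```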
      
      ## How was this patch tested?
      
      Manual. Run `run-tests.sh` twice and check if it passes without failures.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #16310 from dongjoon-hyun/SPARK-18897.
      
      (cherry picked from commit 1169db44)
      Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
  26. Dec 13, 2016
    • [MINOR][SPARKR] fix kstest example error and add unit test · 8ef00593
      wm624@hotmail.com authored
      
      ## What changes were proposed in this pull request?
      
      While adding vignettes for kstest, I found some errors in the example:
      1. there is a typo in the kstest example;
      2. `print.summary.KStest` doesn't work with the example.
      
      This fixes the example errors and adds a new unit test for `print.summary.KStest`.
      
      ## How was this patch tested?
      Manual test; added a new unit test.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #16259 from wangmiao1981/ks.
      
      (cherry picked from commit f2ddabfa)
      Signed-off-by: Yanbo Liang <ybliang8@gmail.com>