- Apr 25, 2017
Patrick Wendell authored
-
- Apr 14, 2017
Patrick Wendell authored
-
Patrick Wendell authored
-
- Apr 02, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
The test failed because SPARK_HOME is not set before Spark is installed. Also, the current directory is not SPARK_HOME when tests are run with R CMD check, unlike in Jenkins, so disable that test for now. (That would also disable the test in Jenkins, so this change should not be ported to master as-is.)

## How was this patch tested?
Manual run of R CMD check.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #17515 from felixcheung/rcrancheck.
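A guard of the kind described might look like the following sketch (hypothetical, not the exact patch):

```r
library(testthat)

# Hypothetical testthat guard: skip a test that needs a Spark source tree
# when SPARK_HOME is not set (e.g. under R CMD check outside Jenkins).
test_that("spark-submit can be found", {
  if (identical(Sys.getenv("SPARK_HOME"), "")) {
    skip("SPARK_HOME is not set")
  }
  expect_true(file.exists(file.path(Sys.getenv("SPARK_HOME"), "bin", "spark-submit")))
})
```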
-
- Mar 28, 2017
Patrick Wendell authored
-
Patrick Wendell authored
-
- Mar 22, 2017
Yanbo Liang authored
## What changes were proposed in this pull request?
SparkR `spark.getSparkFiles` fails when called on executors; see details at [SPARK-19925](https://issues.apache.org/jira/browse/SPARK-19925).

## How was this patch tested?
Added unit tests, and verified the fix on standalone and YARN clusters.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #17274 from yanboliang/spark-19925.

(cherry picked from commit 478fbc86)
Signed-off-by: Yanbo Liang <ybliang8@gmail.com>
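A minimal sketch of the scenario the fix addresses (the file name is illustrative): register a file with the session, then resolve its path from code that runs on executors.

```r
library(SparkR)
sparkR.session(master = "local")

# Create a small file and register it with the Spark session.
path <- file.path(tempdir(), "data.txt")
writeLines("hello", path)
spark.addFile(path)

# spark.lapply runs the function on executors; before this fix,
# spark.getSparkFiles failed when called there.
paths <- spark.lapply(seq(2), function(i) {
  spark.getSparkFiles("data.txt")
})

sparkR.session.stop()
```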
-
- Mar 21, 2017
Patrick Wendell authored
-
Felix Cheung authored
## What changes were proposed in this pull request?
When SparkR is installed as an R package there might not be any Java runtime. If Java is missing, SparkR's `sparkR.session()` blocks waiting for the connection timeout, hanging the R IDE/shell without any notification or message.

## How was this patch tested?
Manually.
- [x] need to test on Windows

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16596 from felixcheung/rcheckjava.

(cherry picked from commit a8877bdb)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
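A minimal sketch of such a pre-flight check, assuming we only need to know whether a `java` binary can run (an illustration, not the patch's actual implementation):

```r
# Probe for a usable `java` binary before attempting the JVM connection,
# so the user gets a clear error instead of a hung session.
java_home <- Sys.getenv("JAVA_HOME")
java_bin <- if (nzchar(java_home)) file.path(java_home, "bin", "java") else "java"

status <- tryCatch(
  system2(java_bin, "-version", stdout = FALSE, stderr = FALSE),
  error = function(e) -1L
)
if (!identical(status, 0L)) {
  stop("Java is required to run Spark. Please install a JRE and/or set JAVA_HOME.")
}
```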
-
- Mar 19, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Passes R's `tempdir()` (the R session temp dir, shared with other temp files/dirs) to the JVM, and sets the system property for the Derby home dir so that derby.log moves there.

## How was this patch tested?
Manually, and with unit tests.

With this change, the files are relocated under /tmp:
```
# ls /tmp/RtmpG2M0cB/
derby.log
```
They are removed automatically when the R session ends.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16330 from felixcheung/rderby.

(cherry picked from commit 422aa67d)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
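For illustration only, a user-level way to approximate the same effect (assuming `spark.driver.extraJavaOptions` is applied when the driver JVM launches; the patch itself wires `tempdir()` through internally) is to point Derby's home at the R session temp dir:

```r
library(SparkR)

# Assumption: derby.system.home controls where Derby writes derby.log.
sparkR.session(
  master = "local",
  sparkConfig = list(
    spark.driver.extraJavaOptions = paste0("-Dderby.system.home=", tempdir())
  )
)
```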
-
- Feb 23, 2017
actuaryzhang authored
## What changes were proposed in this pull request?
The `[[` method is supposed to take a single index and return a column. This differs from base R, which accepts a vector index. We should check for this and issue a warning or error when a vector index is supplied (which is quite likely, given the behavior in base R). Currently this issues a warning message and takes only the first element of the vector index. We could change this to an error if that is preferable.

## How was this patch tested?
New tests.

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #17017 from actuaryzhang/sparkRSubsetter.

(cherry picked from commit 7bf09433)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
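A quick sketch of the behavior described (the warning itself comes from SparkR; the comments are paraphrased):

```r
library(SparkR)
sparkR.session(master = "local")

df <- createDataFrame(faithful)

# Base R would accept a vector index here; SparkR warns and uses
# only the first element, returning a single Column.
col <- df[[c(1, 2)]]   # effectively df[[1]]
```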
-
- Feb 15, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Fix test broken by git merge for #16739.

## How was this patch tested?
Manual.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16950 from felixcheung/fixrtest.
-
Felix Cheung authored
## What changes were proposed in this pull request?
Add `coalesce` on DataFrame for reducing the number of partitions without a shuffle, and `coalesce` on Column.

## How was this patch tested?
Manual, unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16739 from felixcheung/rcoalesce.

(cherry picked from commit 671bc08e)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
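A short sketch of both additions:

```r
library(SparkR)
sparkR.session(master = "local")

df <- createDataFrame(mtcars, numPartitions = 4)

# DataFrame coalesce: reduce the partition count without a shuffle.
df1 <- coalesce(df, 1L)

# Column coalesce: the first non-null value among the given columns.
df2 <- withColumn(df, "qsec_or_wt", coalesce(df$qsec, df$wt))
```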
-
- Feb 14, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
- This is caused by the changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (the default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` is run. (`master` cannot default to "local" since it could be overridden by the spark-submit command line or Spark config.)
- As a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, as it launches SparkR via a non-interactive script.
- The fix is to add a check to the beginning of each test and vignette; the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but being more explicit is better.

## How was this patch tested?
Tested by reverting the version to 2.1, since the check needs to download the release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this configuration. This will need to be ported to branch-2.1 and retested with the 2.1 release jar.

Tested manually as:
```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg

# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16720 from felixcheung/rcranchecktest.

(cherry picked from commit a3626ca3)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
- Feb 12, 2017
wm624@hotmail.com authored
[SPARK-19319][BACKPORT-2.1][SPARKR] SparkR Kmeans summary returns an error when the cluster size doesn't equal k

## What changes were proposed in this pull request?
Backport of the fix in #16666.

## How was this patch tested?
Backported unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16761 from wangmiao1981/kmeansport.
-
titicaca authored
## What changes were proposed in this pull request?
Fix a bug in the collect method for collecting a timestamp column. The bug can be reproduced with the following code:
```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))

sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))

sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```
As the printed output shows, the column type of col2 in df2 is unexpectedly converted to numeric when an NA appears at the top of the column. This is caused by `do.call(c, list)`: if we combine a list, e.g. `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct. Therefore, we need to cast the data type of the vector explicitly.

## How was this patch tested?
The patch can be tested manually with the code above.

Author: titicaca <fangzhou.yang@hotmail.com>

Closes #16689 from titicaca/sparkr-dev.

(cherry picked from commit bc0a0e63)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
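The base R behavior at the root of the bug can be seen directly:

```r
# c() dispatches on its first argument; a leading logical NA strips the
# POSIXct class, leaving a bare numeric vector.
x <- do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))
class(x)
#> [1] "numeric"

# Casting explicitly restores the type, which is the spirit of the fix.
y <- as.POSIXct(x, origin = "1970-01-01", tz = "UTC")
class(y)
#> [1] "POSIXct" "POSIXt"
```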
-
- Jan 31, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Backport #16721 to branch-2.1.

## How was this patch tested?
Manual.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16749 from felixcheung/rsubsetdocbackport.
-
- Jan 27, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
This mostly affects running a job from the driver in client mode, when results are expected to come through stdout (which should be somewhat rare, but possible).

Before:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$countjc, "explain", TRUE)
NULL
```

After:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$countjc, "explain", TRUE)
count#11L
NULL
```

Now, `column.explain()` doesn't seem very useful (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples with calls to `println` on the Scala/JVM side whose output was getting dropped.

## How was this patch tested?
Manual.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16670 from felixcheung/rjvmstdout.

(cherry picked from commit a7ab6f9a)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
Felix Cheung authored
## What changes were proposed in this pull request?
Add license header.

## How was this patch tested?
Manual run to check that the vignettes HTML is created properly.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16709 from felixcheung/rfilelicense.

(cherry picked from commit 385d7384)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
-
- Jan 26, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Add `getNumPartitions`, with doc noting that this converts the DataFrame into an RDD.

## How was this patch tested?
Unit tests, manual tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16668 from felixcheung/rgetnumpartitions.

(cherry picked from commit 90817a6c)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
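Usage, as a brief sketch:

```r
library(SparkR)
sparkR.session(master = "local")

# Query the number of partitions backing a SparkDataFrame.
df <- createDataFrame(cars, numPartitions = 2)
getNumPartitions(df)
#> [1] 2
```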
-
- Jan 24, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Support for:
```
df[[myname]] <- 1
df[[2]] <- df$eruptions
```

## How was this patch tested?
Manual tests, unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16663 from felixcheung/rcolset.

(cherry picked from commit f27e0247)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
-
- Jan 18, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
When R starts as a package and needs to download the Spark release distribution, we need to handle errors from the download and untar steps and clean up afterwards, otherwise the install will get stuck.

## How was this patch tested?
Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16589 from felixcheung/rtarreturncode.

(cherry picked from commit 278fa1eb)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
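A hedged sketch of the idea (hypothetical names; not the actual `install.spark` internals): check the results of download and untar, and remove partial artifacts on failure so a retry can succeed.

```r
# Hypothetical helper illustrating the error handling described above.
download_and_untar <- function(url, install_dir) {
  tarball <- file.path(tempdir(), basename(url))
  ok <- tryCatch({
    if (download.file(url, tarball, mode = "wb") != 0) stop("download failed")
    if (untar(tarball, exdir = install_dir) != 0) stop("untar failed")
    TRUE
  }, error = function(e) FALSE)
  if (!ok) {
    # Remove partial downloads/extractions so we don't get stuck.
    unlink(tarball)
    unlink(install_dir, recursive = TRUE)
    stop("Failed to download or extract the Spark distribution.")
  }
  invisible(install_dir)
}
```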
-
- Jan 17, 2017
wm624@hotmail.com authored
## What changes were proposed in this pull request?
Backport the fix for SPARK-19066 to the 2.1 branch.

## How was this patch tested?
Unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16623 from wangmiao1981/bugport.
-
- Jan 16, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Windows seems to be the only platform with the appauthor in the path, for which we should say "Apache" (and it is case sensitive). The current path of `AppData\Local\spark\spark\Cache` is a bit odd.

## How was this patch tested?
Manual.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16590 from felixcheung/rcachedir.

(cherry picked from commit a115a543)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
- Jan 13, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Allow specifying the number of partitions when the DataFrame is created.

## How was this patch tested?
Manual, unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16512 from felixcheung/rnumpart.

(cherry picked from commit b0e8eb6d)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
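A brief sketch of the new parameter:

```r
library(SparkR)
sparkR.session(master = "local")

# Specify the partition count up front ...
df <- createDataFrame(mtcars, numPartitions = 4)

# ... instead of creating first and shuffling afterwards.
df_old <- repartition(createDataFrame(mtcars), 4L)
```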
-
- Jan 11, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Support
```
df$foo <- 1
```
instead of
```
df$foo <- lit(1)
```

## How was this patch tested?
Unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16510 from felixcheung/rlitcol.

(cherry picked from commit d749c066)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
- Jan 10, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Backporting to 2.1, 2.0 and 1.6.

## How was this patch tested?
Unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16532 from felixcheung/rgammabackport.
-
- Jan 08, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Backport to 2.1.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16507 from felixcheung/portsparkuir21.
-
anabranch authored
## What changes were proposed in this pull request?
- [x] Make sure all join types are clearly mentioned
- [x] Make join labeling/style consistent
- [x] Make join label ordering in the docs the same
- [x] Improve join documentation per the above for Scala
- [x] Improve join documentation per the above for Python
- [x] Improve join documentation per the above for R

## How was this patch tested?
No tests, because this is a docs-only change. Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>

Closes #16504 from anabranch/SPARK-19126.

(cherry picked from commit 19d9d4c8)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
-
anabranch authored
## What changes were proposed in this pull request?
- [x] Fix inconsistencies in the function reference for dense rank and dense
- [x] Make all languages equivalent in their reference to `dense_rank` and `rank`

## How was this patch tested?
N/A; docs only. Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>

Closes #16505 from anabranch/SPARK-19127.

(cherry picked from commit 1f6ded64)
Signed-off-by: Reynold Xin <rxin@databricks.com>
-
- Dec 17, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Reorganizing content (copy/paste).

## How was this patch tested?
https://felixcheung.github.io/sparkr-vignettes.html

Previous: https://felixcheung.github.io/sparkr-vignettes_old.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16301 from felixcheung/rvignettespass2.

(cherry picked from commit 38fd163d)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
-
- Dec 16, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?
SparkR tests (`R/run-tests.sh`) succeed only once because `test_sparkSQL.R` does not clean up the test table `people`. As a result, the rows in the `people` table accumulate on every run and the test cases fail. The following is the failure from the second run:
```r
Failed -------------------------------------------------------------------------
1. Failure: create DataFrame from RDD (test_sparkSQL.R#204) -------------------
collect(sql("SELECT age from people WHERE name = 'Bob'"))$age not equal to c(16).
Lengths differ: 2 vs 1

2. Failure: create DataFrame from RDD (test_sparkSQL.R#206) -------------------
collect(sql("SELECT height from people WHERE name ='Bob'"))$height not equal to c(176.5).
Lengths differ: 2 vs 1
```

## How was this patch tested?
Manual. Run `run-tests.sh` twice and check that it passes without failures.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16310 from dongjoon-hyun/SPARK-18897.

(cherry picked from commit 1169db44)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
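Conceptually, the cleanup amounts to dropping the table once the test finishes (a sketch; the actual patch edits `test_sparkSQL.R`):

```r
# ... test body using the registered `people` table elided ...

# Drop the table afterwards so reruns start from a clean state.
sql("DROP TABLE IF EXISTS people")
```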
-
- Dec 15, 2016
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
Patrick Wendell authored
-
- Dec 14, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Doc cleanup.

## How was this patch tested?
~~vignettes is not building for me. I'm going to kick off a full clean build and try again and attach output here for review.~~ Output HTML here: https://felixcheung.github.io/sparkr-vignettes.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16286 from felixcheung/rvignettespass.

(cherry picked from commit 7d858bc5)
Signed-off-by: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
-
wm624@hotmail.com authored
## What changes were proposed in this pull request?
While doing the QA work, I found the following issues:
1. `spark.mlp` doesn't include an example;
2. `spark.mlp` and `spark.lda` have redundant parameter explanations;
3. `spark.lda` documentation misses default values for some parameters.

I also changed the `spark.logit` regParam in the examples, as we discussed in #16222.

## How was this patch tested?
Manual test.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16284 from wangmiao1981/ks.

(cherry picked from commit 32438853)
Signed-off-by: Felix Cheung <felixcheung@apache.org>
-