- May 26, 2017
Zheng RuiFeng authored
## What changes were proposed in this pull request?
1. Add an example for SparkR `decisionTree`.
2. Document it in the user guide.

## How was this patch tested?
Local submit.

Author: Zheng RuiFeng <ruifengz@foxmail.com>
Closes #18067 from zhengruifeng/dt_example.
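For context, a minimal hedged sketch of the `spark.decisionTree` call the new example covers (the dataset, formula, and parameters here are illustrative, not taken from the PR):

```r
library(SparkR)
sparkR.session()

# iris columns are renamed Sepal.Length -> Sepal_Length on conversion
df <- createDataFrame(iris)

# Fit a decision tree classifier; the R formula picks the label and features
model <- spark.decisionTree(df, Species ~ Petal_Length + Petal_Width,
                            type = "classification", maxDepth = 5)
summary(model)
head(predict(model, df))
```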
- May 04, 2017
Felix Cheung authored
[SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example

## What changes were proposed in this pull request?
Add:
- R vignettes
- R programming guide
- SS programming guide
- R example

Also disable spark.als in vignettes for now since it's failing (SPARK-20402).

## How was this patch tested?
Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #17814 from felixcheung/rdocss.
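For context, a minimal sketch of the R Structured Streaming API this documents, following the standard word-count example (the socket host/port are placeholders):

```r
library(SparkR)
sparkR.session()

# Stream lines from a socket source (host/port are placeholders)
lines <- read.stream("socket", host = "localhost", port = 9999)

# Split the lines into words and count them
words <- selectExpr(lines, "explode(split(value, ' ')) as word")
wordCounts <- count(groupBy(words, "word"))

# Emit the complete counts table to the console on every trigger
query <- write.stream(wordCounts, "console", outputMode = "complete")
awaitTermination(query)
```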
- May 01, 2017
Felix Cheung authored
## What changes were proposed in this pull request?
Updating R Programming Guide.

## How was this patch tested?
Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #17816 from felixcheung/r22relnote.
- Apr 29, 2017
wangmiao1981 authored
## What changes were proposed in this pull request?
Add a hyperlink in the SparkR programming guide.

## How was this patch tested?
Built the doc and manually checked the doc link.

Author: wangmiao1981 <wm624@hotmail.com>
Closes #17805 from wangmiao1981/doc.
- Apr 28, 2017
wangmiao1981 authored
## What changes were proposed in this pull request?
Add a link to svmLinear in the SparkR programming document.

## How was this patch tested?
Built the doc manually and clicked the link to the document. It looks good.

Author: wangmiao1981 <wm624@hotmail.com>
Closes #17797 from wangmiao1981/doc.
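For reference, a hedged sketch of the `spark.svmLinear` API the new link points to (the dataset, filtering, and regParam value are illustrative):

```r
library(SparkR)
sparkR.session()

# Binary classification needs two label values, so drop one iris species
df <- createDataFrame(iris[iris$Species != "setosa", ])

model <- spark.svmLinear(df, Species ~ Sepal_Length + Sepal_Width,
                         regParam = 0.01)
summary(model)
head(predict(model, df))
```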
- Apr 27, 2017
zero323 authored
## What changes were proposed in this pull request?
Add `spark.fpGrowth` to SparkR programming guide.

## How was this patch tested?
Manual tests.

Author: zero323 <zero323@users.noreply.github.com>
Closes #17775 from zero323/SPARK-20208-FOLLOW-UP.
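A minimal sketch of the `spark.fpGrowth` usage being documented (the toy transactions and thresholds are illustrative):

```r
library(SparkR)
sparkR.session()

# Each row is a transaction; split the raw string into an array of items
df <- selectExpr(
  createDataFrame(data.frame(rawItems = c("1,2,5", "1,2,3,5", "1,2"))),
  "split(rawItems, ',') AS items")

model <- spark.fpGrowth(df, itemsCol = "items",
                        minSupport = 0.5, minConfidence = 0.6)

# Frequent itemsets and association rules mined from the transactions
head(spark.freqItemsets(model))
head(spark.associationRules(model))
```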
- Apr 26, 2017
zero323 authored
## What changes were proposed in this pull request?
- Add `rollup` and `cube` methods and corresponding generics.
- Add a short description to the vignette.

## How was this patch tested?
- Existing unit tests.
- Additional unit tests covering new features.
- `check-cran.sh`.

Author: zero323 <zero323@users.noreply.github.com>
Closes #17728 from zero323/SPARK-20437.
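A short sketch of the new methods (the grouping columns and aggregate are illustrative):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# rollup: hierarchical subtotals - (cyl, gear), (cyl), and the grand total
head(agg(rollup(df, "cyl", "gear"), avg(df$mpg)))

# cube: subtotals for every combination of the grouping columns
head(agg(cube(df, "cyl", "gear"), avg(df$mpg)))
```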
- Mar 27, 2017
Yanbo Liang authored
The section `Data type mapping between R and Spark` is currently in the wrong place in the SparkR doc; we should move it to a separate section.

## What changes were proposed in this pull request?
Before/after screenshots of the doc layout were attached to the PR (images omitted).

Author: Yanbo Liang <ybliang8@gmail.com>
Closes #17440 from yanboliang/sparkr-doc.
- Dec 17, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Reorganizing content (copy/paste).

## How was this patch tested?
https://felixcheung.github.io/sparkr-vignettes.html
Previous: https://felixcheung.github.io/sparkr-vignettes_old.html

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #16301 from felixcheung/rvignettespass2.
- Dec 08, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?
- Add all R examples for ML wrappers which were added during the 2.1 release cycle.
- Split the whole `ml.R` example file into an individual example for each algorithm, which will be convenient for users to rerun.
- Add corresponding examples to the ML user guide.
- Update the ML section of the SparkR user guide.

Note: MLlib Scala/Java/Python examples will be consistent; however, SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R `formula` to specify `featuresCol` and `labelCol`.

## How was this patch tested?
Ran all examples manually.

Author: Yanbo Liang <ybliang8@gmail.com>
Closes #16148 from yanboliang/spark-18325.
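To illustrate the note about R `formula`, a hedged sketch of a SparkR ML wrapper call (the dataset and model are illustrative): the formula's left side is the label and its right side the features, so no explicit `featuresCol`/`labelCol` is needed.

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(iris)

# The R formula implicitly sets labelCol (left side) and featuresCol (right side)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width + Petal_Length,
                   family = "gaussian")
summary(model)
```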
- Dec 04, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
If SparkR is running as a package and it has previously downloaded the Spark jar, it should be able to run as before without having to set SPARK_HOME. With this bug, the auto-installed Spark only works in the first session; this seems to be a regression from the earlier behavior.

The fix is to always try to install or check for the cached Spark if running in an interactive session. As discussed before, we should probably only install Spark iff running in an interactive session (R shell, RStudio, etc.).

## How was this patch tested?
Manually.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #16077 from felixcheung/rsessioninteractive.
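For context, the interactive flow this fix targets, sketched with the public API (assumes no SPARK_HOME is set; the cache location is whatever `install.spark` chooses):

```r
library(SparkR)

# First interactive session: downloads and caches Spark when SPARK_HOME is unset
install.spark()
sparkR.session()

# With this fix, later interactive sessions find the cached install again
# instead of only working the first time.
```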
- Nov 23, 2016
Sean Owen authored
## What changes were proposed in this pull request?
Updates links to the wiki to links to the new location of the content on spark.apache.org.

## How was this patch tested?
Doc builds.

Author: Sean Owen <sowen@cloudera.com>
Closes #15967 from srowen/SPARK-18073.1.
- Oct 27, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
API and programming guide doc changes for Scala, Python and R.

## How was this patch tested?
Manual test.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #15629 from felixcheung/jsondoc.
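A minimal sketch of the SparkR side of the JSON reading being documented (the path is a placeholder; by default Spark expects JSON Lines, one object per line):

```r
library(SparkR)
sparkR.session()

# path is a placeholder; each line of the file must be a JSON object
people <- read.json("examples/src/main/resources/people.json")
printSchema(people)
head(people)
```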
- Oct 21, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Add crossJoin and do not default to a cross join if joinExpr is left out.

## How was this patch tested?
Unit test.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #15559 from felixcheung/rcrossjoin.
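A sketch of the resulting behavior (toy tables): a Cartesian product now has to be requested explicitly with `crossJoin` rather than falling out of a `join` with no join expression.

```r
library(SparkR)
sparkR.session()

df1 <- createDataFrame(data.frame(x = 1:3))
df2 <- createDataFrame(data.frame(y = c("a", "b")))

# Explicit Cartesian product: 3 x 2 = 6 rows
head(crossJoin(df1, df2))
```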
- Sep 23, 2016
Jeff Zhang authored
## What changes were proposed in this pull request?
Spark adds sparkr.zip to the archives only when in yarn mode (SparkSubmit.scala):

```scala
if (args.isR && clusterManager == YARN) {
  val sparkRPackagePath = RUtils.localSparkRPackagePath
  if (sparkRPackagePath.isEmpty) {
    printErrorAndExit("SPARK_HOME does not exist for R application in YARN mode.")
  }
  val sparkRPackageFile = new File(sparkRPackagePath.get, SPARKR_PACKAGE_ARCHIVE)
  if (!sparkRPackageFile.exists()) {
    printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in YARN mode.")
  }
  val sparkRPackageURI = Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString

  // Distribute the SparkR package.
  // Assigns a symbol link name "sparkr" to the shipped package.
  args.archives = mergeFileLists(args.archives, sparkRPackageURI + "#sparkr")

  // Distribute the R package archive containing all the built R packages.
  if (!RUtils.rPackages.isEmpty) {
    val rPackageFile =
      RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), R_PACKAGE_ARCHIVE)
    if (!rPackageFile.exists()) {
      printErrorAndExit("Failed to zip all the built R packages.")
    }
    val rPackageURI = Utils.resolveURI(rPackageFile.getAbsolutePath).toString
    // Assigns a symbol link name "rpkg" to the shipped package.
    args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
  }
}
```

So it is necessary to pass spark.master from the R process to the JVM; otherwise sparkr.zip won't be distributed to the executors. Besides that, I also pass spark.yarn.keytab/spark.yarn.principal to the Spark side, because the JVM process needs them to access a secured cluster.

## How was this patch tested?
Verified manually in RStudio using the following code:

```r
Sys.setenv(SPARK_HOME = "/Users/jzhang/github/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
library(SparkR)
sparkR.session(master = "yarn-client",
               sparkConfig = list(spark.executor.instances = "1"))
df <- as.DataFrame(mtcars)
head(df)
```
…

Author: Jeff Zhang <zjffdu@apache.org>
Closes #14784 from zjffdu/SPARK-17210.
- Sep 14, 2016
Sean Owen authored
## What changes were proposed in this pull request?
Point references to spark-packages.org to https://cwiki.apache.org/confluence/display/SPARK/Third+Party+Projects

This will be accompanied by a parallel change to the spark-website repo, and additional changes to this wiki.

## How was this patch tested?
Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>
Closes #15075 from srowen/SPARK-17445.
- Jul 25, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Fix a broken url. Also, the sparkR.session.stop doc page should have it in the header, instead of saying "sparkR.stop". The Data type section is in the middle of a list of gapply/gapplyCollect subsections (screenshots omitted).

## How was this patch tested?
Manual test.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #14329 from felixcheung/rdoclinkfix.
- Jul 18, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Fix code style from an ad hoc review of the RC4 doc.

## How was this patch tested?
Manual. shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #14250 from felixcheung/rdocs2rc4.
- Jul 16, 2016
Narine Kokhlikyan authored
## What changes were proposed in this pull request?
Updates the programming guide for spark.gapply/spark.gapplyCollect. Similar to other examples, I used the `faithful` dataset to demonstrate gapply's functionality. Please let me know if you prefer another example.

## How was this patch tested?
Existing test cases in R.

Author: Narine Kokhlikyan <narine@slice.com>
Closes #14090 from NarineK/gapplyProgGuide.
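A condensed sketch of the `gapply` pattern the updated guide demonstrates on `faithful` (the aggregation and schema are illustrative):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# Group by waiting time and compute the max eruption duration per group;
# gapply requires the output schema to be declared up front.
schema <- structType(structField("waiting", "double"),
                     structField("max_eruption", "double"))
result <- gapply(df, "waiting",
                 function(key, x) data.frame(key, max(x$eruptions)),
                 schema)
head(collect(arrange(result, "max_eruption", decreasing = TRUE)))
```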
- Jul 13, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Minor documentation update for a code example, code style, and a missed reference to "sparkR.init".

## How was this patch tested?
Manual. shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #14178 from felixcheung/rcsvprogrammingguide.
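The branch name suggests the CSV example in the guide was among those touched; a hedged sketch of the Spark 2.0-era CSV read in SparkR (the path and options are illustrative):

```r
library(SparkR)
sparkR.session()

# Built-in csv source; no third-party spark-csv package needed in 2.0+
df <- read.df("examples/src/main/resources/people.csv", "csv",
              header = "true", inferSchema = "true", na.strings = "NA")
head(df)
```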
- Jul 11, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?
- Update the SparkR ML section to make it consistent with the SparkR API docs.
- Since #13972 adds labelling support for the `include_example` Jekyll plugin, we can split the single `ml.R` example file into multiple line blocks with different labels, and include them in different algorithms/models in the generated HTML page.

## How was this patch tested?
Only a docs update; manually checked the generated docs.

Author: Yanbo Liang <ybliang8@gmail.com>
Closes #14011 from yanboliang/r-user-guide-update.
- Jun 23, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Updated setJobGroup, cancelJobGroup, clearJobGroup to not require sc/SparkContext as a parameter. Also updated the roxygen2 doc and R programming guide on deprecations.

## How was this patch tested?
Unit tests.

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #13838 from felixcheung/rjobgroup.
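A sketch of the updated call shape (the group name, description, and flag are illustrative); before this change each of these calls took `sc` as its first argument:

```r
library(SparkR)
sparkR.session()

# No SparkContext argument needed after this change
setJobGroup("myJobGroup", "a demo job group", interruptOnCancel = TRUE)
# ... trigger some jobs here ...
cancelJobGroup("myJobGroup")
clearJobGroup()
```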
- Jun 22, 2016
Kai Jiang authored
## What changes were proposed in this pull request?
Guide for:
- UDFs with dapply, dapplyCollect
- spark.lapply for running parallel R functions

## How was this patch tested?
Built locally. (Screenshot omitted.)

Author: Kai Jiang <jiangkai@gmail.com>
Closes #13660 from vectorijk/spark-15672-R-guide-update.
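A condensed sketch of the two features the guide covers (the schema and functions are illustrative):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# dapply: run an R function on each partition; the output schema is declared
schema <- structType(structField("mpg", "double"),
                     structField("mpg2", "double"))
df2 <- dapply(df, function(x) data.frame(mpg = x$mpg, mpg2 = x$mpg * 2),
              schema)
head(df2)

# spark.lapply: run an R function over a local list in parallel on the cluster
res <- spark.lapply(1:4, function(i) i * i)
str(res)
```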
- Jun 21, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
Update the doc as per discussion in PR #13592.

## How was this patch tested?
Manual. shivaram liancheng

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #13799 from felixcheung/rsqlprogrammingguide.
- Jun 20, 2016
Felix Cheung authored
## What changes were proposed in this pull request?
roxygen2 doc, programming guide, and example updates.

## How was this patch tested?
Manual checks. shivaram

Author: Felix Cheung <felixcheung_m@hotmail.com>
Closes #13751 from felixcheung/rsparksessiondoc.
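The branch name points at the `sparkR.session` docs; a minimal sketch of the 2.0 entry point (the app name and config are illustrative):

```r
library(SparkR)

# Single entry point in Spark 2.0, replacing sparkR.init/sparkRSQL.init
sparkR.session(appName = "my-sparkr-app",
               sparkConfig = list(spark.driver.memory = "2g"))

df <- as.DataFrame(faithful)
head(df)

sparkR.session.stop()
```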
- Jun 17, 2016
GayathriMurali authored
## What changes were proposed in this pull request?
Make user guide changes to the SparkR documentation for all changes that happened in 2.0 to the Machine Learning APIs.

Author: GayathriMurali <gayathri.m@intel.com>
Closes #13285 from GayathriMurali/SPARK-15129.
- May 26, 2016
felixcheung authored
## What changes were proposed in this pull request?
Follow-up on the earlier PR: here we are fixing up the roxygen2 doc examples. Also adds to the programming guide migration section.

## How was this patch tested?
SparkR tests.

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #13340 from felixcheung/sqlcontextdoc.
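The branch name suggests the migration notes cover the SQLContext deprecation; a hedged before/after sketch:

```r
library(SparkR)

# Before Spark 2.0: methods took an explicit sqlContext
# sqlContext <- sparkRSQL.init(sc)
# df <- createDataFrame(sqlContext, faithful)

# Spark 2.0+: the session is implicit
sparkR.session()
df <- createDataFrame(faithful)
```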
- May 25, 2016
Krishna Kalyan authored
## What changes were proposed in this pull request?
Under the "Upgrading From SparkR 1.5.x to 1.6.x" section, added the information that Spark SQL converts `NA` in R to `null`.

## How was this patch tested?
Document update; no tests.

Author: Krishna Kalyan <krishnakalyan3@gmail.com>
Closes #13268 from krishnakalyan3/spark-12071-1.
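A small sketch of the documented conversion (toy data): an R `NA` becomes a SQL `null` once the data is in a SparkDataFrame, so it answers to `isNull` on the Spark side.

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(x = c(1, NA, 3)))

# The R NA arrives as a SQL null
head(filter(df, isNull(df$x)))
```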
- May 09, 2016
Yanbo Liang authored
## What changes were proposed in this pull request?
- Since Spark now has a native csv reader, it is not necessary to use the third-party `spark-csv` in `examples/src/main/r/data-manipulation.R`. Meanwhile, remove all `spark-csv` usage in SparkR.
- Running R applications through `sparkR` is not supported as of Spark 2.0, so we change to using `./bin/spark-submit` to run the example.

## How was this patch tested?
Offline test.

Author: Yanbo Liang <ybliang8@gmail.com>
Closes #13005 from yanboliang/r-df-examples.
- Apr 25, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?
This issue aims to fix some errors in R examples and make them up-to-date in the docs and example modules.

- Remove the wrong usage of `map`. We need to use `lapply` in `sparkR` if needed; however, `lapply` is private so far. The corrected example will be added later.
- Fix the wrong example in the `Generic Load/Save Functions` section of `docs/sql-programming-guide.md` for consistency.
- Fix datatypes in `sparkr.md`.
- Update a data result in `sparkr.md`.
- Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet.
- Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet.
- Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
- Other minor syntax fixes and a typo.

## How was this patch tested?
Manual.

Author: Dongjoon Hyun <dongjoon@apache.org>
Closes #12649 from dongjoon-hyun/SPARK-14883.
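For reference, a hedged sketch of the deprecated-to-current renames listed above (paths are placeholders):

```r
library(SparkR)
sparkR.session()

# jsonFile(sqlContext, path)    ->  read.json(path)
# parquetFile(sqlContext, path) ->  read.parquet(path)
# loadDF / saveDF               ->  read.df / write.df
# saveAsParquetFile(df, path)   ->  write.parquet(df, path)

people <- read.json("examples/src/main/resources/people.json")
write.parquet(people, "/tmp/people.parquet")
df <- read.parquet("/tmp/people.parquet")
```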
- Apr 23, 2016
felixcheung authored
## What changes were proposed in this pull request?
Fixed inadvertent roxygen2 doc changes, and added the class name change to the programming guide. Follow-up of #12621.

## How was this patch tested?
Manually checked.

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #12647 from felixcheung/rdataframe.
- Jan 19, 2016
felixcheung authored
shivaram, sorry it took longer to fix some conflicts; this is the change to add an alias for `table`.

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #10406 from felixcheung/readtable.
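Assuming the alias added here is `tableToDF` (the view name below and the 2.x `createOrReplaceTempView` spelling are illustrative), the call shape would be:

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)
createOrReplaceTempView(df, "faithful_tbl")

# Read a registered table/view back as a SparkDataFrame by name
df2 <- tableToDF("faithful_tbl")
head(df2)
```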
- Jan 04, 2016
felixcheung authored
Checked that the change is in Spark 1.6.0. shivaram

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #10574 from felixcheung/rwritemodedoc.
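The branch name suggests this documents the write mode; a hedged sketch of the mode argument (the path is a placeholder):

```r
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# mode is one of "error" (the default), "overwrite", "append", "ignore"
write.df(df, path = "/tmp/faithful_parquet", source = "parquet",
         mode = "overwrite")
```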
- Dec 16, 2015
Jeff Zhang authored
shivaram, please help review.

Author: Jeff Zhang <zjffdu@apache.org>
Closes #10290 from zjffdu/SPARK-12318.
- Dec 03, 2015
felixcheung authored
shivaram

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #10119 from felixcheung/rdocdplyrmasked.
- Nov 19, 2015
felixcheung authored
[SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR

Added tests for functions that are reported as masked, to make sure the base:: or stats:: function can be called. For those we can't call, added them to the SparkR programming guide. It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.:

```r
> methods("transform")
[1] transform,ANY-method       transform.data.frame
[3] transform,DataFrame-method transform.default
see '?methods' for accessing help and source code
> methods("subset")
[1] subset.data.frame       subset,DataFrame-method subset.default
[4] subset.matrix
see '?methods' for accessing help and source code
Warning message:
In .S3methods(generic.function, class, parent.frame()) :
  function 'subset' appears not to be S3 generic; found functions that look like S3 methods
```

Any idea?

More information on masking:
http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
http://www.sfu.ca/~sweldon/howTo/guide4.pdf

This is what the output doc looks like (minus css): (screenshot omitted)

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9785 from felixcheung/rmasked.
- Nov 18, 2015
Yanbo Liang authored
This PR includes:
- Update SparkR:::glm, SparkR:::summary API docs.
- Update the SparkR machine learning user guide and example code to show:
  - support for feature interaction in R formulas.
  - summary for gaussian GLM models.
  - coefficients for binomial GLM models.

mengxr

Author: Yanbo Liang <ybliang8@gmail.com>
Closes #9727 from yanboliang/spark-11684.
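To make the listed features concrete, a hedged sketch against the 1.6-era SparkR API (the dataset and interaction term are illustrative):

```r
library(SparkR)
sc <- sparkR.init()
sqlContext <- sparkRSQL.init(sc)

df <- createDataFrame(sqlContext, iris)

# Feature interaction expressed in the R formula; gaussian GLM
model <- glm(Sepal_Length ~ Sepal_Width + Sepal_Width:Petal_Length,
             data = df, family = "gaussian")

# summary() reports the coefficients of the fitted model
summary(model)
```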
- Nov 03, 2015
felixcheung authored
(Screenshot of the rendered guide omitted; this is without css, but you get the idea.) shivaram

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9401 from felixcheung/rstudioprogrammingguide.
- Oct 30, 2015
felixcheung authored
[SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio

Mapping spark.driver.memory from sparkEnvir to spark-submit command line arguments.

shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf? sun-rui

Author: felixcheung <felixcheung_m@hotmail.com>
Closes #9290 from felixcheung/rdrivermem.
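A sketch of the call this enables from R or RStudio (the memory value is illustrative); `spark.driver.memory` must reach spark-submit before the JVM launches, which is why it is mapped from `sparkEnvir` rather than set on SparkConf afterwards:

```r
library(SparkR)

# Driver properties must be set at launch time; setting them later is too late
sc <- sparkR.init(master = "local[*]",
                  sparkEnvir = list(spark.driver.memory = "2g"))
```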
- Aug 11, 2015
Eric Liang authored
This documents the use of R model formulae in the SparkR guide. Also fixes some bugs in the R API doc. mengxr

Author: Eric Liang <ekl@databricks.com>
Closes #8085 from ericl/docs.