  1. May 04, 2017
    • Felix Cheung's avatar
      [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming... · b8302ccd
      Felix Cheung authored
      [SPARK-20015][SPARKR][SS][DOC][EXAMPLE] Document R Structured Streaming (experimental) in R vignettes and R & SS programming guide, R example
      
      ## What changes were proposed in this pull request?
      
      Add
      - R vignettes
      - R programming guide
      - SS programming guide
      - R example
      
Also disable spark.als in the vignettes for now, since it is failing (SPARK-20402).
      
      ## How was this patch tested?
      
      manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #17814 from felixcheung/rdocss.
      b8302ccd
  2. May 01, 2017
  3. Apr 29, 2017
  4. Apr 28, 2017
    • wangmiao1981's avatar
      [SPARKR][DOC] Document LinearSVC in R programming guide · 7fe82497
      wangmiao1981 authored
      ## What changes were proposed in this pull request?
      
      add link to svmLinear in the SparkR programming document.
      
      ## How was this patch tested?
      
      Build doc manually and click the link to the document. It looks good.
      
      Author: wangmiao1981 <wm624@hotmail.com>
      
      Closes #17797 from wangmiao1981/doc.
      7fe82497
  5. Apr 27, 2017
  6. Apr 26, 2017
    • zero323's avatar
      [SPARK-20437][R] R wrappers for rollup and cube · df58a95a
      zero323 authored
      ## What changes were proposed in this pull request?
      
      - Add `rollup` and `cube` methods and corresponding generics.
      - Add short description to the vignette.
      
      ## How was this patch tested?
      
      - Existing unit tests.
      - Additional unit tests covering new features.
      - `check-cran.sh`.
      
      Author: zero323 <zero323@users.noreply.github.com>
      
      Closes #17728 from zero323/SPARK-20437.
      df58a95a
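The generic-plus-method pattern this commit adds can be sketched in plain R. The toy `rollup` below is a hypothetical local stand-in that operates on a `data.frame` with a fixed `value` column, not SparkR's actual implementation (which dispatches on `SparkDataFrame` and delegates to the JVM); it aggregates over each prefix of the grouping columns plus a grand total, as SQL `ROLLUP` does:

```r
library(methods)

# Hypothetical stand-in for the SparkR pattern: define a generic,
# then a method dispatching on the receiver's class.
setGeneric("rollup", function(x, ...) standardGeneric("rollup"))

setMethod("rollup", "data.frame", function(x, ...) {
  cols <- c(...)
  res <- list(`(total)` = sum(x$value))      # grand-total level
  for (i in seq_along(cols)) {               # one level per column prefix
    key <- paste(cols[1:i], collapse = ",")
    res[[key]] <- aggregate(x["value"], x[cols[1:i]], sum)
  }
  res
})

df <- data.frame(a = c("x", "x", "y"), b = c(1, 2, 2), value = c(10, 20, 30))
out <- rollup(df, "a", "b")   # subtotals by "a", by ("a","b"), plus total
```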
  7. Mar 27, 2017
  8. Dec 17, 2016
  9. Dec 08, 2016
    • Yanbo Liang's avatar
      [SPARK-18325][SPARKR][ML] SparkR ML wrappers example code and user guide · 9bf8f3cd
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Add all R examples for ML wrappers which were added during 2.1 release cycle.
* Split the whole ```ml.R``` example file into an individual example for each algorithm, which will make it convenient for users to rerun them.
      * Add corresponding examples to ML user guide.
      * Update ML section of SparkR user guide.
      
Note: the MLlib Scala/Java/Python examples will be consistent with each other; however, the SparkR examples may differ from them, since R users may use the algorithms in a different way, for example, using an R ```formula``` to specify ```featuresCol``` and ```labelCol```.
      
      ## How was this patch tested?
      Run all examples manually.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #16148 from yanboliang/spark-18325.
      9bf8f3cd
  10. Dec 04, 2016
    • Felix Cheung's avatar
      [SPARK-18643][SPARKR] SparkR hangs at session start when installed as a package without Spark · b019b3a8
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
If SparkR is running as a package and has previously downloaded the Spark JAR, it should be able to run as before without SPARK_HOME being set. With this bug, the auto-installed Spark works only in the first session.
      
      This seems to be a regression on the earlier behavior.
      
The fix is to always check for (or install) the cached Spark when running in an interactive session.
As discussed before, we should probably install Spark only when running in an interactive session (R shell, RStudio, etc.).
      
      ## How was this patch tested?
      
      Manually
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #16077 from felixcheung/rsessioninteractive.
      b019b3a8
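The gating logic described above can be sketched with base R's `interactive()`; the function name and cache path below are hypothetical illustrations, not SparkR's actual internals:

```r
# Hypothetical sketch: prefer an explicit SPARK_HOME; otherwise fall back
# to the cached download, but only in an interactive session.
spark_home_or_cached <- function(spark_home = Sys.getenv("SPARK_HOME"),
                                 cache_dir = file.path(path.expand("~"),
                                                       ".cache", "spark")) {
  if (nzchar(spark_home)) return(spark_home)  # explicit setting always wins
  if (interactive()) return(cache_dir)        # check/install the cached Spark
  stop("SPARK_HOME is not set and the session is not interactive")
}
```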
  11. Nov 23, 2016
  12. Oct 27, 2016
  13. Oct 21, 2016
    • Felix Cheung's avatar
      [SPARK-18013][SPARKR] add crossJoin API · e21e1c94
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
Add `crossJoin`, and no longer default to a cross join when `joinExpr` is omitted.
      
      ## How was this patch tested?
      
      unit test
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #15559 from felixcheung/rcrossjoin.
      e21e1c94
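For intuition, a Cartesian (cross) join is what base R's `merge` produces with `by = NULL`; SparkR's new `crossJoin` makes the same operation an explicit request rather than the silent default of `join`:

```r
left  <- data.frame(id = 1:2)
right <- data.frame(tag = c("a", "b", "c"))

# merge with by = NULL yields the Cartesian product: every row of `left`
# paired with every row of `right` (2 x 3 = 6 rows).
cart <- merge(left, right, by = NULL)
```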
  14. Sep 23, 2016
    • Jeff Zhang's avatar
      [SPARK-17210][SPARKR] sparkr.zip is not distributed to executors when running sparkr in RStudio · f62ddc59
      Jeff Zhang authored
      ## What changes were proposed in this pull request?
      
Spark adds sparkr.zip to the archives only in YARN mode (SparkSubmit.scala):
```scala
          if (args.isR && clusterManager == YARN) {
            val sparkRPackagePath = RUtils.localSparkRPackagePath
            if (sparkRPackagePath.isEmpty) {
              printErrorAndExit("SPARK_HOME does not exist for R application in YARN mode.")
            }
            val sparkRPackageFile = new File(sparkRPackagePath.get, SPARKR_PACKAGE_ARCHIVE)
            if (!sparkRPackageFile.exists()) {
              printErrorAndExit(s"$SPARKR_PACKAGE_ARCHIVE does not exist for R application in YARN mode.")
            }
            val sparkRPackageURI = Utils.resolveURI(sparkRPackageFile.getAbsolutePath).toString
      
            // Distribute the SparkR package.
      // Assigns a symbolic link name "sparkr" to the shipped package.
            args.archives = mergeFileLists(args.archives, sparkRPackageURI + "#sparkr")
      
            // Distribute the R package archive containing all the built R packages.
            if (!RUtils.rPackages.isEmpty) {
              val rPackageFile =
                RPackageUtils.zipRLibraries(new File(RUtils.rPackages.get), R_PACKAGE_ARCHIVE)
              if (!rPackageFile.exists()) {
                printErrorAndExit("Failed to zip all the built R packages.")
              }
      
              val rPackageURI = Utils.resolveURI(rPackageFile.getAbsolutePath).toString
        // Assigns a symbolic link name "rpkg" to the shipped package.
              args.archives = mergeFileLists(args.archives, rPackageURI + "#rpkg")
            }
          }
      ```
So it is necessary to pass spark.master from the R process to the JVM; otherwise sparkr.zip won't be distributed to the executors. Besides that, I also pass spark.yarn.keytab/spark.yarn.principal to the Spark side, because the JVM process needs them to access a secured cluster.
      
      ## How was this patch tested?
      
      Verify it manually in R Studio using the following code.
```r
Sys.setenv(SPARK_HOME="/Users/jzhang/github/spark")
.libPaths(c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib"), .libPaths()))
      library(SparkR)
      sparkR.session(master="yarn-client", sparkConfig = list(spark.executor.instances="1"))
      df <- as.DataFrame(mtcars)
      head(df)
      
      ```
      
      …
      
      Author: Jeff Zhang <zjffdu@apache.org>
      
      Closes #14784 from zjffdu/SPARK-17210.
      f62ddc59
  15. Sep 14, 2016
  16. Jul 25, 2016
  17. Jul 18, 2016
  18. Jul 16, 2016
    • Narine Kokhlikyan's avatar
      [SPARK-16112][SPARKR] Programming guide for gapply/gapplyCollect · 41673048
      Narine Kokhlikyan authored
      ## What changes were proposed in this pull request?
      
      Updates programming guide for spark.gapply/spark.gapplyCollect.
      
Similar to the other examples, I used the `faithful` dataset to demonstrate gapply's functionality.
Please let me know if you prefer another example.
      
      ## How was this patch tested?
      Existing test cases in R
      
      Author: Narine Kokhlikyan <narine@slice.com>
      
      Closes #14090 from NarineK/gapplyProgGuide.
      41673048
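Locally, gapply's group-then-apply shape corresponds to base R's split/lapply on the same `faithful` dataset. This is a rough analogue only; the real `gapply` runs the supplied function on Spark executors over groups of a SparkDataFrame:

```r
# Rough local analogue of gapply: group the rows, then apply an R
# function to each group. Here: mean waiting time for short vs. long
# eruptions of the built-in `faithful` dataset.
groups <- split(faithful, faithful$eruptions > 3)
avg_wait <- lapply(groups, function(g) mean(g$waiting))
```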
  19. Jul 13, 2016
  20. Jul 11, 2016
    • Yanbo Liang's avatar
      [SPARKR][DOC] SparkR ML user guides update for 2.0 · 2ad031be
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      * Update SparkR ML section to make them consistent with SparkR API docs.
* #13972 added labelling support to the ```include_example``` Jekyll plugin, so we can split the single ```ml.R``` example file into multiple line blocks with different labels and include them under different algorithms/models in the generated HTML page.
      
      ## How was this patch tested?
      Only docs update, manually check the generated docs.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14011 from yanboliang/r-user-guide-update.
      2ad031be
  21. Jun 23, 2016
    • Felix Cheung's avatar
      [SPARK-16088][SPARKR] update setJobGroup, cancelJobGroup, clearJobGroup · b5a99766
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
Updated setJobGroup, cancelJobGroup, and clearJobGroup to no longer require sc/SparkContext as a parameter.
Also updated the roxygen2 docs and the R programming guide's deprecation notes.
      
      ## How was this patch tested?
      
      unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #13838 from felixcheung/rjobgroup.
      b5a99766
  22. Jun 22, 2016
  23. Jun 21, 2016
  24. Jun 20, 2016
  25. Jun 17, 2016
    • GayathriMurali's avatar
      [SPARK-15129][R][DOC] R API changes in ML · af2a4b08
      GayathriMurali authored
      ## What changes were proposed in this pull request?
      
Make user guide changes to the SparkR documentation for all changes to the Machine Learning APIs that happened in 2.0.
      
      Author: GayathriMurali <gayathri.m@intel.com>
      
      Closes #13285 from GayathriMurali/SPARK-15129.
      af2a4b08
  26. May 26, 2016
    • felixcheung's avatar
      [SPARK-10903] followup - update API doc for SqlContext · c8288323
      felixcheung authored
      ## What changes were proposed in this pull request?
      
Follow-up to the earlier PR: here we fix up the roxygen2 doc examples.
Also add a note to the programming guide's migration section.
      
      ## How was this patch tested?
      
      SparkR tests
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #13340 from felixcheung/sqlcontextdoc.
      c8288323
  27. May 25, 2016
    • Krishna Kalyan's avatar
      [SPARK-12071][DOC] Document the behaviour of NA in R · 9082b796
      Krishna Kalyan authored
      ## What changes were proposed in this pull request?
      
Under the "Upgrading From SparkR 1.5.x to 1.6.x" section, added a note that Spark SQL converts `NA` in R to `null`.
      
      ## How was this patch tested?
      
      Document update, no tests.
      
      Author: Krishna Kalyan <krishnakalyan3@gmail.com>
      
      Closes #13268 from krishnakalyan3/spark-12071-1.
      9082b796
  28. May 09, 2016
    • Yanbo Liang's avatar
      [MINOR] [SPARKR] Update data-manipulation.R to use native csv reader · ee3b1715
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
* Since Spark now has a native csv reader, it is no longer necessary to use the third-party ```spark-csv``` package in ```examples/src/main/r/data-manipulation.R```. Meanwhile, remove all ```spark-csv``` usage in SparkR.
* Running R applications through ```sparkR``` is not supported as of Spark 2.0, so we change to using ```./bin/spark-submit``` to run the example.
      
      ## How was this patch tested?
      Offline test.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #13005 from yanboliang/r-df-examples.
      ee3b1715
  29. Apr 25, 2016
    • Dongjoon Hyun's avatar
      [SPARK-14883][DOCS] Fix wrong R examples and make them up-to-date · 6ab4d9e0
      Dongjoon Hyun authored
      ## What changes were proposed in this pull request?
      
      This issue aims to fix some errors in R examples and make them up-to-date in docs and example modules.
      
- Remove the wrong usage of `map`. We should use `lapply` in SparkR instead if needed; however, `lapply` is private so far, so the corrected example will be added later.
      - Fix the wrong example in Section `Generic Load/Save Functions` of `docs/sql-programming-guide.md` for consistency
      - Fix datatypes in `sparkr.md`.
      - Update a data result in `sparkr.md`.
      - Replace deprecated functions to remove warnings: jsonFile -> read.json, parquetFile -> read.parquet
      - Use up-to-date R-like functions: loadDF -> read.df, saveDF -> write.df, saveAsParquetFile -> write.parquet
      - Replace `SparkR DataFrame` with `SparkDataFrame` in `dataframe.R` and `data-manipulation.R`.
      - Other minor syntax fixes and a typo.
      
      ## How was this patch tested?
      
      Manual.
      
      Author: Dongjoon Hyun <dongjoon@apache.org>
      
      Closes #12649 from dongjoon-hyun/SPARK-14883.
      6ab4d9e0
  30. Apr 23, 2016
    • felixcheung's avatar
      [SPARK-12148][SPARKR] fix doc after renaming DataFrame to SparkDataFrame · 1b7eab74
      felixcheung authored
      ## What changes were proposed in this pull request?
      
      Fixed inadvertent roxygen2 doc changes, added class name change to programming guide
      Follow up of #12621
      
      ## How was this patch tested?
      
      manually checked
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #12647 from felixcheung/rdataframe.
      1b7eab74
  31. Jan 19, 2016
  32. Jan 04, 2016
  33. Dec 16, 2015
  34. Dec 03, 2015
  35. Nov 19, 2015
    • felixcheung's avatar
      [SPARK-11339][SPARKR] Document the list of functions in R base package that... · 1a93323c
      felixcheung authored
      [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR
      
Added tests for functions that are reported as masked, to make sure the base:: or stats:: versions can be called.

For those we can't call, added them to the SparkR programming guide.
      
It would seem to me that `table`, `sample`, `subset`, `filter`, and `cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.:
      ```
      > methods("transform")
      [1] transform,ANY-method       transform.data.frame
      [3] transform,DataFrame-method transform.default
      see '?methods' for accessing help and source code
      > methods("subset")
      [1] subset.data.frame       subset,DataFrame-method subset.default
      [4] subset.matrix
      see '?methods' for accessing help and source code
      Warning message:
      In .S3methods(generic.function, class, parent.frame()) :
        function 'subset' appears not to be S3 generic; found functions that look like S3 methods
      ```
      Any idea?
      
      More information on masking:
      http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
      http://www.sfu.ca/~sweldon/howTo/guide4.pdf
      
      This is what the output doc looks like (minus css):
      ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9785 from felixcheung/rmasked.
      1a93323c
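When a name is masked, the original remains reachable through its namespace with `::`, which is the pattern the added tests exercise. A minimal base-R illustration (the local `filter` below is a deliberate hypothetical mask, not SparkR's):

```r
# Simulate masking: a local binding shadows stats::filter ...
filter <- function(...) stop("masked by a local definition")

# ... but the namespace-qualified call still reaches the original:
# a one-sided moving sum, so smoothed[t] = x[t] + x[t-1] (NA at t = 1).
smoothed <- stats::filter(1:5, rep(1, 2), sides = 1)
```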
  36. Nov 18, 2015
    • Yanbo Liang's avatar
      [SPARK-11684][R][ML][DOC] Update SparkR glm API doc, user guide and example codes · e222d758
      Yanbo Liang authored
      This PR includes:
      * Update SparkR:::glm, SparkR:::summary API docs.
      * Update SparkR machine learning user guide and example codes to show:
        * supporting feature interaction in R formula.
        * summary for gaussian GLM model.
        * coefficients for binomial GLM model.
      
      mengxr
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #9727 from yanboliang/spark-11684.
      e222d758
  37. Nov 03, 2015
  38. Oct 30, 2015
    • felixcheung's avatar
      [SPARK-11340][SPARKR] Support setting driver properties when starting Spark... · bb5a2af0
      felixcheung authored
      [SPARK-11340][SPARKR] Support setting driver properties when starting Spark from R programmatically or from RStudio
      
Mapping spark.driver.memory from sparkEnvir to spark-submit command-line arguments.
      
      shivaram suggested that we possibly add other spark.driver.* properties - do we want to add all of those? I thought those could be set in SparkConf?
      sun-rui
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9290 from felixcheung/rdrivermem.
      bb5a2af0
  39. Aug 11, 2015
  40. Jul 08, 2015