  1. Sep 19, 2016
  2. Sep 14, 2016
  3. Sep 13, 2016
    • [SPARK-17317][SPARKR] Add SparkR vignette · a454a4d8
      junyangq authored
      ## What changes were proposed in this pull request?
      
      This PR adds a SparkR vignette that serves as a friendly guide to the functionality SparkR provides.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: junyangq <qianjunyang@gmail.com>
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14980 from junyangq/SPARKR-vignette.
  4. Sep 10, 2016
  5. Sep 09, 2016
  6. Sep 08, 2016
  7. Sep 07, 2016
  8. Sep 03, 2016
    • [SPARK-17315][SPARKR] Kolmogorov-Smirnov test SparkR wrapper · abb2f921
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR adds a Kolmogorov-Smirnov test wrapper to SparkR. The wrapper currently supports only the one-sample test against a normal distribution.
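      
      For reference, this is the same kind of one-sample test that base R's `stats::ks.test` performs. The sketch below runs it in plain R, without Spark, against a standard normal; the simulated sample and the N(0, 1) parameters are illustrative assumptions, and the snippet does not show the SparkR wrapper's own API.
      
      ```r
      # Plain R (no Spark): one-sample Kolmogorov-Smirnov test against N(0, 1).
      # The sample is simulated illustration data, not data from the PR.
      set.seed(42)
      x <- rnorm(200, mean = 0, sd = 1)
      res <- ks.test(x, "pnorm", mean = 0, sd = 1)  # extra args are passed to pnorm
      res$statistic  # D: largest gap between the empirical CDF and the normal CDF
      res$p.value    # a large p-value means the test finds no evidence against normality
      ```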
      
      ## How was this patch tested?
      
      R unit test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14881 from junyangq/SPARK-17315.
  9. Sep 02, 2016
    • [SPARKR][MINOR] Fix docs for sparkR.session and count · d2fde6b7
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR adds more explanation to `sparkR.session`. It also modifies the doc for `count` so that, when the methods are grouped in one doc page, the description does not confuse users.
      
      ## How was this patch tested?
      
      Manual test.
      
      ![screen shot 2016-09-02 at 1 21 36 pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14942 from junyangq/fixSparkRSessionDoc.
    • [SPARK-17298][SQL] Require explicit CROSS join for cartesian products · e6132a6c
      Srinath Shankar authored
      ## What changes were proposed in this pull request?
      
      Require the use of CROSS join syntax in SQL (and a new crossJoin
      DataFrame API) to specify explicit cartesian products between relations.
      By cartesian product we mean a join between relations R and S where
      there is no join condition involving columns from both R and S.
      
      If a cartesian product is detected in the absence of an explicit CROSS
      join, an error must be thrown. Turning on the
      "spark.sql.crossJoin.enabled" configuration flag will disable this check
      and allow cartesian products without an explicit CROSS join.
      
      The new crossJoin DataFrame API must be used to specify explicit cross
      joins. The existing join(DataFrame) method will produce an INNER join
      that will require a subsequent join condition.
      That is, df1.join(df2) is equivalent to SELECT * FROM df1, df2.
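      
      As a plain-R illustration of the definition above (not SparkR code), base R's `merge()` with `by = NULL` pairs every row of one data frame with every row of the other, i.e. a cartesian product with no join condition. The two small data frames are made up for the example.
      
      ```r
      # Plain R illustration of a cartesian product (no join condition at all).
      r <- data.frame(a = 1:3)
      s <- data.frame(b = c("x", "y"))
      cp <- merge(r, s, by = NULL)  # by = NULL: every row of r paired with every row of s
      nrow(cp)                      # 3 rows x 2 rows = 6 rows
      ```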
      
      ## How was this patch tested?
      
      Added cross-join.sql to the SQLQueryTestSuite to test the check for cartesian products. Added a couple of tests to the DataFrameJoinSuite to test the crossJoin API. Modified various other test suites to explicitly specify a cross join where an INNER join or a comma-separated list was previously used.
      
      Author: Srinath Shankar <srinath@databricks.com>
      
      Closes #14866 from srinathshankar/crossjoin.
    • [SPARK-17376][SPARKR] followup - change since version · eac1d0e9
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Change the "since" version in the doc.
      
      ## How was this patch tested?
      
      manual
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14939 from felixcheung/rsparkversion2.
    • [SPARKR][DOC] regexp_extract should doc that it returns empty string when match fails · 419eefd8
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Doc change - see https://issues.apache.org/jira/browse/SPARK-16324
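      
      A plain-R sketch of the documented behaviour: when the pattern does not match, the result is an empty string. `regexp_extract_like` is a hypothetical helper, not the SparkR function, and it uses R's regex engine rather than Java's.
      
      ```r
      # Hypothetical helper mimicking the documented semantics: return "" when the
      # pattern does not match. idx = 0 selects the whole match, idx = k group k.
      regexp_extract_like <- function(x, pattern, idx) {
        m <- regmatches(x, regexec(pattern, x))  # character(0) for non-matches
        vapply(m, function(g) if (length(g) > idx) g[idx + 1] else "", character(1))
      }
      regexp_extract_like(c("100-200", "foo"), "([0-9]+)-([0-9]+)", 1)
      # returns c("100", "")
      ```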
      
      ## How was this patch tested?
      
      manual check
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14934 from felixcheung/regexpextractdoc.
    • [SPARK-17376][SPARKR] Spark version should be available in R · 812333e4
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Add sparkR.version() API.
      
      ```
      > sparkR.version()
      [1] "2.1.0-SNAPSHOT"
      ```
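      
      One way a script might use this (a sketch, not part of the PR): strip the `-SNAPSHOT` suffix and compare with base R's `numeric_version()`. The hard-coded string stands in for the `sparkR.version()` result shown above so the snippet runs without a Spark session.
      
      ```r
      v <- "2.1.0-SNAPSHOT"                       # stand-in for sparkR.version()
      ver <- numeric_version(sub("-.*$", "", v))  # drop the "-SNAPSHOT" suffix -> 2.1.0
      ver >= "2.1.0"                              # character operand is coerced to a version
      # [1] TRUE
      ```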
      
      ## How was this patch tested?
      
      manual, unit tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14935 from felixcheung/rsparksessionversion.
    • [SPARK-16883][SPARKR] SQL decimal type is not properly cast to number when collecting SparkDataFrame · 0f30cded
      wm624@hotmail.com authored
      ## What changes were proposed in this pull request?
      
      ```
      > registerTempTable(createDataFrame(iris), "iris")
      > str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y  from iris limit 5")))
      'data.frame':	5 obs. of  2 variables:
       $ x: num  1 1 1 1 1
       $ y:List of 5
        ..$ : num 2
        ..$ : num 2
        ..$ : num 2
        ..$ : num 2
        ..$ : num 2
      ```
      
      The problem is that Spark returns the column type as `decimal(10, 0)` rather than `decimal`, so `decimal(10, 0)` is not handled correctly. It should be handled as `double`.
      
      As discussed in the JIRA thread, there are two potential fixes:
      1) Scala-side fix: add a new case when writing the object back. However, I can't use `spark.sql.types._` in Spark core due to dependency issues, and I haven't found a way to do the type case match.
      
      2) SparkR-side fix: add a helper function that checks for special types like `"decimal(10, 0)"` and replaces them with `double`, which is a PRIMITIVE type. This helper is generic, so handling for new types can be added in the future.
      
      I opened this PR to discuss the pros and cons of both approaches. If we want the Scala-side fix, we need to find a way to match on DecimalType and StructType in Spark Core.
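      
      A minimal sketch of option 2 in plain R. `normalize_sql_type` is a hypothetical name, not the helper that was merged, and the pattern (plain `decimal`, or `decimal` followed by precision and scale) is an assumption about which type strings need mapping.
      
      ```r
      # Hypothetical helper (option 2): treat "decimal" or "decimal(p, s)" as the
      # primitive type "double"; every other type string passes through unchanged.
      normalize_sql_type <- function(type) {
        is_decimal <- grepl("^decimal(\\([0-9]+, *[0-9]+\\))?$", type)
        ifelse(is_decimal, "double", type)
      }
      normalize_sql_type(c("decimal(10, 0)", "decimal", "string"))
      # returns c("double", "double", "string")
      ```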
      
      ## How was this patch tested?
      
      Manual test:
      ```
      > str(collect(sql("select cast('1' as double) as x, cast('2' as decimal) as y  from iris limit 5")))
      'data.frame':	5 obs. of  2 variables:
       $ x: num  1 1 1 1 1
       $ y: num  2 2 2 2 2
      ```
      R unit tests.
      
      Author: wm624@hotmail.com <wm624@hotmail.com>
      
      Closes #14613 from wangmiao1981/type.
  10. Aug 31, 2016
  11. Aug 29, 2016
    • [SPARK-16581][SPARKR] Make JVM backend calling functions public · 736a7911
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change exposes a public API in SparkR to create objects and call methods on the Spark driver JVM.
      
      ## How was this patch tested?
      
      Unit tests, CRAN checks
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14775 from shivaram/sparkr-java-api.
    • [SPARKR][MINOR] Fix LDA doc · 6a0fda2c
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the name of the `SparkDataFrame` used in the example. It also gives a reference URL for an example data file that users can play with.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14853 from junyangq/SPARKR-FixLDADoc.
  12. Aug 26, 2016
    • [SPARKR][MINOR] Fix example of spark.naiveBayes · 18832162
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      The original example doesn't work because the features are not categorical. This PR fixes this by switching to another dataset.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14820 from junyangq/SPARK-FixNaiveBayes.
  13. Aug 24, 2016
    • [SPARKR][MINOR] Add installation message for remote master mode and improve other messages · 3a60be4b
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR gives users an informative message when they try to connect to a remote master but don't have the Spark package on their local machine.
      
      As a clarification: for now, automatic installation happens only if users start SparkR in the R console (rather than from sparkr-shell) and connect to a local master. In remote master mode, a local Spark package is still needed, but we do not trigger the `install.spark` function, because the versions have to match those on the cluster, which involves more user input. Instead, we try to provide a detailed message that may help users.
      
      Some of the other messages have also been slightly changed.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14761 from junyangq/SPARK-16579-V1.
    • [SPARKR][MINOR] Add more examples to window function docs · 18708f76
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR adds more examples to window function docs to make them more accessible to the users.
      
      It also fixes default value issues for `lag` and `lead`.
      
      ## How was this patch tested?
      
      Manual test, R unit test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14779 from junyangq/SPARKR-FixWindowFunctionDocs.
    • [MINOR][SPARKR] fix R MLlib parameter documentation · 945c04bc
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Fixed several misplaced param tags; they should be on the spark.* method generics.
      
      ## How was this patch tested?
      
      run knitr
      junyangq
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14792 from felixcheung/rdocmllib.
    • [SPARK-16445][MLLIB][SPARKR] Multilayer Perceptron Classifier wrapper in SparkR · 2fbdb606
      Xin Ren authored
      https://issues.apache.org/jira/browse/SPARK-16445
      
      ## What changes were proposed in this pull request?
      
      Create Multilayer Perceptron Classifier wrapper in SparkR
      
      ## How was this patch tested?
      
      Tested manually on local machine
      
      Author: Xin Ren <iamshrek@126.com>
      
      Closes #14447 from keypointt/SPARK-16445.
    • [SPARKR][MINOR] Fix doc for show method · d2932a0e
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      The original doc of `show` put methods for multiple classes together, but the text only talks about `SparkDataFrame`. This PR fixes that.
      
      ## How was this patch tested?
      
      Manual test.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14776 from junyangq/SPARK-FixShowDoc.
  14. Aug 23, 2016
    • [SPARKR][MINOR] Remove reference link for common Windows environment variables · 8fd63e80
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR removes the reference link in the doc for the environment variables of common Windows folders. The CRAN check returned code 503 (service unavailable) for the original link.
      
      ## How was this patch tested?
      
      Manual check.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14767 from junyangq/SPARKR-RemoveLink.
  15. Aug 22, 2016
    • [SPARKR][MINOR] Update R DESCRIPTION file · d2b3d3e6
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      Update DESCRIPTION
      
      ## How was this patch tested?
      
      Run install and CRAN tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14764 from felixcheung/rpackagedescription.
    • [SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh · 920806ab
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      ## How was this patch tested?
      
      This change adds CRAN documentation checks to be run as part of `R/run-tests.sh`. As this script is also used by Jenkins, we will get documentation checks on every PR going forward.
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14759 from shivaram/sparkr-cran-jenkins.
    • [SPARK-16508][SPARKR] doc updates and more CRAN check fixes · 71afeeea
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      replace ``` ` ``` in code doc with `\code{thing}`
      remove added `...` for drop(DataFrame)
      fix remaining CRAN check warnings
      
      ## How was this patch tested?
      
      create doc with knitr
      
      junyangq
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14734 from felixcheung/rdoccleanup.
    • [SPARKR][MINOR] Add Xiangrui and Felix to maintainers · 6f3cd36f
      Shivaram Venkataraman authored
      ## What changes were proposed in this pull request?
      
      This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.
      
      ## How was this patch tested?
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #14758 from shivaram/sparkr-maintainers.
    • [SPARK-17173][SPARKR] R MLlib refactor, cleanup, reformat, fix deprecation in test · 0583ecda
      Felix Cheung authored
      ## What changes were proposed in this pull request?
      
      refactor, cleanup, reformat, fix deprecation in test
      
      ## How was this patch tested?
      
      unit tests, manual tests
      
      Author: Felix Cheung <felixcheung_m@hotmail.com>
      
      Closes #14735 from felixcheung/rmllibutil.
    • [SPARKR][MINOR] Fix Cache Folder Path in Windows · 209e1b3c
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR fixes the path of the local cache folder on Windows. The name of the environment variable should be `LOCALAPPDATA`, not `%LOCALAPPDATA%`.
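      
      The distinction matters because `%LOCALAPPDATA%` is cmd.exe expansion syntax, while R's `Sys.getenv()` expects the bare variable name and returns `""` for a name that is not set. A sketch; `cache_dir` and the `spark` subfolder are hypothetical, not the cache path SparkR actually uses.
      
      ```r
      # Sys.getenv() wants the bare name; "%LOCALAPPDATA%" is not itself a variable
      # name, so looking it up returns the default "".
      Sys.getenv("%LOCALAPPDATA%")
      
      # Hypothetical helper: build a cache path under LOCALAPPDATA.
      cache_dir <- function(sub = "spark") {
        root <- Sys.getenv("LOCALAPPDATA")
        if (!nzchar(root)) stop("LOCALAPPDATA is not set (not on Windows?)")
        file.path(root, sub)
      }
      ```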
      
      ## How was this patch tested?
      
      Manual test on Windows 7.
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14743 from junyangq/SPARKR-FixWindowsInstall.
  16. Aug 21, 2016
    • [MINOR][R] add SparkR.Rcheck/ and SparkR_*.tar.gz to R/.gitignore · ab714346
      Xiangrui Meng authored
      ## What changes were proposed in this pull request?
      
      Ignore temp files generated by `check-cran.sh`.
      
      Author: Xiangrui Meng <meng@databricks.com>
      
      Closes #14740 from mengxr/R-gitignore.
    • [SPARK-16961][FOLLOW-UP][SPARKR] More robust test case for spark.gaussianMixture. · 7f08a60b
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      #14551 fixed an off-by-one bug in `randomizeInPlace` and some test failures caused by that fix.
      However, for the SparkR `spark.gaussianMixture` test case, the fix was inappropriate. It changed only the native R output that SparkR is compared against; it did not change the R code in the comments that is used to reproduce that result in native R. This will confuse users, who cannot reproduce the same result in native R. This PR provides a more robust test case that produces the same result in SparkR and native R.
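      
      As its name suggests, `randomizeInPlace` is an in-place shuffle. The sketch below is a textbook Fisher-Yates shuffle in plain R, not Spark's Scala code, and it does not claim to reproduce the bug fixed in #14551; it only shows where an off-by-one typically bites: the swap index at step `i` must be drawn from the full range 1..i. Drawing it from 1..(i-1) instead yields only cyclic permutations (Sattolo's algorithm), so no element could ever stay in place. The name `fisher_yates` is hypothetical.
      
      ```r
      # Textbook Fisher-Yates shuffle in plain R (a sketch, not Spark's code).
      fisher_yates <- function(x) {
        n <- length(x)
        if (n < 2) return(x)
        for (i in n:2) {
          j <- sample.int(i, 1)  # uniform over 1..i, including i itself
          tmp <- x[i]; x[i] <- x[j]; x[j] <- tmp
        }
        x
      }
      set.seed(1)
      fisher_yates(1:5)
      ```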
      
      ## How was this patch tested?
      Unit test update.
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #14730 from yanboliang/spark-16961-followup.
  17. Aug 20, 2016
    • [SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings. · 01401e96
      Junyang Qian authored
      ## What changes were proposed in this pull request?
      
      This PR fixes all remaining "undocumented/duplicated arguments" warnings reported by the CRAN check.
      
      The one remaining warning concerns the doc for R's `stats::glm`, which is exported in SparkR. To silence that warning, we would also have to document all arguments of that non-SparkR function.
      
      Some previous conversation is in #14558.
      
      ## How was this patch tested?
      
      R unit test and `check-cran.sh` script (with no-test).
      
      Author: Junyang Qian <junyangq@databricks.com>
      
      Closes #14705 from junyangq/SPARK-16508-master.
  18. Aug 19, 2016