  1. Jan 26, 2016
    • Yanbo Liang's avatar
      [SPARK-12903][SPARKR] Add covar_samp and covar_pop for SparkR · e7f9199e
      Yanbo Liang authored
      Add ```covar_samp``` and ```covar_pop``` for SparkR.
      Should we also provide a ```cov``` alias for ```covar_samp```? There is a ```cov``` implementation in stats.R which already masks ```stats::cov```, but adding the alias may introduce a breaking API change.
      
      cc sun-rui felixcheung shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10829 from yanboliang/spark-12903.
      e7f9199e
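      As a point of reference, the difference between the two functions comes down to the denominator. A base-R sketch (plain vectors, not SparkR Columns) of the relationship:
      ```r
      # Base-R illustration of sample vs. population covariance;
      # stats::cov uses the sample (n - 1) denominator.
      x <- c(1, 2, 3, 4)
      y <- c(2, 4, 6, 8)
      n <- length(x)
      covar_samp_r <- cov(x, y)                # denominator n - 1
      covar_pop_r  <- cov(x, y) * (n - 1) / n  # rescale to denominator n
      ```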
  2. Jan 22, 2016
  3. Jan 20, 2016
    • Sun Rui's avatar
      [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. · 1b2a918e
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10201 from sun-rui/SPARK-12204.
      1b2a918e
    • smishra8's avatar
      [SPARK-12910] Fixes : R version for installing sparkR · d7415991
      smishra8 authored
      Testing code:
      ```
      $ ./install-dev.sh
      USING R_HOME = /usr/bin
      ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
      ```
      
      Using the new argument:
      ```
      $ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
      USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
      * installing *source* package ‘SparkR’ ...
      ** R
      ** inst
      ** preparing package for lazy loading
      Creating a new generic function for ‘colnames’ in package ‘SparkR’
      Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
      Creating a new generic function for ‘cov’ in package ‘SparkR’
      Creating a new generic function for ‘na.omit’ in package ‘SparkR’
      Creating a new generic function for ‘filter’ in package ‘SparkR’
      Creating a new generic function for ‘intersect’ in package ‘SparkR’
      Creating a new generic function for ‘sample’ in package ‘SparkR’
      Creating a new generic function for ‘transform’ in package ‘SparkR’
      Creating a new generic function for ‘subset’ in package ‘SparkR’
      Creating a new generic function for ‘summary’ in package ‘SparkR’
      Creating a new generic function for ‘lag’ in package ‘SparkR’
      Creating a new generic function for ‘rank’ in package ‘SparkR’
      Creating a new generic function for ‘sd’ in package ‘SparkR’
      Creating a new generic function for ‘var’ in package ‘SparkR’
      Creating a new generic function for ‘predict’ in package ‘SparkR’
      Creating a new generic function for ‘rbind’ in package ‘SparkR’
      Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
      Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
      Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
      ** help
      No man pages found in package  ‘SparkR’
      *** installing help indices
      ** building package indices
      ** testing if installed package can be loaded
      * DONE (SparkR)
      
      ```
      
      Author: Shubhanshu Mishra <smishra8@illinois.edu>
      
      Closes #10836 from napsternxg/master.
      d7415991
    • Herman van Hovell's avatar
      [SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal · 10173279
      Herman van Hovell authored
      The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. This PR changes that behavior: a decimal literal is now converted into an exact ```BigDecimal```.
      
      The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.
      
      This PR also drops the dedicated ```BigDecimal``` literal, since ```BigDecimal``` is now the default; you can still get a double literal by appending a 'D' to the value, for instance: ```3.141527D```
      
      cc davies rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10796 from hvanhovell/SPARK-12848.
      10173279
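      The motivation is visible even in plain R, where numeric literals are binary doubles; this sketch only illustrates the inexactness that exact decimals avoid, not Spark's parser:
      ```r
      # Doubles cannot represent most decimal fractions exactly, so an
      # "exact" literal parsed as a double silently loses precision:
      0.1 + 0.2 == 0.3                    # FALSE under double arithmetic
      isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE only with a tolerance
      ```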
  4. Jan 19, 2016
    • felixcheung's avatar
      [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict · 488bbb21
      felixcheung authored
      shivaram, sorry it took longer to fix some conflicts; this is the change to add an alias for `table`
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10406 from felixcheung/readtable.
      488bbb21
    • Sun Rui's avatar
      [SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR. · 3ac64828
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10309 from sun-rui/SPARK-12337.
      3ac64828
    • felixcheung's avatar
      [SPARK-12168][SPARKR] Add automated tests for conflicted function in R · 37fefa66
      felixcheung authored
      Currently the following is reported when loading the SparkR package in R (we would probably also add is.nan):
      ```
      Loading required package: methods
      
      Attaching package: ‘SparkR’
      
      The following objects are masked from ‘package:stats’:
      
          cov, filter, lag, na.omit, predict, sd, var
      
      The following objects are masked from ‘package:base’:
      
          colnames, colnames<-, intersect, rank, rbind, sample, subset,
          summary, table, transform
      ```
      
      This test adds an automated way to track changes to masked methods.
      The second part of the test checks for those functions that would not be accessible without a namespace/package prefix.
      
      Incidentally, this might point to how we would fix those inaccessible functions in base or stats.
      Looking for feedback for adding this test.
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10171 from felixcheung/rmaskedtest.
      37fefa66
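      A minimal base-R illustration (not the actual test) of why masking is recoverable: the namespace-qualified name keeps working even when a local or package definition shadows it:
      ```r
      # Shadow stats::cov locally; the :: prefix still reaches the original.
      cov <- function(...) stop("masked by a local definition")
      masked   <- tryCatch(cov(1:3, 1:3), error = function(e) "masked")
      original <- stats::cov(c(1, 2, 3), c(1, 2, 3))  # still callable
      ```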
  5. Jan 17, 2016
  6. Jan 15, 2016
    • Oscar D. Lara Yejas's avatar
      [SPARK-11031][SPARKR] Method str() on a DataFrame · ba4a6419
      Oscar D. Lara Yejas authored
      Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
      Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
      Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
      Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
      
      Closes #9613 from olarayej/SPARK-11031.
      ba4a6419
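      For comparison, this is roughly the base-R behavior the SparkR method mirrors (a sketch on a local data.frame, not a SparkR DataFrame):
      ```r
      # utils::str prints a compact, one-line-per-column structure summary.
      df <- data.frame(a = 1:3, b = letters[1:3])
      out <- capture.output(str(df))
      cat(out, sep = "\n")
      ```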
  7. Jan 14, 2016
    • Wenchen Fan's avatar
      [SPARK-12756][SQL] use hash expression in Exchange · 962e9bcf
      Wenchen Fan authored
      This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and bucketed data sources, which enables us to shuffle only one side when joining a bucketed table with a normal one.
      
      This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
      962e9bcf
  8. Jan 09, 2016
  9. Jan 06, 2016
  10. Jan 05, 2016
    • felixcheung's avatar
      [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229
      felixcheung authored
      rxin davies shivaram
      Took save mode from my PR #10480, and moved everything to writer methods. This is related to PR #10559
      
      - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into it some more tonight. (fixed)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10584 from felixcheung/rremovedeprecated.
      cc4d5229
  11. Jan 03, 2016
  12. Dec 29, 2015
    • Hossein's avatar
      [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate · f6ecf143
      Hossein authored
      * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
      * Adds a simple test
      
      [SPARK-11199] #comment link with JIRA
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #9185 from falaki/SPARK-11199.
      f6ecf143
    • Forest Fang's avatar
      [SPARK-12526][SPARKR] ifelse`, `when`, `otherwise` unable to take Column as value · d80cc90b
      Forest Fang authored
      `ifelse`, `when`, and `otherwise` are unable to take a `Column`-typed S4 object as a value.
      
      For example:
      ```r
      ifelse(lit(1) == lit(1), lit(2), lit(3))
      ifelse(df$mpg > 0, df$mpg, 0)
      ```
      will both fail with
      ```r
      attempt to replicate an object of type 'environment'
      ```
      
      The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.
      
      For reference, added test cases which trigger failures:
      ```r
      . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
      error in evaluating the argument 'x' in selecting a method for function 'collect':
        error in evaluating the argument 'col' in selecting a method for function 'select':
        attempt to replicate an object of type 'environment'
      Calls: when -> when -> ifelse -> ifelse
      
      1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
      2: eval(code, new_test_environment)
      3: eval(expr, envir, enclos)
      4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
      5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
      6: condition(object)
      7: compare(actual, expected, ...)
      8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
      Error: Test failures
      Execution halted
      ```
      
      Author: Forest Fang <forest.fang@outlook.com>
      
      Closes #10481 from saurfang/spark-12526.
      d80cc90b
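      The root cause is reproducible in plain R: `base::ifelse` replicates its branches with `rep()`, which fails for environment-like objects such as the S4 `Column`'s Java reference. A sketch of the failure and of the `if ... else ...` workaround the PR adopts:
      ```r
      # base::ifelse calls rep() on yes/no, which cannot replicate environments.
      e <- new.env()
      failed <- tryCatch(ifelse(TRUE, e, e),
                         error = function(err) conditionMessage(err))
      # A scalar `if ... else ...` never calls rep(), so it handles any object:
      picked <- if (TRUE) e else e
      ```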
  13. Dec 19, 2015
  14. Dec 16, 2015
  15. Dec 14, 2015
  16. Dec 11, 2015
    • gatorsmile's avatar
      [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases · 1e3526c2
      gatorsmile authored
      The existing sample functions are missing the parameter `seed`; however, the corresponding function interface in `generics` has such a parameter. Thus, although the caller can call the function with a 'seed', we are not using the value.
      
      This can cause SparkR unit tests to fail. For example, I hit it in another PR:
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10160 from gatorsmile/sampleR.
      1e3526c2
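      The point of threading the seed through is reproducibility; in base R the analogous guarantee looks like this (an illustration, not the SparkR code path):
      ```r
      # With the same seed, sampling is deterministic across runs.
      set.seed(42); first  <- sample(10, 3)
      set.seed(42); second <- sample(10, 3)
      identical(first, second)  # TRUE: the seed fixes the draw
      ```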
    • Yanbo Liang's avatar
      [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files · 0fb98255
      Yanbo Liang authored
      * ```jsonFile``` should support multiple input files, such as:
      ```R
      jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
      jsonFile(sqlContext, "path1,path2")
      ```
      * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` deprecated and use ```read.json``` on the SparkR side.
      * Replace all ```jsonFile``` calls with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
      * If this PR is accepted, we should also make almost the same change for ```parquetFile```.
      
      cc felixcheung sun-rui shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10145 from yanboliang/spark-12146.
      0fb98255
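      Both calling conventions name the same set of paths; a small base-R sketch (hypothetical helper logic, not the actual implementation) of how a character vector collapses to the comma-separated form:
      ```r
      # A character vector of paths and its comma-joined equivalent.
      paths  <- c("path1", "path2")
      joined <- paste(paths, collapse = ",")
      joined  # "path1,path2"
      ```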
  17. Dec 10, 2015
  18. Dec 07, 2015
    • Sun Rui's avatar
      [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. · 39d677c8
      Sun Rui authored
      This PR:
      1. Suppress all known warnings.
      2. Cleanup test cases and fix some errors in test cases.
      3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext.
      4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
      5. Make sure the default Hadoop file system is local when running test cases.
      6. Turn warnings into errors.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10030 from sun-rui/SPARK-12034.
      39d677c8
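      Point 6 can be achieved in plain R with `options(warn = 2)` (testthat may do this differently; this is just a base-R sketch):
      ```r
      # options(warn = 2) promotes every warning to an error, so stray
      # warnings in test cases fail loudly instead of scrolling past.
      op  <- options(warn = 2)
      res <- tryCatch({ as.numeric("abc"); "no error" },
                      error = function(e) "warning became error")
      options(op)  # restore the previous setting
      ```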
  19. Dec 06, 2015
    • Yanbo Liang's avatar
      [SPARK-12044][SPARKR] Fix usage of isnan, isNaN · b6e8e63a
      Yanbo Liang authored
      1. Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related functions: ```isNaN, isNull, isNotNull```.
      2. Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
      <del>3. Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del>
      
      cc shivaram sun-rui felixcheung
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10037 from yanboliang/spark-12044.
      b6e8e63a
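      The base-R distinction behind the naming is worth keeping in mind (local vectors here, not SparkR Columns):
      ```r
      # is.nan is TRUE only for NaN; is.na is TRUE for both NA and NaN.
      x <- c(1, NaN, NA)
      is.nan(x)  # FALSE  TRUE FALSE
      is.na(x)   # FALSE  TRUE  TRUE
      ```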
  20. Dec 05, 2015
  21. Dec 03, 2015
  22. Nov 29, 2015
  23. Nov 28, 2015
  24. Nov 27, 2015
    • Yanbo Liang's avatar
      [SPARK-12025][SPARKR] Rename some window rank function names for SparkR · ba02f6cb
      Yanbo Liang authored
      Change ```cumeDist -> cume_dist, denseRank -> dense_rank, percentRank -> percent_rank, rowNumber -> row_number``` on the SparkR side.
      There are two reasons that we should make this change:
      * We should follow the [naming convention rule of R](http://www.inside-r.org/node/230645)
      * Spark DataFrame has deprecated the old convention (such as ```cumeDist```) and will remove it in Spark 2.0.
      
      It's better to fix this issue before the 1.6 release; otherwise we will be making a breaking API change later.
      cc shivaram sun-rui
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10016 from yanboliang/SPARK-12025.
      ba02f6cb
  25. Nov 20, 2015
  26. Nov 19, 2015
    • felixcheung's avatar
      [SPARK-11339][SPARKR] Document the list of functions in R base package that... · 1a93323c
      felixcheung authored
      [SPARK-11339][SPARKR] Document the list of functions in R base package that are masked by functions with same name in SparkR
      
      Added tests for functions that are reported as masked, to make sure the base:: or stats:: functions can still be called.
      
      For those we can't call, added them to SparkR programming guide.
      
      It would seem to me that `table, sample, subset, filter, cov` not working is not actually expected - I investigated/experimented with them but couldn't get them to work. It looks like, as they are defined in base or stats, they are missing the S3 generic, e.g.:
      ```
      > methods("transform")
      [1] transform,ANY-method       transform.data.frame
      [3] transform,DataFrame-method transform.default
      see '?methods' for accessing help and source code
      > methods("subset")
      [1] subset.data.frame       subset,DataFrame-method subset.default
      [4] subset.matrix
      see '?methods' for accessing help and source code
      Warning message:
      In .S3methods(generic.function, class, parent.frame()) :
        function 'subset' appears not to be S3 generic; found functions that look like S3 methods
      ```
      Any idea?
      
      More information on masking:
      http://www.ats.ucla.edu/stat/r/faq/referencing_objects.htm
      http://www.sfu.ca/~sweldon/howTo/guide4.pdf
      
      This is what the output doc looks like (minus css):
      ![image](https://cloud.githubusercontent.com/assets/8969467/11229714/2946e5de-8d4d-11e5-94b0-dda9696b6fdd.png)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #9785 from felixcheung/rmasked.
      1a93323c
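      The missing-S3-generic observation can be illustrated with a toy S4 class: promoting a base function to an S4 generic (as SparkR's generics.R does with `setGeneric`) is what makes dispatch on a new class possible. A hypothetical sketch with a made-up `Toy` class:
      ```r
      library(methods)
      # base::subset is not a generic for arbitrary classes, but an explicit
      # S4 generic lets a method be attached to a new class.
      setClass("Toy", representation(v = "numeric"))
      setGeneric("subset")                       # promote base::subset to S4
      setMethod("subset", "Toy", function(x, ...) x@v)
      subset(new("Toy", v = c(1, 2, 3)))         # dispatches to the Toy method
      ```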
  27. Nov 18, 2015