  1. Mar 13, 2016
    • [SPARK-13812][SPARKR] Fix SparkR lint-r test errors. · c7e68c39
      Sun Rui authored
      ## What changes were proposed in this pull request?
      
      This PR fixes all newly captured SparkR lint-r errors after the lintr package is updated from GitHub.
      
      ## How was this patch tested?
      
      dev/lint-r
      SparkR unit tests
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #11652 from sun-rui/SPARK-13812.
      c7e68c39
  2. Mar 10, 2016
  3. Feb 25, 2016
    • [SPARK-13504] [SPARKR] Add approxQuantile for SparkR · 50e60e36
      Yanbo Liang authored
      ## What changes were proposed in this pull request?
      Add ```approxQuantile``` for SparkR.
      ## How was this patch tested?
      unit tests
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #11383 from yanboliang/spark-13504 and squashes the following commits:
      
      4f17adb [Yanbo Liang] Add approxQuantile for SparkR
      50e60e36
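For intuition, `approxQuantile` returns a value whose rank is close to `probability * n`; Spark implements this with the Greenwald-Khanna streaming algorithm. A naive exact-sort sketch of the contract (illustrative Python, not Spark's implementation):

```python
def approx_quantile(values, prob):
    """Return a value whose rank is close to prob * n.

    Exact-sort sketch of the contract only; Spark's approxQuantile
    uses the Greenwald-Khanna streaming algorithm and accepts a
    relativeError bound instead of sorting everything.
    """
    s = sorted(values)
    n = len(s)
    # Target rank, clamped to a valid index.
    rank = min(n - 1, max(0, int(prob * n)))
    return s[rank]

# Median of 1..100 (rank 50 in the sorted vector).
median = approx_quantile(range(1, 101), 0.5)
```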
  4. Feb 24, 2016
  5. Feb 23, 2016
  6. Feb 22, 2016
  7. Feb 21, 2016
    • [SPARK-12799] Simplify various string output for expressions · d9efe63e
      Cheng Lian authored
      This PR introduces several major changes:
      
      1. Replacing `Expression.prettyString` with `Expression.sql`
      
         The `prettyString` method is mostly an internal, developer-facing facility for debugging purposes, and shouldn't be exposed to users.
      
      1. Using SQL-like representation as column names for selected fields that are not named expressions (back-ticks and double quotes should be removed)
      
         Before, we were using `prettyString` as column names when possible, and sometimes the result column names can be weird.  Here are several examples:
      
         Expression         | `prettyString` | `sql`      | Note
         ------------------ | -------------- | ---------- | ---------------
         `a && b`           | `a && b`       | `a AND b`  |
         `a.getField("f")`  | `a[f]`         | `a.f`      | `a` is a struct
      
      1. Adding trait `NonSQLExpression` extending from `Expression` for expressions that don't have a SQL representation (e.g. Scala UDF/UDAF and Java/Scala object expressions used for encoders)
      
         `NonSQLExpression.sql` may return an arbitrary user facing string representation of the expression.
      
      Author: Cheng Lian <lian@databricks.com>
      
      Closes #10757 from liancheng/spark-12799.simplify-expression-string-methods.
      d9efe63e
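The `prettyString`/`sql` split can be sketched with a toy expression tree (hypothetical Python classes, not Catalyst's actual code), reproducing the examples in the table above:

```python
class And:
    """Toy boolean conjunction with two string renderings."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def pretty_string(self):
        # Developer-facing debug form.
        return f"{self.left} && {self.right}"

    def sql(self):
        # User-facing SQL form, suitable for column names.
        return f"{self.left} AND {self.right}"


class GetField:
    """Toy struct-field access; `struct` is assumed to be a struct column."""
    def __init__(self, struct, field):
        self.struct, self.field = struct, field

    def pretty_string(self):
        return f"{self.struct}[{self.field}]"

    def sql(self):
        return f"{self.struct}.{self.field}"
```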
  8. Feb 19, 2016
  9. Feb 11, 2016
  10. Jan 26, 2016
    • [SPARK-12903][SPARKR] Add covar_samp and covar_pop for SparkR · e7f9199e
      Yanbo Liang authored
      Add ```covar_samp``` and ```covar_pop``` for SparkR.
      Should we also provide a ```cov``` alias for ```covar_samp```? There is already a ```cov``` implementation in stats.R which masks ```stats::cov```, but adding the alias may introduce a breaking API change.
      
      cc sun-rui felixcheung shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10829 from yanboliang/spark-12903.
      e7f9199e
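For reference, the two statistics differ only in the divisor: `covar_pop` divides by n, `covar_samp` by n − 1 (Bessel's correction). A plain-Python sketch, illustrative only since SparkR delegates to Spark SQL's aggregate functions:

```python
def covar_pop(xs, ys):
    """Population covariance: divide by n."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n


def covar_samp(xs, ys):
    """Sample covariance: divide by n - 1 (unbiased estimator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
```

This is also the distinction behind the `cov` question above: base R's `stats::cov` computes the sample covariance.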
  11. Jan 22, 2016
  12. Jan 20, 2016
    • [SPARK-12204][SPARKR] Implement drop method for DataFrame in SparkR. · 1b2a918e
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10201 from sun-rui/SPARK-12204.
      1b2a918e
    • [SPARK-12910] Fixes : R version for installing sparkR · d7415991
      smishra8 authored
      Testing code:
      ```
      $ ./install-dev.sh
      USING R_HOME = /usr/bin
      ERROR: this R is version 2.15.1, package 'SparkR' requires R >= 3.0
      ```
      
      Using the new argument:
      ```
      $ ./install-dev.sh /content/username/SOFTWARE/R-3.2.3
      USING R_HOME = /content/username/SOFTWARE/R-3.2.3/bin
      * installing *source* package ‘SparkR’ ...
      ** R
      ** inst
      ** preparing package for lazy loading
      Creating a new generic function for ‘colnames’ in package ‘SparkR’
      Creating a new generic function for ‘colnames<-’ in package ‘SparkR’
      Creating a new generic function for ‘cov’ in package ‘SparkR’
      Creating a new generic function for ‘na.omit’ in package ‘SparkR’
      Creating a new generic function for ‘filter’ in package ‘SparkR’
      Creating a new generic function for ‘intersect’ in package ‘SparkR’
      Creating a new generic function for ‘sample’ in package ‘SparkR’
      Creating a new generic function for ‘transform’ in package ‘SparkR’
      Creating a new generic function for ‘subset’ in package ‘SparkR’
      Creating a new generic function for ‘summary’ in package ‘SparkR’
      Creating a new generic function for ‘lag’ in package ‘SparkR’
      Creating a new generic function for ‘rank’ in package ‘SparkR’
      Creating a new generic function for ‘sd’ in package ‘SparkR’
      Creating a new generic function for ‘var’ in package ‘SparkR’
      Creating a new generic function for ‘predict’ in package ‘SparkR’
      Creating a new generic function for ‘rbind’ in package ‘SparkR’
      Creating a generic function for ‘lapply’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘Filter’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘alias’ from package ‘stats’ in package ‘SparkR’
      Creating a generic function for ‘substr’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘%in%’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘mean’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘unique’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘nrow’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ncol’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘head’ from package ‘utils’ in package ‘SparkR’
      Creating a generic function for ‘factorial’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘atan2’ from package ‘base’ in package ‘SparkR’
      Creating a generic function for ‘ifelse’ from package ‘base’ in package ‘SparkR’
      ** help
      No man pages found in package  ‘SparkR’
      *** installing help indices
      ** building package indices
      ** testing if installed package can be loaded
      * DONE (SparkR)
      
      ```
      
      Author: Shubhanshu Mishra <smishra8@illinois.edu>
      
      Closes #10836 from napsternxg/master.
      d7415991
    • [SPARK-12848][SQL] Change parsed decimal literal datatype from Double to Decimal · 10173279
      Herman van Hovell authored
      The current parser turns a decimal literal, for example ```12.1```, into a Double. The problem with this approach is that we convert an exact literal into a non-exact ```Double```. This PR changes that behavior: a decimal literal is now converted into an exact ```BigDecimal```.
      
      The behavior for scientific decimals, for example ```12.1e01```, is unchanged. This will be converted into a Double.
      
      This PR replaces the ```BigDecimal``` literal by a ```Double``` literal, because ```BigDecimal``` is now the default. You can still use a double literal by appending a 'D' to the value, for instance: ```3.141527D```
      
      cc davies rxin
      
      Author: Herman van Hovell <hvanhovell@questtec.nl>
      
      Closes #10796 from hvanhovell/SPARK-12848.
      10173279
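The exactness issue is the same one any binary floating-point type has; a sketch using Python's standard `decimal` module to illustrate why an exact decimal literal should not be parsed into a binary double:

```python
from decimal import Decimal

# Binary floating point cannot represent most decimal literals exactly,
# so exact-looking arithmetic drifts:
float_exact = (0.1 + 0.2) == 0.3            # False: 0.1 + 0.2 is 0.30000000000000004

# A decimal type keeps the literal exact, matching what the user wrote:
decimal_exact = (Decimal("0.1") + Decimal("0.2")) == Decimal("0.3")  # True
```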
  13. Jan 19, 2016
    • [SPARK-12232][SPARKR] New R API for read.table to avoid name conflict · 488bbb21
      felixcheung authored
      shivaram sorry it took longer to fix some conflicts, this is the change to add an alias for `table`
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10406 from felixcheung/readtable.
      488bbb21
    • [SPARK-12337][SPARKR] Implement dropDuplicates() method of DataFrame in SparkR. · 3ac64828
      Sun Rui authored
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10309 from sun-rui/SPARK-12337.
      3ac64828
    • [SPARK-12168][SPARKR] Add automated tests for conflicted function in R · 37fefa66
      felixcheung authored
      Currently this is reported when loading the SparkR package in R (we would probably also add is.nan):
      ```
      Loading required package: methods
      
      Attaching package: ‘SparkR’
      
      The following objects are masked from ‘package:stats’:
      
          cov, filter, lag, na.omit, predict, sd, var
      
      The following objects are masked from ‘package:base’:
      
          colnames, colnames<-, intersect, rank, rbind, sample, subset,
          summary, table, transform
      ```
      
      Adding this test gives us an automated way to track changes to masked methods.
      Also, the second part of this test checks for those functions that would not be accessible without a namespace/package prefix.
      
      Incidentally, this might point to how we would fix those inaccessible functions in base or stats.
      Looking for feedback for adding this test.
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10171 from felixcheung/rmaskedtest.
      37fefa66
  14. Jan 17, 2016
  15. Jan 15, 2016
    • [SPARK-11031][SPARKR] Method str() on a DataFrame · ba4a6419
      Oscar D. Lara Yejas authored
      Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.usca.ibm.com>
      Author: Oscar D. Lara Yejas <olarayej@mail.usf.edu>
      Author: Oscar D. Lara Yejas <oscar.lara.yejas@us.ibm.com>
      Author: Oscar D. Lara Yejas <odlaraye@oscars-mbp.attlocal.net>
      
      Closes #9613 from olarayej/SPARK-11031.
      ba4a6419
  16. Jan 14, 2016
    • [SPARK-12756][SQL] use hash expression in Exchange · 962e9bcf
      Wenchen Fan authored
      This PR makes bucketing and exchange share one common hash algorithm, so that we can guarantee the data distribution is the same between shuffle and the bucketed data source, which enables us to shuffle only one side when joining a bucketed table with a normal one.
      
      This PR also fixes the tests that are broken by the new hash behaviour in shuffle.
      
      Author: Wenchen Fan <wenchen@databricks.com>
      
      Closes #10703 from cloud-fan/use-hash-expr-in-shuffle.
      962e9bcf
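The invariant this establishes can be sketched in Python. This is illustrative only: Spark uses Murmur3 hashing in both code paths, while the sketch uses Python's built-in `hash`. The point is that as long as bucketing and shuffle agree on one function, matching keys land in matching partitions and one side of the join needs no shuffle:

```python
def partition_of(key, num_partitions):
    """Deterministic hash partitioner shared by both sides of a join."""
    return hash(key) % num_partitions


def partition_all(rows, key_fn, num_partitions):
    """Bucket rows by the shared partitioner; rows with equal keys
    always end up at the same partition index."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_of(key_fn(row), num_partitions)].append(row)
    return parts
```

If the two sides used different hash functions, equal keys could land in different partitions and a join would silently lose matches, which is why the broken tests had to be fixed alongside the new behaviour.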
  17. Jan 09, 2016
  18. Jan 06, 2016
  19. Jan 05, 2016
    • [SPARK-12625][SPARKR][SQL] replace R usage of Spark SQL deprecated API · cc4d5229
      felixcheung authored
      rxin davies shivaram
      Took save mode from my PR #10480, and move everything to writer methods. This is related to PR #10559
      
      - [x] it seems jsonRDD() is broken, need to investigate - this is not a public API though; will look into some more tonight. (fixed)
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #10584 from felixcheung/rremovedeprecated.
      cc4d5229
  20. Jan 03, 2016
  21. Dec 29, 2015
    • [SPARK-11199][SPARKR] Improve R context management story and add getOrCreate · f6ecf143
      Hossein authored
      * Changes api.r.SQLUtils to use ```SQLContext.getOrCreate``` instead of creating a new context.
      * Adds a simple test
      
      [SPARK-11199] #comment link with JIRA
      
      Author: Hossein <hossein@databricks.com>
      
      Closes #9185 from falaki/SPARK-11199.
      f6ecf143
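The getOrCreate pattern can be sketched as a minimal singleton (illustrative Python; the real `SQLContext.getOrCreate` lives on the JVM side and also handles thread safety and active-session state):

```python
class SQLContext:
    """Toy context that is created once and then reused."""
    _instance = None

    @classmethod
    def get_or_create(cls):
        # Reuse the existing context instead of constructing a new one,
        # which is exactly what api.r.SQLUtils switches to in this PR.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
```

Reusing one context avoids the subtle bugs that arise when different parts of the R front end operate against different contexts.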
    • [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value · d80cc90b
      Forest Fang authored
      `ifelse`, `when`, and `otherwise` are unable to take a `Column` typed S4 object as a value.
      
      For example:
      ```r
      ifelse(lit(1) == lit(1), lit(2), lit(3))
      ifelse(df$mpg > 0, df$mpg, 0)
      ```
      will both fail with
      ```r
      attempt to replicate an object of type 'environment'
      ```
      
      The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid attempting to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.
      
      For reference, added test cases which trigger failures:
      ```r
      . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
      error in evaluating the argument 'x' in selecting a method for function 'collect':
        error in evaluating the argument 'col' in selecting a method for function 'select':
        attempt to replicate an object of type 'environment'
      Calls: when -> when -> ifelse -> ifelse
      
      1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
      2: eval(code, new_test_environment)
      3: eval(expr, envir, enclos)
      4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
      5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
      6: condition(object)
      7: compare(actual, expected, ...)
      8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
      Error: Test failures
      Execution halted
      ```
      
      Author: Forest Fang <forest.fang@outlook.com>
      
      Closes #10481 from saurfang/spark-12526.
      d80cc90b
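The distinction the fix relies on can be sketched in Python (hypothetical helpers, not SparkR code): base R's `ifelse` replicates its branches across a vectorized condition, which fails when the condition is a single opaque object such as a `Column`, whereas a scalar `if ... else ...` passes the branches through untouched:

```python
def vectorized_ifelse(test, yes, no):
    """Like base R ifelse(): assumes `test` is a sequence and replicates
    yes/no across it -- this replication is what breaks on a single
    opaque environment-backed object."""
    return [yes if t else no for t in test]


def scalar_ifelse(test, yes, no):
    """Like `if ... else ...`: treats the condition as one value and
    returns a branch object (e.g. a Column) as-is, no replication."""
    return yes if test else no
```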
  22. Dec 19, 2015
  23. Dec 16, 2015
  24. Dec 14, 2015
  25. Dec 11, 2015
    • [SPARK-12158][SPARKR][SQL] Fix 'sample' functions that break R unit test cases · 1e3526c2
      gatorsmile authored
      The existing `sample` functions are missing the parameter `seed`; however, the corresponding function interface in `generics` has such a parameter. Thus, although callers can pass a `seed`, the value is not used.
      
      This could cause SparkR unit tests to fail. For example, I hit it in another PR:
      https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47213/consoleFull
      
      Author: gatorsmile <gatorsmile@gmail.com>
      
      Closes #10160 from gatorsmile/sampleR.
      1e3526c2
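The bug class is easy to reproduce in miniature: if the seed is accepted but never threaded through to the random number generator, sampling differs run to run and tests become flaky. A hedged Python sketch of seed-respecting Bernoulli sampling (illustrative, not Spark's sampler):

```python
import random

def sample_fraction(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`.
    Passing `seed` into the RNG (the step the broken functions
    skipped) makes the result reproducible."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]
```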
    • [SPARK-12146][SPARKR] SparkR jsonFile should support multiple input files · 0fb98255
      Yanbo Liang authored
      * ```jsonFile``` should support multiple input files, such as:
      ```R
      jsonFile(sqlContext, c("path1", "path2")) # character vector as arguments
      jsonFile(sqlContext, "path1,path2")
      ```
      * Meanwhile, ```jsonFile``` has been deprecated by Spark SQL and will be removed in Spark 2.0, so we mark ```jsonFile``` as deprecated and use ```read.json``` on the SparkR side.
      * Replace all ```jsonFile``` calls with ```read.json``` in test_sparkSQL.R, but still keep the jsonFile test case.
      * If this PR is accepted, we should also make almost the same change for ```parquetFile```.
      
      cc felixcheung sun-rui shivaram
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10145 from yanboliang/spark-12146.
      0fb98255
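Supporting both a character vector and a comma-separated string means normalizing the argument to one list of paths before handing it to the reader. A hypothetical helper (not SparkR's actual code) sketches that normalization:

```python
def normalize_paths(paths):
    """Accept either a list of paths or a single comma-separated
    string and return a flat list, mirroring the two jsonFile
    call styles shown above."""
    if isinstance(paths, str):
        # Split on commas and drop surrounding whitespace / empties.
        return [p.strip() for p in paths.split(",") if p.strip()]
    return list(paths)
```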
  26. Dec 10, 2015
  27. Dec 07, 2015
    • [SPARK-12034][SPARKR] Eliminate warnings in SparkR test cases. · 39d677c8
      Sun Rui authored
      This PR:
      1. Suppress all known warnings.
      2. Cleanup test cases and fix some errors in test cases.
      3. Fix errors in HiveContext related test cases. These test cases are actually not run previously due to a bug of creating TestHiveContext.
      4. Support 'testthat' package version 0.11.0 which prefers that test cases be under 'tests/testthat'
      5. Make sure the default Hadoop file system is local when running test cases.
      6. Turn warnings into errors.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #10030 from sun-rui/SPARK-12034.
      39d677c8
  28. Dec 06, 2015
    • [SPARK-12044][SPARKR] Fix usage of isnan, isNaN · b6e8e63a
      Yanbo Liang authored
      1, Add ```isNaN``` to ```Column``` for SparkR. ```Column``` should have three related functions: ```isNaN, isNull, isNotNull```.
      2, Replace ```DataFrame.isNaN``` with ```DataFrame.isnan``` on the SparkR side, because ```DataFrame.isNaN``` has been deprecated and will be removed in Spark 2.0.
      <del>3, Add ```isnull``` to ```DataFrame``` for SparkR. ```DataFrame``` should have two related functions: ```isnan, isnull```.</del>
      
      cc shivaram sun-rui felixcheung
      
      Author: Yanbo Liang <ybliang8@gmail.com>
      
      Closes #10037 from yanboliang/spark-12044.
      b6e8e63a
  29. Dec 05, 2015