  1. Oct 21, 2015
    • [SPARK-11197][SQL] run SQL on files directly · f8c6bec6
      Davies Liu authored
      This PR introduces a new feature for running SQL directly on files without creating a table, for example:
      
      ```
      select id from json.`path/to/json/files` as j
      ```
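
      A hedged sketch of how the same query might be issued from SparkR's `sql()` entry point; the path is a placeholder and an initialized 1.x `sqlContext` is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # Query JSON files directly by path, without registering a temporary table
      # first; the backtick-quoted path follows the syntax shown above.
      ids <- sql(sqlContext, "SELECT id FROM json.`path/to/json/files`")
      head(ids)
      ```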
      
      Author: Davies Liu <davies@databricks.com>
      
      Closes #9173 from davies/source.
      f8c6bec6
  2. Oct 20, 2015
  3. Oct 19, 2015
  4. Oct 14, 2015
  5. Oct 13, 2015
    • [SPARK-10913] [SPARKR] attach() function support · f7f28ee7
      Adrian Zhuang authored
      Brings the changed code up to date.
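
      A hedged sketch of the kind of usage this enables; the dataset is illustrative and an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # attach() puts the DataFrame's columns on the R search path, so a column
      # such as eruptions can be referenced directly instead of as df$eruptions.
      attach(df)
      head(select(df, eruptions))
      ```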
      
      Author: Adrian Zhuang <adrian555@users.noreply.github.com>
      Author: adrian555 <wzhuang@us.ibm.com>
      
      Closes #9031 from adrian555/attach2.
      f7f28ee7
    • [SPARK-10888] [SPARKR] Added as.DataFrame as a synonym to createDataFrame · 1e0aba90
      Narine Kokhlikyan authored
      as.DataFrame is a more R-style signature.
      Also, I'd like to know if we could make the context (e.g. sqlContext) global, so that we do not have to specify it as an argument each time we create a DataFrame.
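
      A hedged sketch of the synonym in use, assuming an initialized 1.x `sqlContext`:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # as.DataFrame is an R-flavored alias for createDataFrame.
      df1 <- createDataFrame(sqlContext, faithful)
      df2 <- as.DataFrame(sqlContext, faithful)
      head(df2)
      ```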
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8952 from NarineK/sparkrasDataFrame.
      1e0aba90
    • [SPARK-10051] [SPARKR] Support collecting data of StructType in DataFrame · 5e3868ba
      Sun Rui authored
      Two points in this PR:
      
      1.    The original thought was that a named R list would be assumed to be a struct in SerDe. But this is problematic because some R functions implicitly generate named lists that are not intended to be structs when transferred by SerDe. So SerDe clients have to explicitly mark a named list as a struct by changing its class from "list" to "struct" (see the sketch after this list).
      
      2.    SerDe lives in the Spark Core module, while data of StructType is represented as GenericRow, which is defined in the Spark SQL module. SerDe can't import GenericRow because, in the Maven build, the Spark SQL module depends on the Spark Core module. So this PR adds a registration hook in SerDe that lets SQLUtils in the Spark SQL module register its functions for serialization and deserialization of StructType.
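
      A minimal sketch of the marking convention from point 1, using a plain named R list on the client side (not the actual SerDe code):

      ```
      # A named R list is no longer treated as a struct automatically; the SerDe
      # client marks it explicitly by changing its S3 class from "list" to "struct".
      s <- list(a = 1L, b = "two")
      class(s) <- "struct"

      # A list left with class "list" is serialized as an ordinary list, even if named.
      plain <- list(a = 1L, b = "two")
      ```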
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8794 from sun-rui/SPARK-10051.
      5e3868ba
  6. Oct 10, 2015
    • [SPARK-10079] [SPARKR] Make 'column' and 'col' functions be S4 functions. · 864de3bf
      Sun Rui authored
      1.  Add a "col" function into DataFrame (see the sketch after this list).
      2.  Move the current "col" function in Column.R to functions.R and convert it to an S4 function.
      3.  Add an S4 "column" function in functions.R.
      4.  Convert the "column" function in Column.R to an S4 function. This is for private use.
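
      A hedged sketch of the user-facing `col` function from point 1; an initialized SparkR session is assumed and the column name is illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # col() builds a Column from a name string, usable in expressions
      # the same way as df$eruptions.
      head(select(df, col("eruptions") * 60))
      ```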
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8864 from sun-rui/SPARK-10079.
      864de3bf
  7. Oct 09, 2015
  8. Oct 08, 2015
    • [SPARK-10836] [SPARKR] Added sort(x, decreasing, col, ... ) method to DataFrame · e8f90d9d
      Narine Kokhlikyan authored
      The sort function can be used as an alternative to arrange(...).
      As arguments it accepts x (a DataFrame), decreasing (TRUE/FALSE, or a vector of orderings for the columns), and the list of columns, represented as string names.

      For example:
      sort(df, TRUE, "col1", "col2", "col3", "col5")  # sort these columns in the same order

      sort(df, decreasing=TRUE, "col1")
      sort(df, decreasing=c(TRUE,FALSE), "col1", "col2")
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8920 from NarineK/sparkrsort.
      e8f90d9d
  9. Oct 07, 2015
  10. Oct 04, 2015
  11. Sep 30, 2015
  12. Sep 25, 2015
    • [SPARK-10760] [SPARKR] SparkR glm: the documentation in examples - family argument is missing · 6fcee906
      Narine Kokhlikyan authored
      Hi everyone,
      
      Since the family argument is required for the glm function, the execution of:
      
      model <- glm(Sepal_Length ~ Sepal_Width, df)
      
      is failing.
      
      I've fixed the documentation by adding the family argument and also added summary(model), which shows the coefficients for the model.
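
      A hedged sketch of the corrected example; the gaussian family string and the exact glm signature of this SparkR release are assumptions:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, iris)
      # The family argument is required, so it is supplied explicitly now.
      model <- glm(Sepal_Length ~ Sepal_Width, data = df, family = "gaussian")
      # summary(model) shows the fitted coefficients.
      summary(model)
      ```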
      
      Thanks,
      Narine
      
      Author: Narine Kokhlikyan <narine.kokhlikyan@gmail.com>
      
      Closes #8870 from NarineK/sparkrml.
      6fcee906
    • [SPARK-9681] [ML] Support R feature interactions in RFormula · 92233881
      Eric Liang authored
      This integrates the Interaction feature transformer with SparkR R formula support (i.e. support `:`).
      
      To generate reasonable ML attribute names for feature interactions, it was necessary to add the ability to read the original attribute names back from `StructField`, and also to specify custom group prefixes in `VectorAssembler`. This also has the side benefit of cleaning up the double underscores in the attributes generated for non-interaction terms.
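
      A hedged sketch of an interaction term in a SparkR formula that this change targets; the columns and family are illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, iris)
      # The `:` operator adds an interaction feature between two terms.
      model <- glm(Sepal_Length ~ Sepal_Width + Sepal_Width:Petal_Length,
                   data = df, family = "gaussian")
      summary(model)
      ```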
      
      mengxr
      
      Author: Eric Liang <ekl@databricks.com>
      
      Closes #8830 from ericl/interaction-2.
      92233881
  13. Sep 16, 2015
  14. Sep 15, 2015
  15. Sep 12, 2015
    • [SPARK-6548] Adding stddev to DataFrame functions · f4a22808
      JihongMa authored
      Adding STDDEV support for DataFrame using a one-pass online/parallel algorithm to compute variance. Please review the code change.
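
      A minimal plain-R sketch of a one-pass (Welford-style) online variance update, to illustrate the kind of algorithm referred to; it is not the Spark implementation:

      ```
      # One pass over the data, tracking count, mean, and M2 (sum of squared deviations).
      online_var <- function(xs) {
        n <- 0; m <- 0; m2 <- 0
        for (x in xs) {
          n <- n + 1
          delta <- x - m
          m <- m + delta / n
          m2 <- m2 + delta * (x - m)
        }
        list(variance = m2 / (n - 1), stddev = sqrt(m2 / (n - 1)))
      }

      online_var(c(2, 4, 4, 4, 5, 5, 7, 9))  # sample stddev of about 2.14
      ```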
      
      Author: JihongMa <linlin200605@gmail.com>
      Author: Jihong MA <linlin200605@gmail.com>
      Author: Jihong MA <jihongma@jihongs-mbp.usca.ibm.com>
      Author: Jihong MA <jihongma@Jihongs-MacBook-Pro.local>
      
      Closes #6297 from JihongMA/SPARK-SQL.
      f4a22808
  16. Sep 10, 2015
    • [SPARK-10049] [SPARKR] Support collecting data of ArrayType in DataFrame. · 45e3be5c
      Sun Rui authored
      This PR:
      1.  Enhances reflection in RBackend, automatically matching a Java array to a Scala Seq when finding methods. Util functions like seq() and listToSeq() on the R side can be removed, as they would conflict with the SerDe logic that transfers a Scala Seq to the R side.

      2.  Enhances the SerDe to support transferring a Scala Seq to the R side. Data of ArrayType in a DataFrame is observed to be of Scala Seq type after collection (see the sketch after this list).

      3.  Supports ArrayType in createDataFrame().
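
      A hedged sketch of the collect() behavior from point 2, using a SQL-generated array column; an initialized 1.x `sqlContext` is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # A DataFrame with a single ArrayType column.
      df <- sql(sqlContext, "SELECT array(1, 2, 3) AS scores")

      # After collect(), the ArrayType value is expected to arrive as an R list.
      local <- collect(df)
      local$scores[[1]]
      ```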
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #8458 from sun-rui/SPARK-10049.
      45e3be5c
  17. Sep 04, 2015
  18. Sep 03, 2015
    • [SPARK-8951] [SPARKR] support Unicode characters in collect() · af0e3125
      CHOIJAEHONG authored
      Spark gives an error message and does not show the output when a field of the result DataFrame contains CJK characters.
      I changed SerDe.scala so that Spark supports Unicode characters when writing a string to R.
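
      A hedged sketch of the scenario being fixed, with Korean text as an example of CJK characters; an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      # Before this fix, collecting a string field containing CJK characters
      # could fail or garble the output on the R side.
      df <- createDataFrame(sqlContext, data.frame(greeting = "안녕하세요",
                                                   stringsAsFactors = FALSE))
      collect(df)$greeting
      ```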
      
      Author: CHOIJAEHONG <redrock07@naver.com>
      
      Closes #7494 from CHOIJAEHONG1/SPARK-8951.
      af0e3125
  19. Aug 28, 2015
  20. Aug 27, 2015
  21. Aug 26, 2015
    • [MINOR] [SPARKR] Fix some validation problems in SparkR · 773ca037
      Yu ISHIKAWA authored
      Getting rid of some validation problems in SparkR
      https://github.com/apache/spark/pull/7883
      
      cc shivaram
      
      ```
      inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
        expect_equal(class(x), "character")
                                           ^~
      inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
      
      ^~
      inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
      
      ^~
      R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                      df <- data.frame(row.names = 1 : nrow)
                                                            ^~~~~~~~~~~~~~~~
      R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
      
      ^~~~~~~~~~~~~~
      R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                          df[[names[colIndex]]] <- vec
                                                      ^~~~~~~~~~~~~~~~~~
      ```
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8474 from yu-iskw/minor-fix-sparkr.
      773ca037
    • [SPARK-10308] [SPARKR] Add %in% to the exported namespace · ad7f0f16
      Shivaram Venkataraman authored
      I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.
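
      A hedged sketch of the operator now that it is exported; the column and values are illustrative:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # %in% on a Column builds an "is in" filter expression.
      head(filter(df, df$waiting %in% c(50, 60, 70)))
      ```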
      
      cc yu-iskw
      
      Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>
      
      Closes #8473 from shivaram/in-namespace.
      ad7f0f16
    • [SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) · 75d4773a
      felixcheung authored
      Add support for
      ```
         df[df$name == "Smith", c(1,2)]
         df[df$age %in% c(19, 30), 1:2]
      ```
      
      shivaram
      
      Author: felixcheung <felixcheung_m@hotmail.com>
      
      Closes #8394 from felixcheung/rsubset.
      75d4773a
  22. Aug 25, 2015
  23. Aug 24, 2015
  24. Aug 19, 2015
  25. Aug 18, 2015
  26. Aug 17, 2015
    • [SPARK-9871] [SPARKR] Add expression functions into SparkR which have a variable parameter · 26e76058
      Yu ISHIKAWA authored
      ### Summary
      
      - Add `lit` function
      - Add `concat`, `greatest`, `least` functions
      
      I think we need to improve the `collect` function in order to implement a `struct` function, since `collect` doesn't work with arguments that include a nested `list` variable. It seems that a list corresponding to a `struct` still has `jobj` classes, so it would be better to solve this problem in another issue.
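
      A hedged sketch of the newly added functions; the columns and the cast to string are illustrative and an initialized SparkR session is assumed:

      ```
      library(SparkR)
      sc <- sparkR.init(master = "local[2]")
      sqlContext <- sparkRSQL.init(sc)

      df <- createDataFrame(sqlContext, faithful)
      # lit() wraps a literal as a Column; concat(), greatest(), and least()
      # accept a variable number of Column arguments.
      head(select(df,
                  concat(lit("wait="), cast(df$waiting, "string")),
                  greatest(df$eruptions, df$waiting),
                  least(df$eruptions, df$waiting)))
      ```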
      
      ### JIRA
      [[SPARK-9871] Add expression functions into SparkR which have a variable parameter - ASF JIRA](https://issues.apache.org/jira/browse/SPARK-9871)
      
      Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>
      
      Closes #8194 from yu-iskw/SPARK-9856.
      26e76058
  27. Aug 16, 2015
    • [SPARK-8844] [SPARKR] head/collect is broken in SparkR. · 5f9ce738
      Sun Rui authored
      This is a WIP patch for SPARK-8844, for collecting reviews.
      
      This bug is about reading an empty DataFrame. In readCol(),
            lapply(1:numRows, function(x) {
      does not take into account the case where numRows = 0.
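
      A minimal plain-R sketch of why `1:numRows` misbehaves when numRows = 0, and the usual seq_len() guard (not the actual readCol() code):

      ```
      numRows <- 0

      # 1:0 counts down, so the lapply body still runs twice instead of zero times.
      1:numRows                                         # c(1, 0)
      length(lapply(1:numRows, function(x) x))          # 2

      # seq_len(0) is empty, so the body is skipped for an empty DataFrame.
      seq_len(numRows)                                  # integer(0)
      length(lapply(seq_len(numRows), function(x) x))   # 0
      ```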
      
      Will add a unit test case.
      
      Author: Sun Rui <rui.sun@intel.com>
      
      Closes #7419 from sun-rui/SPARK-8844.
      5f9ce738