- Feb 17, 2017
wm624@hotmail.com authored
## What changes were proposed in this pull request?

We recently added the `spark.svmLinear` API for SparkR. We need to add an example and update the vignettes.

## How was this patch tested?

Manually ran the example.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16969 from wangmiao1981/example.
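A minimal sketch of the kind of example this adds (the exact vignette code may differ; the iris-based setup here is illustrative):

```r
library(SparkR)
sparkR.session(master = "local")

# Binary classification: keep two of the three iris species.
df <- createDataFrame(iris)
training <- filter(df, df$Species %in% c("setosa", "versicolor"))

# Fit a linear SVM; regParam controls the regularization strength.
model <- spark.svmLinear(training, Species ~ ., regParam = 0.01)
summary(model)
head(predict(model, training))
```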
Yanbo Liang authored
## What changes were proposed in this pull request?

SparkR `approxQuantile` now supports multiple input columns.

## How was this patch tested?

Unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16951 from yanboliang/spark-19619.
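A minimal sketch of the multi-column form (column names and data are illustrative): passing a character vector of column names returns one set of quantiles per column.

```r
df <- createDataFrame(data.frame(a = rnorm(100), b = rnorm(100)))

# Quartiles for both columns in a single call;
# relativeError = 0 requests exact quantiles.
quartiles <- approxQuantile(df, c("a", "b"), c(0.25, 0.5, 0.75), 0.0)
```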
- Feb 15, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

Add `coalesce` on DataFrame, for reducing the number of partitions without a shuffle, and `coalesce` on Column.

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16739 from felixcheung/rcoalesce.
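A minimal sketch of both forms (assuming the usual SparkR calling conventions):

```r
df <- createDataFrame(cars, numPartitions = 4)

# DataFrame form: reduce to 2 partitions without a shuffle.
df2 <- coalesce(df, 2L)

# Column form: first non-NA value across the given columns.
df3 <- withColumn(df, "firstNonNull", coalesce(df$speed, df$dist))
```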
wm624@hotmail.com authored
## What changes were proposed in this pull request?

The linear SVM classifier was recently added to ML and its Python API is in place; this JIRA adds the R-side API. Marked as WIP while the unit tests are being designed.

## How was this patch tested?

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16800 from wangmiao1981/svc.
- Feb 14, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

- This is caused by changes in SPARK-18444 and SPARK-18643: we no longer install Spark when `master = ""` (the default). It is also related to SPARK-18449, since the real `master` value is not known at the time the R code in `sparkR.session` is run. (`master` cannot default to "local" since it could be overridden by the spark-submit command line or Spark config.)
- As a result, while running SparkR as a package in an IDE works fine, the CRAN check does not, as it launches via a non-interactive script.
- The fix is to add a check to the beginning of each test and vignette; the same would also work by changing `sparkR.session()` to `sparkR.session(master = "local")` in tests, but I think being more explicit is better.

## How was this patch tested?

Tested this by reverting the version to 2.1, since it needs to download the release jar with a matching version. But since there are changes in 2.2 (specifically around SparkR ML) that are incompatible with 2.1, some tests fail in this config. Will need to port this to branch-2.1 and retest with the 2.1 release jar.

Manually, as:

```
# modify DESCRIPTION to revert version to 2.1.0
SPARK_HOME=/usr/spark R CMD build pkg

# run cran check without SPARK_HOME
R CMD check --as-cran SparkR_2.1.0.tar.gz
```

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16720 from felixcheung/rcranchecktest.
- Feb 12, 2017
titicaca authored
## What changes were proposed in this pull request?

Fix a bug in the collect method for collecting a timestamp column. The bug can be reproduced with the following code and output:

```
library(SparkR)
sparkR.session(master = "local")
df <- data.frame(col1 = c(0, 1, 2),
                 col2 = c(as.POSIXct("2017-01-01 00:00:01"), NA, as.POSIXct("2017-01-01 12:00:01")))

sdf1 <- createDataFrame(df)
print(dtypes(sdf1))
df1 <- collect(sdf1)
print(lapply(df1, class))

sdf2 <- filter(sdf1, "col1 > 0")
print(dtypes(sdf2))
df2 <- collect(sdf2)
print(lapply(df2, class))
```

As we can see from the printed output, the column type of col2 in df2 is unexpectedly converted to numeric when NA is at the top of the column. This is caused by `do.call(c, list)`: if we convert a list, e.g. `do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))`, the class of the result is numeric instead of POSIXct. Therefore, we need to cast the data type of the vector explicitly.

## How was this patch tested?

The patch can be tested manually with the same code above.

Author: titicaca <fangzhou.yang@hotmail.com>

Closes #16689 from titicaca/sparkr-dev.
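The underlying base-R behavior is easy to reproduce outside Spark:

```r
# When NA comes first, dispatch falls back to the default c(),
# which drops the POSIXct class and returns a numeric vector.
x <- do.call(c, list(NA, as.POSIXct("2017-01-01 12:00:01")))
class(x)  # "numeric"

# With the timestamp first, c.POSIXct is dispatched and the class survives.
y <- do.call(c, list(as.POSIXct("2017-01-01 12:00:01"), NA))
class(y)  # "POSIXct" "POSIXt"
```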
- Feb 08, 2017
Dongjoon Hyun authored
## What changes were proposed in this pull request?

After SPARK-19464, **SparkPullRequestBuilder** fails because it still tries to use hadoop2.3.

**BEFORE**
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
```
========================================================================
Building Spark
========================================================================
[error] Could not find hadoop2.3 in the list. Valid options are ['hadoop2.6', 'hadoop2.7']
Attempting to post to Github...
 > Post successful.
```

**AFTER**
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72595/console
```
========================================================================
Building Spark
========================================================================
[info] Building Spark (w/Hive 1.2.1) using SBT with these arguments: -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive test:package streaming-kafka-0-8-assembly/assembly streaming-flume-assembly/assembly streaming-kinesis-asl-assembly/assembly
Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
```

## How was this patch tested?

Pass the existing test.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16858 from dongjoon-hyun/hotfix_run-tests.
- Feb 07, 2017
anabranch authored
## What changes were proposed in this pull request?

This pull request adds two new user-facing functions:

- `to_date`, which accepts an expression and a format and returns a date.
- `to_timestamp`, which accepts an expression and a format and returns a timestamp.

For example, given a date in the format `2016-21-05` (yyyy-dd-MM):

### Date Function

*Previously*
```
to_date(unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp"))
```

*Current*
```
to_date(lit("2016-21-05"), "yyyy-dd-MM")
```

### Timestamp Function

*Previously*
```
unix_timestamp(lit("2016-21-05"), "yyyy-dd-MM").cast("timestamp")
```

*Current*
```
to_timestamp(lit("2016-21-05"), "yyyy-dd-MM")
```

### Tasks

- [X] Add `to_date` to Scala Functions
- [x] Add `to_date` to Python Functions
- [x] Add `to_date` to SQL Functions
- [X] Add `to_timestamp` to Scala Functions
- [x] Add `to_timestamp` to Python Functions
- [x] Add `to_timestamp` to SQL Functions
- [x] Add function to R

## How was this patch tested?

- [x] Add Functions to `DateFunctionsSuite`
  - Test new `ParseToTimestamp` Expression (*not necessary*)
  - Test new `ParseToDate` Expression (*not necessary*)
- [x] Add test for R
- [x] Add test for Python in test.py

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>
Author: Bill Chambers <bill@databricks.com>
Author: anabranch <bill@databricks.com>

Closes #16138 from anabranch/SPARK-16609.
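Since the task list includes the R function, here is a minimal SparkR sketch of the new calls (the DataFrame setup is illustrative):

```r
df <- createDataFrame(data.frame(d = "2016-21-05"))

# Parse with an explicit format, instead of the old unix_timestamp + cast.
head(select(df,
            to_date(df$d, "yyyy-dd-MM"),
            to_timestamp(df$d, "yyyy-dd-MM")))
```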
- Feb 05, 2017
actuaryzhang authored
## What changes were proposed in this pull request?

The `names` method fails to check the validity of the assigned values. This can be fixed by calling `colnames` within `names`.

## How was this patch tested?

New tests.

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #16794 from actuaryzhang/sparkRNames.
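A small illustration of the behavior being guarded (hypothetical usage, not the actual test code):

```r
df <- createDataFrame(faithful)

# Valid: one name per column.
names(df) <- c("eruption_time", "waiting_time")

# Invalid: wrong number of names. With the fix this fails fast through
# the validation in colnames, instead of slipping through unchecked.
# names(df) <- c("only_one_name")
```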
- Feb 03, 2017
actuaryzhang authored
## What changes were proposed in this pull request?

The current version has an error in the vignettes:

```
model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
summary(kmeansModel)
```

`kmeansModel` does not exist...

felixcheung wangmiao1981

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #16799 from actuaryzhang/sparkRVignettes.
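The corrected snippet simply summarizes the model that was actually fit:

```r
model <- spark.bisectingKmeans(df, Sepal_Length ~ Sepal_Width, k = 4)
summary(model)
```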
krishnakalyan3 authored
## What changes were proposed in this pull request?

Update the programming guide, example, and vignette with bisecting k-means.

Author: krishnakalyan3 <krishnakalyan3@gmail.com>

Closes #16767 from krishnakalyan3/bisecting-kmeans.
- Jan 31, 2017
wm624@hotmail.com authored
## What changes were proposed in this pull request?

When KMeans uses `initMode = "random"` with some random seed, it is possible that the actual number of clusters doesn't equal the configured `k`. In this case, `summary(model)` returns an error because the number of columns of the coefficient matrix doesn't equal `k`. Example:

```
> col1 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> col2 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> col3 <- c(1, 2, 3, 4, 0, 1, 2, 3, 4, 0)
> cols <- as.data.frame(cbind(col1, col2, col3))
> df <- createDataFrame(cols)
>
> model2 <- spark.kmeans(data = df, ~ ., k = 5, maxIter = 10,
                         initMode = "random", seed = 22222, tol = 1E-5)
>
> summary(model2)
Error in `colnames<-`(`*tmp*`, value = c("col1", "col2", "col3")) :
  length of 'dimnames' [2] not equal to array extent
In addition: Warning message:
In matrix(coefficients, ncol = k) :
  data length [9] is not a sub-multiple or multiple of the number of rows [2]
```

Fix: get the actual number of clusters in the summary and use it to build the coefficient matrix.

## How was this patch tested?

Add unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16666 from wangmiao1981/kmeans.
actuaryzhang authored
## What changes were proposed in this pull request?

The `coefficients` component in the model summary should be a matrix, but the underlying structure is actually a list. This affects several models, except for `AFTSurvivalRegressionModel`, which has the correct implementation. The fix is to first `unlist` the coefficients returned from the `callJMethod` before converting them to a matrix. An example illustrates the issue:

```
data(iris)
df <- createDataFrame(iris)
model <- spark.glm(df, Sepal_Length ~ Sepal_Width, family = "gaussian")
s <- summary(model)

> str(s$coefficients)
List of 8
 $ : num 6.53
 $ : num -0.223
 $ : num 0.479
 $ : num 0.155
 $ : num 13.6
 $ : num -1.44
 $ : num 0
 $ : num 0.152
 - attr(*, "dim")= int [1:2] 2 4
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:2] "(Intercept)" "Sepal_Width"
  ..$ : chr [1:4] "Estimate" "Std. Error" "t value" "Pr(>|t|)"
> s$coefficients[, 2]
$`(Intercept)`
[1] 0.4788963

$Sepal_Width
[1] 0.1550809
```

This shows that the underlying structure of `coefficients` is still a `list`.

felixcheung wangmiao1981

Author: actuaryzhang <actuaryzhang10@gmail.com>

Closes #16730 from actuaryzhang/sparkRCoef.
- Jan 30, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

With extract `[[` or replace `[[<-`, the parameter `i` is a column index; that needs to be corrected in the doc. Also a few minor updates: examples, links.

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16721 from felixcheung/rsubsetdoc.
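A quick sketch of what the corrected doc means by a column index (illustrative):

```r
df <- createDataFrame(faithful)

# i is a column index, not a row index:
df[[2]]          # the second column ("waiting"), as a Column
df[["waiting"]]  # the same column, selected by name
```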
- Jan 27, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

This mostly affects running a job from the driver in client mode when results are expected to come through stdout (which should be somewhat rare, but possible).

Before:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$countjc, "explain", TRUE)
NULL
```

After:
```
> a <- as.DataFrame(cars)
> b <- group_by(a, "dist")
> c <- count(b)
> sparkR.callJMethod(c$countjc, "explain", TRUE)
count#11L
NULL
```

Now, `column.explain()` doesn't seem very useful (we can get more extensive output with `DataFrame.explain()`), but there are other, more complex examples with calls to `println` on the Scala/JVM side whose output was getting dropped.

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16670 from felixcheung/rjvmstdout.
Felix Cheung authored
## What changes were proposed in this pull request?

Add header.

## How was this patch tested?

Manual run to check that the vignettes HTML is created properly.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16709 from felixcheung/rfilelicense.
- Jan 26, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

With doc to say this would convert the DataFrame into an RDD.

## How was this patch tested?

unit tests, manual tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16668 from felixcheung/rgetnumpartitions.
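Per the branch name, this adds `getNumPartitions` for a SparkDataFrame; a minimal sketch:

```r
df <- createDataFrame(cars, numPartitions = 4)

# Per the doc note above, this converts the DataFrame to an RDD
# internally in order to count its partitions.
getNumPartitions(df)  # 4
```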
wm624@hotmail.com authored
## What changes were proposed in this pull request?

Add an R wrapper for bisecting k-means. As JIRA is down, I will update the title to link to the corresponding JIRA later.

## How was this patch tested?

Add new unit tests.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16566 from wangmiao1981/bk.
- Jan 24, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

Support for:

```
df[[myname]] <- 1
df[[2]] <- df$eruptions
```

## How was this patch tested?

manual tests, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16663 from felixcheung/rcolset.
- Jan 21, 2017
Yanbo Liang authored
## What changes were proposed in this pull request?

`spark.gaussianMixture` supports outputting the total log-likelihood for the model, like R's `mvnormalmixEM`.

## How was this patch tested?

R unit test.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16646 from yanboliang/spark-19291.
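A minimal sketch of reading the new value from the model summary (the two-cluster data is illustrative):

```r
df <- createDataFrame(data.frame(x = c(rnorm(50), rnorm(50, mean = 5)),
                                 y = c(rnorm(50), rnorm(50, mean = 5))))

model <- spark.gaussianMixture(df, ~ x + y, k = 2)

# Total log-likelihood, analogous to the loglik of R's mvnormalmixEM.
summary(model)$loglik
```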
- Jan 18, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

When R is starting as a package and needs to download the Spark release distribution, we need to handle errors from download and untar, and clean up; otherwise it will get stuck.

## How was this patch tested?

manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16589 from felixcheung/rtarreturncode.
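A hypothetical sketch of the pattern described, not the actual SparkR code (the URL and install directory are placeholders):

```r
url <- "https://example.org/spark.tgz"  # placeholder URL
installDir <- tempdir()                 # placeholder install location

tarball <- tempfile(fileext = ".tgz")
ok <- tryCatch({
  download.file(url, tarball)               # may fail mid-transfer
  untar(tarball, exdir = installDir) == 0L  # non-zero exit means a bad tarball
}, error = function(e) FALSE)

# Clean up the partial download so a retry starts from a clean state.
if (!ok) unlink(tarball)
```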
- Jan 16, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

Refactored the scripts to remove duplication and give each script a clearer purpose.

## How was this patch tested?

manually

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16249 from felixcheung/rscripts.
Felix Cheung authored
## What changes were proposed in this pull request?

Windows seems to be the only platform with an appauthor component in the path, for which we should say "Apache" (and it is case sensitive). The current path of `AppData\Local\spark\spark\Cache` is a bit odd.

## How was this patch tested?

manual.

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16590 from felixcheung/rcachedir.
wm624@hotmail.com authored
## What changes were proposed in this pull request?

`spark.lda` passes the optimizer, "em" or "online", as a string to the backend. However, `LDAWrapper` doesn't set the optimizer based on the value from R. Therefore, for optimizer "em", the `isDistributed` field is FALSE when it should be TRUE based on the Scala code. In addition, the `summary` method should bring back the results related to `DistributedLDAModel`.

## How was this patch tested?

Manual tests by comparing with the Scala example. Modified the current unit tests: fixed the incorrect unit test and added necessary tests for the `summary` method.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16464 from wangmiao1981/new.
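A minimal sketch of the fixed behavior (the toy corpus is illustrative, and this assumes `spark.lda` accepts a character column as features):

```r
df <- createDataFrame(data.frame(text = c("spark is fast", "r is fun and fast")))

model <- spark.lda(df, features = "text", k = 2, optimizer = "em")

# With the fix, "em" is actually applied on the backend, so the summary
# reflects a DistributedLDAModel:
summary(model)$isDistributed  # TRUE
```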
- Jan 13, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

To allow specifying the number of partitions when the DataFrame is created.

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16512 from felixcheung/rnumpart.
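A minimal sketch of the new parameter:

```r
# Spread the local data over 8 partitions at creation time,
# rather than repartitioning afterwards.
df <- createDataFrame(iris, numPartitions = 8)
```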
wm624@hotmail.com authored
## What changes were proposed in this pull request?

`spark.kmeans` doesn't have an interface to set `initSteps`, `seed`, and `tol`. Since the Spark KMeans algorithm doesn't take the same set of parameters as R's `kmeans`, we should maintain a different interface in `spark.kmeans`. Add the missing parameters and the corresponding documentation. Modified the existing unit tests to take the additional parameters.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16523 from wangmiao1981/kmeans.
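A minimal sketch using the newly exposed parameters:

```r
df <- createDataFrame(iris)

# initSteps, seed and tol can now be set from R.
model <- spark.kmeans(df, ~ Sepal_Length + Sepal_Width, k = 3,
                      initSteps = 5, seed = 42, tol = 1e-5)
summary(model)
```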
- Jan 11, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

Support

```
df$foo <- 1
```

instead of

```
df$foo <- lit(1)
```

## How was this patch tested?

unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16510 from felixcheung/rlitcol.
- Jan 10, 2017
Felix Cheung authored
## What changes were proposed in this pull request?

R's `family` is a longer list than what Spark supports.

## How was this patch tested?

manual

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16511 from felixcheung/rdocglmfamily.
- Jan 08, 2017
anabranch authored
## What changes were proposed in this pull request?

- [X] Make sure all join types are clearly mentioned
- [X] Make join labeling/style consistent
- [X] Make join label ordering docs the same
- [X] Improve join documentation according to above for Scala
- [X] Improve join documentation according to above for Python
- [X] Improve join documentation according to above for R

## How was this patch tested?

No tests b/c docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>

Closes #16504 from anabranch/SPARK-19126.
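A minimal SparkR sketch of the joins being documented (column names and data are illustrative):

```r
left  <- createDataFrame(data.frame(id = 1:3, x = c("a", "b", "c")))
right <- createDataFrame(data.frame(id = 2:4, y = c("p", "q", "r")))

# joinType is one of the documented labels,
# e.g. "inner", "left_outer", "right_outer", "full_outer", "leftsemi".
joined <- join(left, right, left$id == right$id, "left_outer")
head(joined)
```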
anabranch authored
## What changes were proposed in this pull request?

- [X] Fix inconsistencies in the function reference for dense rank and dense
- [X] Make all languages equivalent in their reference to `dense_rank` and `rank`

## How was this patch tested?

N/A for docs.

Please review http://spark.apache.org/contributing.html before opening a pull request.

Author: anabranch <wac.chambers@gmail.com>

Closes #16505 from anabranch/SPARK-19127.
Yanbo Liang authored
## What changes were proposed in this pull request?

SparkR `mllib.R` is getting bigger as we add more ML wrappers. I'd like to split it into multiple files to make it easier to maintain:

* mllib_classification.R
* mllib_clustering.R
* mllib_recommendation.R
* mllib_regression.R
* mllib_stat.R
* mllib_tree.R
* mllib_utils.R

Note: only a reorg, no actual code change.

## How was this patch tested?

Existing tests.

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16312 from yanboliang/spark-18862.
- Jan 07, 2017
Yanbo Liang authored
## What changes were proposed in this pull request?

#16126 bumps the master branch version to 2.2.0-SNAPSHOT, but it seems the R version was omitted.

## How was this patch tested?

N/A

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #16488 from yanboliang/r-version.
- Dec 22, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

It would make it easier to integrate with other components expecting a row-based JSON format. This replaces the non-public toJSON RDD API.

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16368 from felixcheung/rJSON.
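A minimal sketch (per the branch name, this exposes `toJSON` on a SparkDataFrame; the output shown in the comment is illustrative):

```r
df <- createDataFrame(data.frame(name = c("Bob", "Alice"), age = c(16L, 30L)))

# Each row becomes one JSON string, e.g. {"name":"Bob","age":16}
jsonDF <- toJSON(df)
head(jsonDF)
```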
- Dec 21, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

API for the SparkUI URL from SparkContext.

## How was this patch tested?

manual, unit tests

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16367 from felixcheung/rwebui.
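A minimal sketch, assuming the API landed as `sparkR.uiWebUrl()`:

```r
sparkR.session(master = "local")

# The SparkUI URL of the current SparkContext,
# e.g. "http://localhost:4040" (NA if the UI is disabled).
sparkR.uiWebUrl()
```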
- Dec 17, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

Reorganizing content (copy/paste).

## How was this patch tested?

https://felixcheung.github.io/sparkr-vignettes.html

Previous: https://felixcheung.github.io/sparkr-vignettes_old.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16301 from felixcheung/rvignettespass2.
- Dec 16, 2016
Dongjoon Hyun authored
## What changes were proposed in this pull request?

The SparkR tests, `R/run-tests.sh`, succeed only once because `test_sparkSQL.R` does not clean up the test table `people`. As a result, the rows in the `people` table accumulate on every run and the test cases fail. The following is the failure result for the second run.

```r
Failed -------------------------------------------------------------------------
1. Failure: create DataFrame from RDD (test_sparkSQL.R#204) -------------------
collect(sql("SELECT age from people WHERE name = 'Bob'"))$age not equal to c(16).
Lengths differ: 2 vs 1

2. Failure: create DataFrame from RDD (test_sparkSQL.R#206) -------------------
collect(sql("SELECT height from people WHERE name ='Bob'"))$height not equal to c(176.5).
Lengths differ: 2 vs 1
```

## How was this patch tested?

Manual. Run `run-tests.sh` twice and check that it passes without failures.

Author: Dongjoon Hyun <dongjoon@apache.org>

Closes #16310 from dongjoon-hyun/SPARK-18897.
- Dec 14, 2016
Felix Cheung authored
## What changes were proposed in this pull request?

doc cleanup

## How was this patch tested?

~~vignettes is not building for me. I'm going to kick off a full clean build and try again and attach output here for review.~~

Output html here: https://felixcheung.github.io/sparkr-vignettes.html

Author: Felix Cheung <felixcheung_m@hotmail.com>

Closes #16286 from felixcheung/rvignettespass.
wm624@hotmail.com authored
## What changes were proposed in this pull request?

While doing the QA work, I found the following issues:

1. `spark.mlp` doesn't include an example;
2. `spark.mlp` and `spark.lda` have redundant parameter explanations;
3. the `spark.lda` document misses default values for some parameters.

I also changed the `spark.logit` regParam in the examples, as we discussed in #16222.

## How was this patch tested?

Manual test

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16284 from wangmiao1981/ks.
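A minimal sketch of the kind of `spark.mlp` example item 1 calls for (assuming the Spark 2.2 formula-based signature; layer sizes match iris's 4 features and 3 classes):

```r
df <- createDataFrame(iris)

# A multilayer perceptron: 4 inputs, one hidden layer of 5 units, 3 classes.
model <- spark.mlp(df, Species ~ ., layers = c(4, 5, 3), maxIter = 100)
summary(model)
head(predict(model, df))
```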
Joseph K. Bradley authored
## What changes were proposed in this pull request?

Added a short section for KSTest. Also added the logreg model to the list of ML models in the vignette. (This will be reorganized under SPARK-18849.)



## How was this patch tested?

Manually tested the example locally. Built the vignettes locally.

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #16283 from jkbradley/ksTest-vignette.
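A minimal sketch of the kind of KSTest example the new section covers (the data is illustrative):

```r
df <- createDataFrame(data.frame(test = rnorm(100)))

# One-sample, two-sided Kolmogorov-Smirnov test against N(0, 1).
testResult <- spark.kstest(df, "test", "norm", c(0, 1))
summary(testResult)
```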
- Dec 13, 2016
wm624@hotmail.com authored
## What changes were proposed in this pull request?

While adding vignettes for kstest, I found some errors in the example:

1. there is a typo in kstest;
2. `print.summary.KSTest` doesn't work with the example.

Fix the example errors and add a new unit test for `print.summary.KSTest`.

## How was this patch tested?

Manual test; added a new unit test.

Author: wm624@hotmail.com <wm624@hotmail.com>

Closes #16259 from wangmiao1981/ks.