Commits · dc86a227e4fc8a9d8c3e8c68da8dff9298447fd0 · cs525-sp18-g07 / spark

Aug 27, 2015

[SPARK-9148] [SPARK-10252] [SQL] Update SQL Programming Guide · dc86a227
Michael Armbrust authored 9 years ago
```
Author: Michael Armbrust <michael@databricks.com>

Closes #8441 from marmbrus/documentation.
```
dc86a227

[SPARK-10182] [MLLIB] GeneralizedLinearModel doesn't unpersist cached data · fdd466be

`GeneralizedLinearModel` creates a cached RDD when building a model. This is inconvenient: when several models are built in a row, these RDDs flood memory, so useful data may be evicted from the cache.

The proposed solution is to always cache the dataset and remove the warning. There's a caveat, though: the input dataset is evaluated twice, first at line 270 when fitting `StandardScaler` and again when running the optimizer. So it might be worth restoring the removed warning.

Another possible solution is to disable caching entirely and restore the removed warning. I'm not sure which approach is better.
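The trade-off above (two passes over the input versus cached RDDs piling up across models) can be illustrated with a minimal Python sketch. `ToyDataset` is a hypothetical stand-in for a lazily evaluated RDD, not Spark's API, and `fit` is a hypothetical two-pass trainer. The sketch shows one generic pattern, not the code in this commit: cache only when the caller has not already cached the input, and release the cache in a `finally` block.

```python
class ToyDataset:
    """Hypothetical stand-in for a lazily evaluated RDD (not Spark's API).

    Each collect() re-runs the source computation unless the dataset is
    cached, in which case the first materialized result is reused.
    """

    def __init__(self, compute):
        self._compute = compute  # zero-arg function producing the records
        self._data = None        # materialized records while cached
        self.is_cached = False
        self.evaluations = 0     # how many times the source was computed

    def cache(self):
        self.is_cached = True
        return self

    def unpersist(self):
        self.is_cached = False
        self._data = None        # drop the materialized copy, freeing memory
        return self

    def collect(self):
        if self.is_cached and self._data is not None:
            return self._data
        self.evaluations += 1
        records = list(self._compute())
        if self.is_cached:
            self._data = records
        return records


def fit(dataset):
    """Hypothetical two-pass trainer: compute a statistic, then use it."""
    # Cache only if the caller has not, so we never unpersist their data.
    handle_persistence = not dataset.is_cached
    if handle_persistence:
        dataset.cache()
    try:
        records = dataset.collect()                       # pass 1 (cf. fitting a scaler)
        mean = sum(records) / len(records)
        centered = [x - mean for x in dataset.collect()]  # pass 2 (cf. the optimizer)
        return mean, centered
    finally:
        if handle_persistence:
            dataset.unpersist()  # release the cache before the next model is built
```

With this pattern the source is computed once per fit, and nothing is left cached afterwards unless the caller cached it in the first place.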

Author: Vyacheslav Baranov <slavik.baranov@gmail.com>

Closes #8395 from SlavikBaranov/SPARK-10182.

fdd466be

[SPARK-10257] [MLLIB] Removes Guava from all spark.mllib Java tests · e1f4de4a

Feynman Liang authored 9 years ago

* Replaces instances of `Lists.newArrayList` with `Arrays.asList`
* Replaces `com.google.collections.Strings` with `commons.lang.StringUtils`
* Uses the `List` interface instead of `ArrayList` implementations

This PR, along with #8445, #8446, and #8447, completely removes all `com.google.collections.Lists` dependencies from mllib's Java tests.

Author: Feynman Liang <fliang@databricks.com>

Closes #8451 from feynmanliang/SPARK-10257.

e1f4de4a

[SPARK-9613] [HOTFIX] Fix usage of JavaConverters removed in Scala 2.11 · b02e8187

Jacek Laskowski authored 9 years ago

Fixes the build failure with the 2.11 profile enabled, caused by [JavaConverters.asJavaListConverter](http://www.scala-lang.org/api/2.10.5/index.html#scala.collection.JavaConverters$) having been removed in 2.11.7. Tested with both the default 2.10 profile and the 2.11 profile: BUILD SUCCESS in both cases.

Build for 2.10:

    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -DskipTests clean install

and 2.11:

    ./dev/change-scala-version.sh 2.11
    ./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.7.1 -Dscala-2.11 -DskipTests clean install

Author: Jacek Laskowski <jacek@japila.pl>

Closes #8479 from jaceklaskowski/SPARK-9613-hotfix.

b02e8187

[SPARK-10256] [ML] Removes guava dependency from spark.ml.classification JavaTests · 1a446f75
Feynman Liang authored 9 years ago
```
Author: Feynman Liang <fliang@databricks.com>

Closes #8447 from feynmanliang/SPARK-10256.
```
1a446f75
[SPARK-10255] [ML] Removes Guava dependencies from spark.ml.param JavaTests · 75d62307
Feynman Liang authored 9 years ago
```
Author: Feynman Liang <fliang@databricks.com>

Closes #8446 from feynmanliang/SPARK-10255.
```
75d62307

[SPARK-10254] [ML] Removes Guava dependencies in spark.ml.feature JavaTests · 1650f6f5

Feynman Liang authored 9 years ago

* Replaces `com.google.common` dependencies with `java.util.Arrays`
* Small clean up in `JavaNormalizerSuite`

Author: Feynman Liang <fliang@databricks.com>

Closes #8445 from feynmanliang/SPARK-10254.

1650f6f5

[DOCS] [STREAMING] [KAFKA] Fix typo in exactly once semantics · 9625d13d

Moussa Taifi authored 9 years ago

Fix typo in the exactly-once semantics [Semantics of output operations] link.

Author: Moussa Taifi <moutai10@gmail.com>

Closes #8468 from moutai/patch-3.

9625d13d

[SPARK-10251] [CORE] some common types are not registered for Kryo Serialization by default · de027828
Ram Sriharsha authored 9 years ago
```
Author: Ram Sriharsha <rsriharsha@hw11853.local>

Closes #8465 from harsha2010/SPARK-10251.
```
de027828

[SPARK-10219] [SPARKR] Fix varargsToEnv and add test case · e936cf80

Shivaram Venkataraman authored 9 years ago

cc sun-rui davies

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8475 from shivaram/varargs-fix.

e936cf80

[SPARK-9964] [PYSPARK] [SQL] PySpark DataFrameReader accept RDD of String for JSON · ce97834d

Yanbo Liang authored 9 years ago

PySpark DataFrameReader should accept an RDD of Strings for JSON (as the Scala version does), rather than only a path.
If this PR is merged, the same change should be made for the other input types (not just JSON).

Author: Yanbo Liang <ybliang8@gmail.com>

Closes #8444 from yanboliang/spark-9964.

ce97834d

Aug 26, 2015

[SPARK-9424] [SQL] Parquet programming guide updates for 1.5 · 0fac144f
Cheng Lian authored 9 years ago
```
Author: Cheng Lian <lian@databricks.com>

Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
```
0fac144f

[MINOR] [SPARKR] Fix some validation problems in SparkR · 773ca037

Yu ISHIKAWA authored 9 years ago

Getting rid of some validation problems in SparkR
https://github.com/apache/spark/pull/7883

cc shivaram

```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
  expect_equal(class(x), "character")
                                     ^~
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.

^~
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.

^~
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
                df <- data.frame(row.names = 1 : nrow)
                                                      ^~~~~~~~~~~~~~~~
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.

^~~~~~~~~~~~~~
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
                    df[[names[colIndex]]] <- vec
                                                ^~~~~~~~~~~~~~~~~~
```

Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com>

Closes #8474 from yu-iskw/minor-fix-sparkr.

773ca037

[SPARK-10308] [SPARKR] Add %in% to the exported namespace · ad7f0f16

Shivaram Venkataraman authored 9 years ago

I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine.

cc yu-iskw

Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu>

Closes #8473 from shivaram/in-namespace.

ad7f0f16

[SPARK-10305] [SQL] fix create DataFrame from Python class · d41d6c48
Davies Liu authored 9 years ago
```
cc jkbradley

Author: Davies Liu <davies@databricks.com>

Closes #8470 from davies/fix_create_df.
```
d41d6c48

[SPARK-10241] [MLLIB] update since versions in mllib.recommendation · 086d4681

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.recommendation`.

cc srowen coderxiang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8432 from mengxr/SPARK-10241.

086d4681

HOTFIX: Increase PRB timeout · de7209c2
Patrick Wendell authored 9 years ago

de7209c2

[SPARK-9665] [MLLIB] audit MLlib API annotations · 6519fd06

Xiangrui Meng authored 9 years ago

I only found `ml.NaiveBayes` missing the `Experimental` annotation. This PR doesn't cover Python APIs.

cc jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8452 from mengxr/SPARK-9665.

6519fd06

Closes #8443 · bb164052
Reynold Xin authored 9 years ago

bb164052

[SPARK-9316] [SPARKR] Add support for filtering using `[` (synonym for filter / select) · 75d4773a

felixcheung authored 9 years ago

Add support for
```
   df[df$name == "Smith", c(1,2)]
   df[df$age %in% c(19, 30), 1:2]
```

shivaram

Author: felixcheung <felixcheung_m@hotmail.com>

Closes #8394 from felixcheung/rsubset.

75d4773a

[SPARK-10236] [MLLIB] update since versions in mllib.feature · 321d7759

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.feature`.

cc dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits:

0e8d658 [Xiangrui Meng] remove unnecessary comment
ad70b03 [Xiangrui Meng] update since versions in mllib.feature

321d7759

[SPARK-10235] [MLLIB] update since versions in mllib.regression · 4657fa1f

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.regression`.

cc freeman-lab dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #8426 from mengxr/SPARK-10235 and squashes the following commits:

6cd28e4 [Xiangrui Meng] update since versions in mllib.regression

4657fa1f

[SPARK-10243] [MLLIB] update since versions in mllib.tree · fb7e12fe

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.tree`.

cc jkbradley

Author: Xiangrui Meng <meng@databricks.com>

Closes #8442 from mengxr/SPARK-10236.

fb7e12fe

[SPARK-10234] [MLLIB] update since version in mllib.clustering · d703372f

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.clustering`.

cc feynmanliang yu-iskw

Author: Xiangrui Meng <meng@databricks.com>

Closes #8435 from mengxr/SPARK-10234.

d703372f

[SPARK-10240] [SPARK-10242] [MLLIB] update since versions in mllib.random and mllib.stat · c3a54843

Xiangrui Meng authored 9 years ago

The same as #8241 but for `mllib.stat` and `mllib.random`.

cc feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8439 from mengxr/SPARK-10242.

c3a54843

Aug 25, 2015

[SPARK-10238] [MLLIB] update since versions in mllib.linalg · ab431f8a

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.linalg`.

cc dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #8440 from mengxr/SPARK-10238 and squashes the following commits:

b38437e [Xiangrui Meng] update since versions in mllib.linalg

ab431f8a

[SPARK-10233] [MLLIB] update since version in mllib.evaluation · 8668ead2

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.evaluation`.

cc avulanov

Author: Xiangrui Meng <meng@databricks.com>

Closes #8423 from mengxr/SPARK-10233.

8668ead2

[SPARK-9888] [MLLIB] User guide for new LDA features · 125205cd

Feynman Liang authored 9 years ago

* Adds two new sections to LDA's user guide, one for each optimizer/model
* Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparameter optimization)
* Cleans up a TODO and sets a default parameter in LDA code

jkbradley hhbyyh

Author: Feynman Liang <fliang@databricks.com>

Closes #8254 from feynmanliang/SPARK-9888.

125205cd

[SPARK-10215] [SQL] Fix precision of division (follow the rule in Hive) · 7467b52e

Davies Liu authored 9 years ago

Follows Hive's rule for decimal division; see https://github.com/apache/hive/blob/ac755ebe26361a4647d53db2a28500f71697b276/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java#L113
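For reference, the Hive rule linked above, as it is commonly stated, derives the result type of `decimal(p1, s1) / decimal(p2, s2)` as scale = max(6, s1 + p2 + 1) and precision = p1 - s1 + s2 + scale. The Python sketch below computes just that. The step that caps precision and scale at the 38-digit maximum is deliberately left out, and the function name is hypothetical.

```python
def divide_result_type(p1, s1, p2, s2):
    """(precision, scale) of decimal(p1, s1) / decimal(p2, s2), uncapped.

    Hypothetical helper following the Hive division rule; the adjustment
    that bounds the result at 38 digits is not modeled here.
    """
    scale = max(6, s1 + p2 + 1)     # at least 6 fractional digits
    precision = p1 - s1 + s2 + scale  # integral digits plus the scale
    return precision, scale
```

For example, `decimal(10, 2) / decimal(5, 1)` yields `decimal(17, 8)` under this rule.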

cc chenghao-intel

Author: Davies Liu <davies@databricks.com>

Closes #8415 from davies/decimal_div2.

7467b52e

[SPARK-10245] [SQL] Fix decimal literals with precision < scale · ec89bd84

Davies Liu authored 9 years ago

In `BigDecimal` or `java.math.BigDecimal`, the precision can be smaller than the scale; for example, `BigDecimal("0.001")` has precision = 1 and scale = 3. `DecimalType`, however, requires the precision to be no smaller than the scale, so we use the maximum of precision and scale when inferring the schema from a decimal literal.
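A minimal Python sketch of the inference rule described above. It uses the standard `decimal` module as a stand-in for `java.math.BigDecimal`, where precision is the number of digits in the unscaled value and scale is the number of digits after the decimal point. The function name is hypothetical, and the sketch assumes finite literals in plain positional notation (non-negative scale).

```python
from decimal import Decimal


def literal_decimal_type(literal):
    """Infer (precision, scale) for a decimal literal string (hypothetical helper)."""
    t = Decimal(literal).as_tuple()
    if not isinstance(t.exponent, int) or t.exponent > 0:
        # Sketch covers finite literals with a non-negative scale only.
        raise ValueError("unsupported literal: " + literal)
    precision = len(t.digits)  # digits of the unscaled value, as in BigDecimal
    scale = -t.exponent        # digits after the decimal point
    # DecimalType needs precision >= scale, hence the max().
    return max(precision, scale), scale
```

`"0.001"` has precision 1 and scale 3, so the inferred type becomes `decimal(3, 3)` rather than the invalid `decimal(1, 3)`.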

Author: Davies Liu <davies@databricks.com>

Closes #8428 from davies/smaller_decimal.

ec89bd84

[SPARK-10239] [SPARK-10244] [MLLIB] update since versions in mllib.pmml and mllib.util · 00ae4be9

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.pmml` and `mllib.util`.

cc dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #8430 from mengxr/SPARK-10239 and squashes the following commits:

a189acf [Xiangrui Meng] update since versions in mllib.pmml and mllib.util

00ae4be9

[SPARK-9797] [MLLIB] [DOC] StreamingLinearRegressionWithSGD.setConvergenceTol default value · 92059078

Feynman Liang authored 9 years ago

Documents the default convergence tolerance (0.001, set in `GradientDescent.convergenceTol`) in `setConvergenceTol`'s scaladoc.
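To show what the documented default means in practice, here is a Python sketch of a relative-change stopping test of the kind `GradientDescent` applies. Iteration stops once the change in the weight vector falls below `convergenceTol` times the larger of the current weights' norm and 1.0. The exact form of this check is our reading of MLlib, not something stated in this commit, so treat it as an assumption. `is_converged` and `l2_norm` are hypothetical names.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def is_converged(previous, current, convergence_tol=0.001):
    """Relative-change stopping test (assumed form, modeled on MLlib's GradientDescent).

    The default of 0.001 matches the documented GradientDescent.convergenceTol.
    """
    diff = l2_norm([p - c for p, c in zip(previous, current)])
    # max(..., 1.0) makes the test absolute when the weights are small.
    return diff < convergence_tol * max(l2_norm(current), 1.0)
```

Scaling by the weights' norm keeps the tolerance meaningful regardless of the magnitude of the weights. The floor of 1.0 stops the test from becoming arbitrarily strict near zero.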

Author: Feynman Liang <fliang@databricks.com>

Closes #8424 from feynmanliang/SPARK-9797.

92059078

[SPARK-10237] [MLLIB] update since versions in mllib.fpm · c619c755

Xiangrui Meng authored 9 years ago

Same as #8421 but for `mllib.fpm`.

cc feynmanliang

Author: Xiangrui Meng <meng@databricks.com>

Closes #8429 from mengxr/SPARK-10237.

c619c755

[SPARK-9800] Adds docs for GradientDescent$.runMiniBatchSGD alias · c0e9ff15

Feynman Liang authored 9 years ago

* Adds a doc comment for the `runMiniBatchSGD` alias, documenting the default value of `convergenceTol`
* Cleans up a note in code

Author: Feynman Liang <fliang@databricks.com>

Closes #8425 from feynmanliang/SPARK-9800.

c0e9ff15

[SPARK-10048] [SPARKR] Support arbitrary nested Java array in serde. · 71a138cd

Sun Rui authored 9 years ago

This PR:
1. Supports transferring arbitrarily nested arrays from the JVM to the R side in SerDe.
2. Building on 1, improves the `collect()` implementation, which can now collect data of complex types from a DataFrame.

Author: Sun Rui <rui.sun@intel.com>

Closes #8276 from sun-rui/SPARK-10048.

71a138cd

[SPARK-10231] [MLLIB] update @Since annotation for mllib.classification · 16a2be1a

Xiangrui Meng authored 9 years ago

Update `Since` annotation in `mllib.classification`:

1. add version to classes, objects, constructors, and public variables declared in constructors
2. correct some versions
3. remove `Since` on `toString`

MechCoder dbtsai

Author: Xiangrui Meng <meng@databricks.com>

Closes #8421 from mengxr/SPARK-10231 and squashes the following commits:

b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification

16a2be1a

[SPARK-10230] [MLLIB] Rename optimizeAlpha to optimizeDocConcentration · 881208a8

Feynman Liang authored 9 years ago

See [discussion](https://github.com/apache/spark/pull/8254#discussion_r37837770)

CC jkbradley

Author: Feynman Liang <fliang@databricks.com>

Closes #8422 from feynmanliang/SPARK-10230.

881208a8

[SPARK-8531] [ML] Update ML user guide for MinMaxScaler · b37f0cc1

Yuhao Yang authored 9 years ago

jira: https://issues.apache.org/jira/browse/SPARK-8531

Update ML user guide for MinMaxScaler
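The transformation this guide documents rescales each feature column linearly to a target range `[min, max]` (0 and 1 by default): Rescaled(e_i) = (e_i - E_min) / (E_max - E_min) * (max - min) + min. A constant column (E_max == E_min) is mapped to 0.5 * (max + min). A minimal pure-Python sketch of that formula for a single column follows. It is an illustration, not Spark's `MinMaxScaler` implementation, and `min_max_scale` is a hypothetical name.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale one feature column linearly to [new_min, new_max] (hypothetical helper)."""
    e_min, e_max = min(values), max(values)
    if e_max == e_min:
        # Constant column: map every value to the midpoint of the target range.
        return [0.5 * (new_max + new_min) for _ in values]
    factor = (new_max - new_min) / (e_max - e_min)
    return [(v - e_min) * factor + new_min for v in values]
```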

Author: Yuhao Yang <hhbyyh@gmail.com>
Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com>

Closes #7211 from hhbyyh/minmaxdoc.

b37f0cc1

[SPARK-10198] [SQL] Turn off partition verification by default · 5c08c86b
Michael Armbrust authored 9 years ago
```
Author: Michael Armbrust <michael@databricks.com>

Closes #8404 from marmbrus/turnOffPartitionVerification.
```
5c08c86b

[SPARK-9613] [CORE] Ban use of JavaConversions and migrate all existing uses to JavaConverters · 69c9c177

Sean Owen authored 9 years ago

Replace `JavaConversions` implicits with `JavaConverters`

Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code, as far as I can see.

Author: Sean Owen <sowen@cloudera.com>

Closes #8033 from srowen/SPARK-9613.

69c9c177