- Aug 27, 2015
-
Moussa Taifi authored
Fix typo in the exactly-once semantics [Semantics of output operations] link. Author: Moussa Taifi <moutai10@gmail.com> Closes #8468 from moutai/patch-3.
-
Ram Sriharsha authored
…ion by default Author: Ram Sriharsha <rsriharsha@hw11853.local> Closes #8465 from harsha2010/SPARK-10251.
-
Shivaram Venkataraman authored
cc sun-rui davies Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8475 from shivaram/varargs-fix.
-
Yanbo Liang authored
PySpark DataFrameReader should accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this PR is merged, it should be duplicated to cover the other input types (not just JSON). Author: Yanbo Liang <ybliang8@gmail.com> Closes #8444 from yanboliang/spark-9964.
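For reference, a minimal Scala sketch of the behavior this change mirrors (the Spark 1.5 Scala `DataFrameReader` already accepts an `RDD[String]` of JSON); the app name and data here are illustrative:
```
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

val sc = new SparkContext("local[2]", "json-from-rdd-example")
val sqlContext = new SQLContext(sc)
// An RDD of JSON strings, passed directly instead of a path:
val jsonRDD = sc.parallelize(Seq("""{"name": "Alice", "age": 30}"""))
val df = sqlContext.read.json(jsonRDD)
```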
-
- Aug 26, 2015
-
Cheng Lian authored
Author: Cheng Lian <lian@databricks.com> Closes #8467 from liancheng/spark-9424/parquet-docs-for-1.5.
-
Yu ISHIKAWA authored
Getting rid of some validation problems in SparkR https://github.com/apache/spark/pull/7883 cc shivaram
```
inst/tests/test_Serde.R:26:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:34:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:37:38: style: Trailing whitespace is superfluous.
expect_equal(class(x), "character")
^~
inst/tests/test_Serde.R:50:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:55:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_Serde.R:60:1: style: Trailing whitespace is superfluous.
^~
inst/tests/test_sparkSQL.R:611:1: style: Trailing whitespace is superfluous.
^~
R/DataFrame.R:664:1: style: Trailing whitespace is superfluous.
^~~~~~~~~~~~~~
R/DataFrame.R:670:55: style: Trailing whitespace is superfluous.
df <- data.frame(row.names = 1 : nrow)
^~~~~~~~~~~~~~~~
R/DataFrame.R:672:1: style: Trailing whitespace is superfluous.
^~~~~~~~~~~~~~
R/DataFrame.R:686:49: style: Trailing whitespace is superfluous.
df[[names[colIndex]]] <- vec
^~~~~~~~~~~~~~~~~~
```
Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #8474 from yu-iskw/minor-fix-sparkr.
-
Shivaram Venkataraman authored
I also checked all the other functions defined in column.R, functions.R and DataFrame.R and everything else looked fine. cc yu-iskw Author: Shivaram Venkataraman <shivaram@cs.berkeley.edu> Closes #8473 from shivaram/in-namespace.
-
Davies Liu authored
cc jkbradley Author: Davies Liu <davies@databricks.com> Closes #8470 from davies/fix_create_df.
-
Xiangrui Meng authored
Same as #8421 but for `mllib.recommendation`. cc srowen coderxiang Author: Xiangrui Meng <meng@databricks.com> Closes #8432 from mengxr/SPARK-10241.
-
Patrick Wendell authored
-
Xiangrui Meng authored
I only found `ml.NaiveBayes` missing `Experimental` annotation. This PR doesn't cover Python APIs. cc jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #8452 from mengxr/SPARK-9665.
-
Reynold Xin authored
-
felixcheung authored
Add support for
```
df[df$name == "Smith", c(1,2)]
df[df$age %in% c(19, 30), 1:2]
```
shivaram Author: felixcheung <felixcheung_m@hotmail.com> Closes #8394 from felixcheung/rsubset.
-
Xiangrui Meng authored
Same as #8421 but for `mllib.feature`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8449 from mengxr/SPARK-10236.feature and squashes the following commits: 0e8d658 [Xiangrui Meng] remove unnecessary comment ad70b03 [Xiangrui Meng] update since versions in mllib.feature
-
Xiangrui Meng authored
Same as #8421 but for `mllib.regression`. cc freeman-lab dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8426 from mengxr/SPARK-10235 and squashes the following commits: 6cd28e4 [Xiangrui Meng] update since versions in mllib.regression
-
Xiangrui Meng authored
Same as #8421 but for `mllib.tree`. cc jkbradley Author: Xiangrui Meng <meng@databricks.com> Closes #8442 from mengxr/SPARK-10236.
-
Xiangrui Meng authored
Same as #8421 but for `mllib.clustering`. cc feynmanliang yu-iskw Author: Xiangrui Meng <meng@databricks.com> Closes #8435 from mengxr/SPARK-10234.
-
Xiangrui Meng authored
The same as #8421 but for `mllib.stat` and `mllib.random`. cc feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8439 from mengxr/SPARK-10242.
-
- Aug 25, 2015
-
Xiangrui Meng authored
Same as #8421 but for `mllib.linalg`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8440 from mengxr/SPARK-10238 and squashes the following commits: b38437e [Xiangrui Meng] update since versions in mllib.linalg
-
Xiangrui Meng authored
Same as #8421 but for `mllib.evaluation`. cc avulanov Author: Xiangrui Meng <meng@databricks.com> Closes #8423 from mengxr/SPARK-10233.
-
Feynman Liang authored
* Adds two new sections to LDA's user guide, one for each optimizer/model
* Documents new features added to LDA (e.g. topXXXperXXX, asymmetric priors, hyperparam optimization)
* Cleans up a TODO and sets a default parameter in LDA code
jkbradley hhbyyh Author: Feynman Liang <fliang@databricks.com> Closes #8254 from feynmanliang/SPARK-9888.
-
Davies Liu authored
Follow the rule in Hive for decimal division; see https://github.com/apache/hive/blob/ac755ebe26361a4647d53db2a28500f71697b276/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFOPDivide.java#L113 cc chenghao-intel Author: Davies Liu <davies@databricks.com> Closes #8415 from davies/decimal_div2.
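For context, a sketch of the Hive rule referenced above, as I read it from the linked source (the cap at DecimalType's maximum precision of 38 is applied separately and omitted here):
```
// Result type of e1(p1, s1) / e2(p2, s2) under Hive's decimal arithmetic rules.
def divideResultType(p1: Int, s1: Int, p2: Int, s2: Int): (Int, Int) = {
  val scale = math.max(6, s1 + p2 + 1)
  val precision = p1 - s1 + s2 + scale
  (precision, scale)
}
```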
-
Davies Liu authored
In BigDecimal or java.math.BigDecimal, the precision can be smaller than the scale; for example, BigDecimal("0.001") has precision = 1 and scale = 3. But DecimalType requires that the precision be at least as large as the scale, so we should use the maximum of precision and scale when inferring the schema from a decimal literal. Author: Davies Liu <davies@databricks.com> Closes #8428 from davies/smaller_decimal.
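A short Scala sketch of the mismatch described above:
```
import java.math.BigDecimal

val d = new BigDecimal("0.001")
d.precision()  // 1 -- a single significant digit
d.scale()      // 3 -- three digits after the decimal point
// DecimalType needs precision >= scale, hence the maximum:
val inferredPrecision = math.max(d.precision(), d.scale())  // 3
```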
-
Xiangrui Meng authored
Same as #8421 but for `mllib.pmml` and `mllib.util`. cc dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8430 from mengxr/SPARK-10239 and squashes the following commits: a189acf [Xiangrui Meng] update since versions in mllib.pmml and mllib.util
-
Feynman Liang authored
Adds default convergence tolerance (0.001, set in `GradientDescent.convergenceTol`) to `setConvergenceTol`'s scaladoc Author: Feynman Liang <fliang@databricks.com> Closes #8424 from feynmanliang/SPARK-9797.
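A usage sketch under the documented default (class and setter names as in MLlib 1.5; the tolerance value here is illustrative):
```
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

val lr = new LinearRegressionWithSGD()
// convergenceTol defaults to 0.001 per the updated scaladoc; tighten if needed:
lr.optimizer.setConvergenceTol(1e-4)
```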
-
Xiangrui Meng authored
Same as #8421 but for `mllib.fpm`. cc feynmanliang Author: Xiangrui Meng <meng@databricks.com> Closes #8429 from mengxr/SPARK-10237.
-
Feynman Liang authored
* Adds doc for the alias of runMiniBatchSGD documenting the default value for convergeTol
* Cleans up a note in the code
Author: Feynman Liang <fliang@databricks.com> Closes #8425 from feynmanliang/SPARK-9800.
-
Sun Rui authored
This PR:
1. supports transferring arbitrary nested arrays from the JVM to the R side in SerDe;
2. building on 1, improves the collect() implementation, which can now collect data of complex types from a DataFrame.
Author: Sun Rui <rui.sun@intel.com> Closes #8276 from sun-rui/SPARK-10048.
-
Xiangrui Meng authored
Update `Since` annotation in `mllib.classification`:
1. add version to classes, objects, constructors, and public variables declared in constructors
2. correct some versions
3. remove `Since` on `toString`
MechCoder dbtsai Author: Xiangrui Meng <meng@databricks.com> Closes #8421 from mengxr/SPARK-10231 and squashes the following commits: b2dce80 [Xiangrui Meng] update @Since annotation for mllib.classification
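An illustrative sketch of the annotation pattern this series of PRs applies; the class name and version strings below are hypothetical, not taken from the patch:
```
import org.apache.spark.annotation.Since
import org.apache.spark.mllib.linalg.Vector

@Since("0.8.0")
class ExampleModel @Since("1.1.0") (
    @Since("1.0.0") val weights: Vector,
    @Since("1.0.0") val intercept: Double)
```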
-
Feynman Liang authored
See [discussion](https://github.com/apache/spark/pull/8254#discussion_r37837770) CC jkbradley Author: Feynman Liang <fliang@databricks.com> Closes #8422 from feynmanliang/SPARK-10230.
-
Yuhao Yang authored
jira: https://issues.apache.org/jira/browse/SPARK-8531 Update ML user guide for MinMaxScaler Author: Yuhao Yang <hhbyyh@gmail.com> Author: unknown <yuhaoyan@yuhaoyan-MOBL1.ccr.corp.intel.com> Closes #7211 from hhbyyh/minmaxdoc.
-
Michael Armbrust authored
Author: Michael Armbrust <michael@databricks.com> Closes #8404 from marmbrus/turnOffPartitionVerification.
-
Sean Owen authored
Replace `JavaConversions` implicits with `JavaConverters`. Most occurrences I've seen so far are necessary conversions; a few have been avoidable. None are in critical code as far as I can see, yet. Author: Sean Owen <sowen@cloudera.com> Closes #8033 from srowen/SPARK-9613.
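A minimal sketch of the conversion style this change standardizes on, using only the Scala standard library:
```
import scala.collection.JavaConverters._

val javaList: java.util.List[String] = java.util.Arrays.asList("a", "b")
// Explicit, call-site conversions instead of the invisible implicits
// pulled in by `import scala.collection.JavaConversions._`:
val scalaBuffer = javaList.asScala  // wraps, does not copy
val backToJava = scalaBuffer.asJava
```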
-
ehnalis authored
Author: ehnalis <zoltan.zvara@gmail.com> Closes #8308 from ehnalis/master.
-
Zhang, Liye authored
Author: Zhang, Liye <liye.zhang@intel.com> Closes #8412 from liyezhang556520/minorDoc.
-
Yin Huai authored
https://issues.apache.org/jira/browse/SPARK-10197 Author: Yin Huai <yhuai@databricks.com> Closes #8407 from yhuai/ORCSPARK-10197.
-
Josh Rosen authored
Spark SQL's data sources API exposes Catalyst's internal types through its Filter interfaces. This is a problem because types like UTF8String are not stable developer APIs and should not be exposed to third-parties. This issue caused incompatibilities when upgrading our `spark-redshift` library to work against Spark 1.5.0. To avoid these issues in the future we should only expose public types through these Filter objects. This patch accomplishes this by using CatalystTypeConverters to add the appropriate conversions. Author: Josh Rosen <joshrosen@databricks.com> Closes #8403 from JoshRosen/datasources-internal-vs-external-types.
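A hypothetical sketch of what a data source sees after this patch (the filter value is a plain `String`, not Catalyst's `UTF8String`):
```
import org.apache.spark.sql.sources.{EqualTo, Filter}

val pushed: Filter = EqualTo("name", "Smith")  // value: java.lang.String
```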
-
Davies Liu authored
We misunderstood the Julian days and the nanoseconds of the day in Parquet (as TimestampType) from Hive/Impala: the two overlap, so they can't simply be added together. To avoid confusing rounding during the conversion, we use `2440588` as the Julian day of the Unix epoch (the exact value is 2440587.5). Author: Davies Liu <davies@databricks.com> Author: Cheng Lian <lian@databricks.com> Closes #8400 from davies/timestamp_parquet.
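A sketch of the conversion described above, assuming the Parquet INT96 value carries a Julian day plus nanoseconds within that day (constant and helper names are illustrative):
```
// Julian day number of 1970-01-01; the astronomical epoch is 2440587.5, and the
// half-day offset is absorbed so the day and the nanoseconds are not added twice.
val JULIAN_DAY_OF_EPOCH = 2440588L

// Microseconds since the Unix epoch from (julianDay, nanosOfDay).
def julianToMicros(julianDay: Int, nanosOfDay: Long): Long =
  (julianDay - JULIAN_DAY_OF_EPOCH) * 24L * 60 * 60 * 1000 * 1000 + nanosOfDay / 1000
```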
-
Tathagata Das authored
When the write-ahead log is not enabled, a recovered streaming driver still tries to run jobs using pre-failure block ids, and fails because those blocks no longer exist in memory (and cannot be recovered, as the receiver WAL is not enabled). This occurs because the driver-side WAL of ReceivedBlockTracker recovers that past block information, and ReceiverInputDStream creates BlockRDDs even if those blocks do not exist. The solution in this PR is to filter out block ids that do not exist before creating the BlockRDD. In addition, it adds unit tests to verify other logic in ReceiverInputDStream. Author: Tathagata Das <tathagata.das1565@gmail.com> Closes #8405 from tdas/SPARK-10210.
-
Sean Owen authored
Follow up to https://github.com/apache/spark/pull/7047 pwendell mentioned that MapR should use `hadoop-provided` now, and indeed the new build script does not produce `mapr3`/`mapr4` artifacts anymore. Hence the action seems to be to remove the profiles, which are now not used. CC trystanleftwich Author: Sean Owen <sowen@cloudera.com> Closes #8338 from srowen/SPARK-6196.