- Dec 17, 2015
-
Davies Liu authored
For the API `DataFrame.join(right, usingColumns, joinType)`, if the joinType is right_outer or full_outer, the resulting join columns could be wrong (will be null). The order of columns has been changed to match that of MySQL and PostgreSQL [1]. This PR also fixes the nullability of the output for outer joins. [1] http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html Author: Davies Liu <davies@databricks.com> Closes #10353 from davies/fix_join.
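For illustration only, a minimal sketch of the affected call shape, assuming two hypothetical DataFrames that share an `id` column:
```scala
import org.apache.spark.sql.DataFrame

// Hypothetical example: with a right_outer (or full_outer) join through
// the usingColumns variant, the "id" column in the result could wrongly
// come back null before this fix.
def joinOnId(left: DataFrame, right: DataFrame): DataFrame =
  left.join(right, Seq("id"), "right_outer")
```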
-
Davies Liu authored
This reverts commit 5a514b61.
-
echo2mei authored
This commit resolves SPARK-12396. Author: echo2mei <534384876@qq.com> Closes #10354 from echoTomei/master.
-
Yin Huai authored
This PR makes the JSON parser and schema inference handle more cases where we have unparsed records. It is based on #10043. The last commit fixes the failed test and updates the logic of schema inference. Regarding the schema inference change, if we have something like
```
{"f1":1}
[1,2,3]
```
originally, we will get a DF without any columns. After this change, we will get a DF with columns `f1` and `_corrupt_record`. Basically, for the second row, `[1,2,3]` will be the value of `_corrupt_record`. When merging this PR, please make sure that the author is simplyianm. JIRA: https://issues.apache.org/jira/browse/SPARK-12057 Closes #10043 Author: Ian Macalinao <me@ian.pw> Author: Yin Huai <yhuai@databricks.com> Closes #10288 from yhuai/handleCorruptJson.
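A short sketch of the new behavior, assuming an existing `sqlContext` and that the two sample records above sit in a hypothetical file `/tmp/mixed.json`:
```scala
// Reads a file whose second line is not a JSON object. After this change,
// inference yields the columns f1 and _corrupt_record, and the raw string
// "[1,2,3]" is stored in _corrupt_record for that row.
val df = sqlContext.read.json("/tmp/mixed.json")
df.printSchema()
df.show()
```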
-
David Tolpin authored
When invFunc is None, `reduceByKeyAndWindow(func, None, winsize, slidesize)` is equivalent to `reduceByKey(func).window(winsize, slidesize).reduceByKey(func)`, and no checkpoint is necessary. The corresponding Scala code does exactly that, but the Python code always creates a windowed stream with obligatory checkpointing. The patch fixes this. I do not know how to unit-test this. Author: David Tolpin <david.tolpin@gmail.com> Closes #9888 from dtolpin/master.
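The Scala-side equivalence mentioned above, sketched against a hypothetical pair `DStream` (the window sizes are illustrative):
```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

// Without an inverse function, the windowed reduce decomposes into plain
// reduces around a window, so no checkpointing is needed.
def windowedCounts(pairs: DStream[(String, Int)]): DStream[(String, Int)] =
  pairs.reduceByKey(_ + _)
    .window(Seconds(30), Seconds(10))
    .reduceByKey(_ + _)
```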
-
- Dec 16, 2015
-
Andrew Or authored
No change in functionality is intended. This only changes internal API. Author: Andrew Or <andrew@databricks.com> Closes #10343 from andrewor14/clean-bm-serializer.
-
Marcelo Vanzin authored
Author: Marcelo Vanzin <vanzin@cloudera.com> Closes #10339 from vanzin/SPARK-12386.
-
Rohit Agarwal authored
Author: Rohit Agarwal <rohita@qubole.com> Closes #10180 from mindprince/SPARK-12186.
-
tedyu authored
SPARK-9886 fixed ExternalBlockStore.scala. This PR fixes the remaining references to Runtime.getRuntime.addShutdownHook(). Author: tedyu <yuzhihong@gmail.com> Closes #10325 from ted-yu/master.
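For illustration, a hedged sketch of the replacement pattern using Spark's internal `ShutdownHookManager` utility (the cleanup body is hypothetical):
```scala
import org.apache.spark.util.ShutdownHookManager

// Register through Spark's manager instead of Runtime.getRuntime.addShutdownHook,
// so the hook is tracked and can be removed if cleanup happens early.
val hookRef = ShutdownHookManager.addShutdownHook { () =>
  println("cleaning up external block store") // hypothetical cleanup
}
// Later, if the resource is shut down before the JVM exits:
ShutdownHookManager.removeShutdownHook(hookRef)
```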
-
Imran Rashid authored
`DAGSchedulerEventLoop` normally only logs errors (so it can continue to process more events, from other jobs). However, this is not desirable in the tests -- the tests should be able to easily detect any exception, and also shouldn't silently succeed if there is an exception. This was suggested by mateiz on https://github.com/apache/spark/pull/7699. It may have already turned up an issue in "zero split job". Author: Imran Rashid <irashid@cloudera.com> Closes #8466 from squito/SPARK-10248.
-
Andrew Or authored
This commit exists to close the following pull requests on Github: Closes #1217 (requested by ankurdave, srowen) Closes #4650 (requested by andrewor14) Closes #5307 (requested by vanzin) Closes #5664 (requested by andrewor14) Closes #5713 (requested by marmbrus) Closes #5722 (requested by andrewor14) Closes #6685 (requested by srowen) Closes #7074 (requested by srowen) Closes #7119 (requested by andrewor14) Closes #7997 (requested by jkbradley) Closes #8292 (requested by srowen) Closes #8975 (requested by andrewor14, vanzin) Closes #8980 (requested by andrewor14, davies)
-
Andrew Or authored
```
Exception in thread "main" org.apache.spark.rpc.RpcTimeoutException: Cannot receive any reply in ${timeout.duration}. This timeout is controlled by spark.rpc.askTimeout
  at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
```
Author: Andrew Or <andrew@databricks.com> Closes #10334 from andrewor14/rpc-typo.
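The literal `${timeout.duration}` in the message points at a missing `s` string interpolator; a minimal sketch of that class of bug (not the actual Spark code):
```scala
val duration = "120 seconds"
// Bug: without the s prefix, ${duration} appears literally in the output.
val bad = "Cannot receive any reply in ${duration}."
// Fix: the s interpolator substitutes the value.
val good = s"Cannot receive any reply in ${duration}."
```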
-
Davies Liu authored
MLlib should use SQLContext.getOrCreate() instead of creating new SQLContext. Author: Davies Liu <davies@databricks.com> Closes #10338 from davies/create_context.
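A minimal sketch of the preferred pattern, assuming an existing `SparkContext` named `sc`:
```scala
import org.apache.spark.sql.SQLContext

// Reuses the active SQLContext for this SparkContext if one exists,
// creating it only on first use.
val sqlContext = SQLContext.getOrCreate(sc)
```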
-
Martin Menestret authored
Extend CrossValidator with HasSeed in PySpark. This PR replaces [https://github.com/apache/spark/pull/7997] CC: yanboliang thunterdb mmenestret Would one of you mind taking a look? Thanks! Author: Joseph K. Bradley <joseph@databricks.com> Author: Martin MENESTRET <mmenestret@ippon.fr> Closes #10268 from jkbradley/pyspark-cv-seed.
-
hyukjinkwon authored
Currently ORC filters are not tested properly. All the tests pass even if the filters are not pushed down or are disabled. In this PR, I add some logic for this. Since ORC does not fully filter record by record, this checks the row count of the result and whether it contains the expected values. Author: hyukjinkwon <gurwls223@gmail.com> Closes #9687 from HyukjinKwon/SPARK-11677.
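A hedged sketch of the stricter style of check (path, column, and values are illustrative):
```scala
// Instead of only asserting that the query succeeds, verify the row count
// and the surviving values, so a silently disabled pushdown still fails.
val result = sqlContext.read.orc("/tmp/data.orc").filter("a = 2")
assert(result.count() == 1L)
assert(result.collect().map(_.getInt(0)).toSeq == Seq(2))
```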
-
gatorsmile authored
Based on the suggestions from marmbrus and cloud-fan in https://github.com/apache/spark/pull/10165, this PR is to print the decoded values (user objects) in `Dataset.show`
```scala
implicit val kryoEncoder = Encoders.kryo[KryoClassData]
val ds = Seq(KryoClassData("a", 1), KryoClassData("b", 2), KryoClassData("c", 3)).toDS()
ds.show(20, false);
```
The current output is like
```
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|value |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 97, 2]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 98, 4]|
|[1, 0, 111, 114, 103, 46, 97, 112, 97, 99, 104, 101, 46, 115, 112, 97, 114, 107, 46, 115, 113, 108, 46, 75, 114, 121, 111, 67, 108, 97, 115, 115, 68, 97, 116, -31, 1, 1, -126, 99, 6]|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
After the fix, it will look like the below, if and only if the users override the `toString` function in the class `KryoClassData`
```scala
override def toString: String = s"KryoClassData($a, $b)"
```
```
+-------------------+
|value              |
+-------------------+
|KryoClassData(a, 1)|
|KryoClassData(b, 2)|
|KryoClassData(c, 3)|
+-------------------+
```
If users do not override the `toString` function, the results will be like
```
+---------------------------------------+
|value                                  |
+---------------------------------------+
|org.apache.spark.sql.KryoClassData68ef |
|org.apache.spark.sql.KryoClassData6915 |
|org.apache.spark.sql.KryoClassData693b |
+---------------------------------------+
```
Question: should we add another optional parameter to the function `show` that decides whether it displays the hex values or the object values? Author: gatorsmile <gatorsmile@gmail.com> Closes #10215 from gatorsmile/showDecodedValue.
-
Wenchen Fan authored
Author: Wenchen Fan <wenchen@databricks.com> Closes #10293 from cloud-fan/err-msg.
-
Yanbo Liang authored
We have a DataFrame example for SparkR; we also need to add an ML example under `examples/src/main/r`. cc mengxr jkbradley shivaram Author: Yanbo Liang <ybliang8@gmail.com> Closes #10324 from yanboliang/spark-12364.
-
Joseph K. Bradley authored
No known breaking changes, but some deprecations and changes of behavior. CC: mengxr Author: Joseph K. Bradley <joseph@databricks.com> Closes #10235 from jkbradley/mllib-guide-update-1.6.
-
Jeff Zhang authored
Although this patch still doesn't solve the issue of why the return code is 0 (see the JIRA description), it resolves the Python version mismatch. Author: Jeff Zhang <zjffdu@apache.org> Closes #10322 from zjffdu/SPARK-12361.
-
Yanbo Liang authored
Use `sqlContext` from `MLlibTestSparkContext` rather than creating a new one for the spark.ml test suites. I have checked thoroughly and found four test cases that need to be updated. cc mengxr jkbradley Author: Yanbo Liang <ybliang8@gmail.com> Closes #10279 from yanboliang/spark-12309.
-
Yanbo Liang authored
Add random seed Param to Scala CrossValidator Author: Yanbo Liang <ybliang8@gmail.com> Closes #9108 from yanboliang/spark-9694.
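With the new Param, callers can pin the random fold assignment; a minimal sketch under the assumption that `CrossValidator` now exposes `setSeed` via `HasSeed`:
```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val lr = new LogisticRegression()
val grid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .build()

// setSeed makes the k-fold split reproducible across runs.
val cv = new CrossValidator()
  .setEstimator(lr)
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(grid)
  .setNumFolds(3)
  .setSeed(42L)
```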
-
Yu ISHIKAWA authored
This PR includes only example code in order to finish it quickly. I'll send another PR for the docs soon. Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #9952 from yu-iskw/SPARK-6518.
-
Timothy Chen authored
SPARK_HOME is now causing problems with Mesos cluster mode, since the spark-submit script was recently changed so that spark-class scripts give precedence to SPARK_HOME when it is defined. We should skip passing SPARK_HOME from the Spark client in cluster mode with Mesos, since Mesos shouldn't use this configuration but should use spark.executor.home instead. Author: Timothy Chen <tnachen@gmail.com> Closes #10332 from tnachen/scheduler_ui.
-
Yu ISHIKAWA authored
cc jkbradley Author: Yu ISHIKAWA <yuu.ishikawa@gmail.com> Closes #10244 from yu-iskw/SPARK-12215.
-
Yanbo Liang authored
Add `write.json` and `write.parquet` for SparkR, and deprecate `saveAsParquetFile`. Author: Yanbo Liang <ybliang8@gmail.com> Closes #10281 from yanboliang/spark-12310.
-
Jeff Zhang authored
shivaram, please help review. Author: Jeff Zhang <zjffdu@apache.org> Closes #10290 from zjffdu/SPARK-12318.
-
Davies Liu authored
cc rxin Author: Davies Liu <davies@databricks.com> Closes #10316 from davies/remove_generate_projection.
-
Timothy Hunter authored
This fixes the sidebar, using a pure-CSS mechanism to hide it when the browser's viewport is too narrow. Credit goes to the original author Titan-C (mentioned in the NOTICE). Note that I am not a CSS expert, so I can only address comments up to some extent.
Default view:
![screen shot 2015-12-14 at 12 46 39 pm](https://cloud.githubusercontent.com/assets/7594753/11793597/6d1d6eda-a261-11e5-836b-6eb2054e9054.png)
When collapsed manually by the user:
![screen shot 2015-12-14 at 12 54 02 pm](https://cloud.githubusercontent.com/assets/7594753/11793669/c991989e-a261-11e5-8bf6-aecf3bdb6319.png)
Disappears when the column is too narrow:
![screen shot 2015-12-14 at 12 47 22 pm](https://cloud.githubusercontent.com/assets/7594753/11793607/7754dbcc-a261-11e5-8b15-e0d074b0e47c.png)
Can still be opened by the user if necessary:
![screen shot 2015-12-14 at 12 51 15 pm](https://cloud.githubusercontent.com/assets/7594753/11793612/7bf82968-a261-11e5-9cc3-e827a7a6b2b0.png)
Author: Timothy Hunter <timhunter@databricks.com> Closes #10297 from thunterdb/12324.
-
Reynold Xin authored
This reverts commit 31b39101.
-
Reynold Xin authored
This reverts commit 840bd2e0.
-
Reynold Xin authored
-
hyukjinkwon authored
https://issues.apache.org/jira/browse/SPARK-12315 The `IsNotNull` filter is not being pushed down for the JDBC datasource. It appears to be part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this. In this PR, I simply added the case for the `IsNotNull` filter to produce a proper filter string. Author: hyukjinkwon <gurwls223@gmail.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #10287 from HyukjinKwon/SPARK-12315.
-
hyukjinkwon authored
https://issues.apache.org/jira/browse/SPARK-12314 The `IsNull` filter is not being pushed down for the JDBC datasource. It appears to be part of the SQL standard according to [SQL-92](http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), SQL:1999, [SQL:2003](http://www.wiscorp.com/sql_2003_standard.zip) and [SQL:201x](http://www.wiscorp.com/sql20nn.zip), and I believe most databases support this. In this PR, I simply added the case for the `IsNull` filter to produce a proper filter string. Author: hyukjinkwon <gurwls223@gmail.com> This patch had conflicts when merged, resolved by Committer: Reynold Xin <rxin@databricks.com> Closes #10286 from HyukjinKwon/SPARK-12314.
-
hyukjinkwon authored
https://issues.apache.org/jira/browse/SPARK-12249 Currently the `!=` operator is not pushed down correctly. I simply added a case for this. Author: hyukjinkwon <gurwls223@gmail.com> Closes #10233 from HyukjinKwon/SPARK-12249.
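Taken together with the two entries above, these changes boil down to a few extra cases when translating data source filters into a WHERE fragment; a simplified, hedged sketch (not the exact JDBCRDD code):
```scala
import org.apache.spark.sql.sources._

// Simplified translation of pushed-down filters to SQL text; filters that
// cannot be compiled return null and stay on the Spark side.
def compileFilter(f: Filter): String = f match {
  case EqualTo(attr, value)      => s"$attr = '$value'"
  case Not(EqualTo(attr, value)) => s"$attr != '$value'"
  case IsNull(attr)              => s"$attr IS NULL"
  case IsNotNull(attr)           => s"$attr IS NOT NULL"
  case _                         => null
}
```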
-
- Dec 15, 2015
-
proflin authored
…endly Receiver graphs Currently, the Spark Streaming web UI uses the same maxY when it displays 'Input Rate Times & Histograms' and 'Per-Receiver Times & Histograms'. This may lead to somewhat unfriendly graphs: once we have tens of Receivers or more, every 'Per-Receiver Times' line almost hits the ground. This issue proposes to calculate a new maxY against the original one, which is shared among all the 'Per-Receiver Times & Histograms' graphs. Before: (screenshot) After: (screenshot) Author: proflin <proflin.me@gmail.com> Closes #10318 from proflin/SPARK-12304.
-
Devaraj K authored
Spark on Yarn: handle the AM being told a command from the RM. When the RM throws ApplicationAttemptNotFoundException for an allocate invocation, make the ApplicationMaster finish immediately without any retries. Author: Devaraj K <devaraj@apache.org> Closes #10129 from devaraj-kavali/SPARK-4117.
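A hedged sketch of the control-flow change (the `allocate` and `finish` hooks are illustrative stand-ins, not the real AM methods):
```scala
import org.apache.hadoop.yarn.exceptions.ApplicationAttemptNotFoundException

// On this exception the RM has forgotten the attempt, so retrying
// allocate is pointless; the AM should finish immediately instead.
def reportToRM(allocate: () => Unit, finish: String => Unit): Unit =
  try {
    allocate()
  } catch {
    case e: ApplicationAttemptNotFoundException =>
      finish(s"ApplicationAttemptNotFound: ${e.getMessage}")
  }
```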
-
Wenchen Fan authored
Author: Wenchen Fan <cloud0fan@outlook.com> Closes #8645 from cloud-fan/test.
-
Bryan Cutler authored
This change builds the event history of completed apps asynchronously, so the RPC thread will not be blocked, and new workers can register or be removed even if the event log history is very large and takes a long time to rebuild. Author: Bryan Cutler <bjcutler@us.ibm.com> Closes #10284 from BryanCutler/async-MasterUI-SPARK-12062.
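A hedged sketch of the general pattern (not the actual Master code): push the slow replay onto its own thread so the RPC loop stays responsive.
```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

// A dedicated single thread serializes rebuilds without ever blocking
// the RPC message loop.
implicit val rebuildContext: ExecutionContext =
  ExecutionContext.fromExecutorService(Executors.newSingleThreadExecutor())

def rebuildUiAsync(replayEventLog: () => Unit): Future[Unit] =
  Future {
    replayEventLog() // potentially slow for a large completed app
  }
```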
-
Naveen authored
ExternalBlockStore.scala Author: Naveen <naveenminchu@gmail.com> Closes #10313 from naveenminchu/branch-fix-SPARK-9886.
-